The recent explosive growth in compute demand, fueled mainly by the rise of artificial intelligence (AI) and deep neural networks (DNNs), is driving the search for a novel computing paradigm that can overcome the barriers imposed by conventional electronic computing architectures. Photonic neural networks (PNNs) implemented on silicon integration platforms stand out as a promising candidate for neural network (NN) hardware, offering the potential for energy-efficient and ultra-fast computations by exploiting the unique primitives of photonics, i.e., energy efficiency, THz bandwidth, and low latency. Thus far, several demonstrations have revealed the huge potential of PNNs in performing both linear and non-linear NN operations at unparalleled speed and energy consumption metrics. Transforming this potential into a tangible reality for deep learning (DL) applications requires, however, a deep understanding of the basic PNN principles, requirements, and challenges across all constituent architectural, technological, and training aspects. In this Tutorial, we initially review the principles of DNNs along with their fundamental building blocks, analyzing also the key mathematical operations needed for their computation in photonic hardware. We then investigate, through an intuitive mathematical analysis, the interdependence of bit precision and energy efficiency in analog photonic circuitry, discussing the opportunities and challenges of PNNs. Subsequently, a performance overview of PNN architectures, weight technologies, and activation functions is presented, summarizing their impact on speed, scalability, and power consumption. Finally, we provide a holistic overview of the optics-informed NN training framework that incorporates the physical properties of photonic building blocks into the training process in order to improve the NN classification accuracy and effectively elevate neuromorphic photonic hardware into high-performance DL computational settings.
I. INTRODUCTION
During the past decade, the relentless expansion of artificial intelligence (AI) through deep neural networks (DNNs) has been driving the need for high-performance computing and time-of-flight data processing. Conventional digital computing units, which are based on the well-known Von-Neumann architecture1 and inherently rely on serialized data processing, have faced daunting challenges in undertaking the execution of emerging DNN workloads. Von-Neumann architectures comprise a centralized processing unit (CPU), which is responsible for executing all operations (arithmetic, logic, and control) dictated by the program’s instructions, and a separate random-access memory (RAM) unit that stores all necessary data and instructions. The communication between CPU and memory is realized via a shared bus that is used to transfer all data between them, implying that they cannot be accessed simultaneously. This leads to the well-known Von-Neumann bottleneck,2 where the processor remains idle for a certain amount of time during memory data access. On top of that, the need for moving data between CPU and memory (via the bus) requires charging/discharging of metal wires, limiting in this way both the bandwidth and the energy efficiency due to capacitance and Joule heating,3 respectively.
There have been numerous demonstrations toward overcoming these effects, including, among others, caching, multi-threading, and new RAM architectures and technologies (e.g., ferroelectric RAMs4 and optical RAMs5–9), with the ultimate target being energy-efficient and high-speed CPU-memory data movement. None of these solutions seems, however, to be capable of coping with the computational and energy demands of DNNs, revealing a need for shifting toward specialized computing hardware architectures. In this endeavor, highly parallelized accelerators have been developed, including graphics processing units (GPUs), application-specific integrated circuits (ASICs), and field programmable gate arrays (FPGAs), with GPUs and ASICs being, until now, the dominant hardware computing engines for DNN implementations. Specifically, GPUs leverage their hundreds of cores toward accelerating the matrix multiplication operations of DNNs, which are the most time- and power-consuming computations.10 Moreover, they have dedicated non-uniform memory access architectures (e.g., video RAMs) that are (i) programmable, meaning that the stored data can be selectively accessed or deleted, (ii) faster than CPU counterparts, and (iii) located very close to their cores, reducing in this way the distance between computing and data. Yet, despite the GPU’s unrivaled parallelization ability that ushers in exceptional computational throughput, the need for data movement still remains and sets a fundamental limit on both speed and energy efficiency.
Toward totally eradicating the constraints of data movement, recent developments in analog computing through memristive crossbar arrays11–13 follow an alternative approach, called in-memory computing. This scheme allows for certain DNN computational tasks (e.g., weighting) to be performed within the memory cell itself, seamlessly supporting multiplication operations without requiring any data transfer.14 The recent 64-core analog-in-memory compute (AiMC) research prototype of IBM15 and the commercial entry of Mythic’s AiMC engine have validated the energy benefits that can originate from in-memory computing compared to Von-Neumann architectures. These implementations employ computational memory devices, including resistive RAMs (RRAMs) and phase change materials (PCMs), where the application of a voltage results in a change of the material’s properties, achieving in this way both data storage and computation. However, issues related to memory instability and finite resistance of the crossbar wires may lead to computational errors and crossbar size limitations, respectively, making it hard to reach the computational throughput and parallelization level of GPUs.12,14 Similar to in-memory computing, neuromorphic computing comprises an alternative non-Von-Neumann architecture that is inspired by the structure and function of the human brain, meaning that both memory and computing are governed by artificial neurons and synapses. Neuromorphic chips mostly employ spiking neural networks (SNNs) to emulate the behavior of biological neurons, which communicate through discrete electrical pulses called spikes. SNNs can process spatiotemporal information more efficiently and accurately than conventional neural networks16 as they respond to changes in the input data in real time. Additionally, they rely on asynchronous communication and event-driven computations, where, typically, only a small portion of the entire system is active at any given time, while the rest is idle, resulting in low-power operation.17 However, neuromorphic computing is not currently being used in real-world applications, and there is still a wide variety of challenges in both algorithmic and application development18 that need to be addressed toward outperforming conventional deep learning (DL) approaches. At the same time, the underlying electronic hardware in analog compute engines continues to rely heavily on complementary metal–oxide–semiconductor (CMOS) electronic transistors and interconnects, whose speed and energy efficiency are dictated by their size. Taking into account that transistor scaling has slowed down during the last decade, since we are approaching its fundamental physical size limits,19 there is no significant performance margin left to be gained. In parallel, the requirement for multiple connected neurons yields increased interconnect lengths in analog in-memory computing schemes, which finally results in low line-rate operation in order to avoid increased energy consumption. All this indicates that a radical departure from traditional electronic computing systems toward a novel computational hardware technology has to be realized in order to fully reap the benefits of the architectural shift toward non-Von-Neumann layouts.
Along this direction, integrated photonics emerged as a promising candidate for the hardware implementation of DNNs; the analog nature of light is inherently compatible with analog compute principles, while low-energy and high-bandwidth connectivity is the natural advantage of optical wires. On top of that, photonics can offer multiple degrees of freedom, such as wavelength, phase, mode, and polarization, being suitable for parallelizing data processing through multiplexing techniques20,21 that have been traditionally employed in optical communication systems for transferring information at enormous data rates (>Tb/s). The constantly growing deployment of optical interconnects and their rapid penetration into smaller network segments have also been the driving force for the impressive advances witnessed in photonic integration and, particularly, in silicon photonics; silicon photonic integrated circuits (PICs) with thousands of photonic components can nowadays be fabricated in a single die,22 forming a highly promising technology landscape for optical information processing tasks at chip-scale. Nevertheless, compared with electronic systems that host billions of transistors, thousands of photonic components may not be sufficient to build a vast universal hardware engine for generic applications. Yet, the constant progress in the field of integrated optics coupled with the rapid advances in fabrication and packaging can eventually shape new horizons in this field. This has raised expectations for an integrated photonic neural network (PNN) platform that can cope with the massively growing computational needs of DL engines, where computational capacity requirements double every 4–6 months.23 In this realm, several PNN demonstrations have been proposed,24–41 employing light both for data transfer and for computational functions and shaping a new roadmap for orders-of-magnitude higher computational and energy efficiencies than conventional electronic counterparts. At the same time, they have highlighted a number of remaining challenges that have to be addressed at the technology, architecture, and training levels, designating a bidirectional interactive relationship between hardware and software: the photonic hardware substrate has to comply with existing DL models and architectures, but at the same time, the DL training algorithms have to adapt to the idiosyncrasy of the photonic hardware. Integrated neuromorphic photonic hardware extends along a pool of architectural and technology options, the main target being the deployment of highly scalable and energy efficient setups that are compatible with conventional DL training models and suitable to safeguard high accuracy performance. In parallel, the use of light in all its basic computational blocks inevitably brings a number of new physical and mathematical quantities into NN layouts,41,42 such as noise and multiplication between “noisy” matrices, as well as mathematical expressions for non-typical activation responses, which are not encountered in conventional DL training models employed in the digital world. This calls for an optics-informed DL training model library; the term “optics-informed” has been recently coined by Roumpos et al.42 in order to describe the hardware-aware characteristics of DL training models and to declare their alignment with the nature of optical hardware, taking into account the idiosyncrasy of light and photonic technology.
However, despite the advances pursued in both the hardware and software segments, the complexity of photonic processing still lags far behind electronics with respect to both algorithmic and hardware capabilities. Hence, the field of PNNs does not currently pursue the mission of replacing conventional electronic AI engines but rather targets applications where photonics can offer certain benefits over its electronic counterparts. This mainly concerns inference applications, since inference comprises the most critical process in defining the power and computational resource requirements in certain applications, such as modern Natural Language Processing (NLP) models, where inference workloads are estimated to consume 25×–1386× higher power than training.43 Other deployment scenarios include latency-critical applications that are related to cyber-security in DCs,37 non-linearity compensation in fiber communication systems,39 acceleration of DNN matrix multiplication operations at frequency update rates of tens of GHz,25 decentralization of the AI input layer from core AI processing for edge applications,44 and the solution of non-linear optimization problems in, e.g., autonomous driving and robotics.40
In this Tutorial, we aim to provide a comprehensive understanding of the underlying mechanisms, technologies, and training models of PNNs, highlighting their distinctive advantages and addressing the remaining challenges when compared to conventional electronic approaches. This Tutorial forms the first attempt toward addressing the field of PNNs for DL applications within a hardware/software co-design and co-development framework: with the emphasis being on integrated PNN deployments, we define and describe the PNN fundamentals, taking into account both the underlying chip-scale neuromorphic photonic hardware and the necessary optics-informed DL training models. This paper is structured as follows: In Sec. II, we introduce the basic definitions and requirements for NN hardware, analyzing the basic NN building blocks (artificial neuron, NN models) as well as the main mathematical operations required for the hardware implementation of NNs, i.e., multiply and accumulate (MAC) and matrix-vector-multiplication (MVM) operations. The same section also provides an intuitive analysis of the bit resolution and energy efficiency trade-offs of analog photonic circuits, discussing the advantages and opportunities of PNNs. In Sec. III, a review of photonic MVM architectures is provided, while Sec. IV covers the basic computational photonic hardware technologies, summarizing photonic weight technologies and activation functions. Finally, Sec. V is devoted to the challenges and requirements in the photonic DL training sector, providing a solid definition of optics-informed DL models and summarizing the relevant state-of-the-art techniques and demonstrations.
II. BASIC DEFINITIONS AND REQUIREMENTS FOR NEURAL NETWORK HARDWARE
Merging photonics with neuromorphic computing architectures requires a solid knowledge of the underlying NN architectures, building blocks, and mechanisms. The most basic definitions and requirements are briefly described below.
A. Artificial neuron
An artificial neuron comprises the main operation unit in a neural network, with the operation of the basic McCulloch–Pitts neuron model45 being mathematically described by $y = \varphi\left(\sum_{i} w_i x_i + b\right)$, where y is the neuron output, φ is an activation (non-linear) function, xi is the ith element of the input vector x, wi is the weight factor for the input value xi, and b is a bias. The linear term $\sum_{i} w_i x_i$ represents the weighted addition and is typically carried out by the so-called linear neuron part, which comprises (i) an array of axons, with every ith axon denoting the transmission line that provides a single xi × wi product, (ii) an array of synaptic weights, with every ith weight wi located at the ith axon, and (iii) a summation stage. The non-linear neuron part comprises the activation function φ, with rectified linear unit (ReLU), sigmoid, pooling, etc., being among the most widely employed activation functions in current DL applications.46
For a layer of M interconnected neurons, the output of these neurons can be expressed in vector form as $\mathbf{y} = \varphi\left(\mathbf{W}^{T}\mathbf{x} + \mathbf{b}\right)$, where x is an input vector with N elements, W is the N × M weight matrix, b is a bias vector with M elements, and y is a vector made of M outputs. Figure 1(a) depicts a schematic layout of a biological neuron that can be mathematically described via the artificial neuron shown in Fig. 1(b), where the dendrites correspond to the weight signals, the nucleus corresponds to the summation and activation function, and the axon terminals are responsible for providing the inputs to the next neuron, while Fig. 1(c) depicts the resulting layout when utilizing artificial neurons to structure a DNN with a single input layer, a single output layer, and one or more hidden layers.
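To make the above notation concrete, the following minimal NumPy sketch (with function and variable names of our own choosing) evaluates a single fully connected layer, y = φ(Wᵀx + b), using a logistic sigmoid as an example activation; it is an illustrative reference implementation rather than a description of any specific photonic hardware.

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid used here as an example activation function phi
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x, W, b, phi=sigmoid):
    # x: input vector with N elements
    # W: N x M weight matrix, b: bias vector with M elements
    # Returns the M outputs y = phi(W^T x + b)
    return phi(W.T @ x + b)

# Toy example: N = 4 inputs feeding a layer of M = 3 neurons
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = rng.normal(size=(4, 3))
b = np.zeros(3)
y = dense_layer(x, W, b)
print(y.shape)  # (3,)
```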
B. Neural network models
Part of the unprecedented success of NNs in tackling complex computational problems can be attributed to the plethora of NN models, capable of uniquely synergizing several hundred or up to billions of artificial neurons into versatile computational building blocks. In this section, we will give an overview of several NN models based both on their popularity and success in resolving standardized benchmarking problems, as well as their compatibility with hardware implementation in silicon photonic platforms.
NN models can be broadly classified in different categories based on the following:
Data flow pattern. Considering the direction of the information flow, NN models can be grouped in two categories: In feed-forward NNs, the signals travel exclusively in one direction, usually from left to right, while in feed-back NNs, the signals travel in both directions, allowing neurons to receive data from neurons belonging to subsequent or even the same layer. Figures 2(a)–2(e) depict five popular types of NN models, grouped based on their data flow into feed-forward and feed-back implementations, with the latter being mostly utilized for resolving temporal and ordinal workloads as the network effectively retains memory of the previous samples.
Interconnectivity. The interconnection density between neurons of subsequent layers or even the same layer can be used to classify NN models in dense and sparse implementations. Figure 2(a) depicts a typical DNN model, where each neuron of the first layer is connected to all the neurons of the subsequent layer, usually denoted as a fully connected layout, while the neurons of the second layer are interconnected to only two neurons of the subsequent layer, corresponding to a sparse layout. While high interconnectivity density allows the NN to extract more complex relationships between the input data, the cost associated with the increasing number of weights, scaling with O(N²) complexity for an N × N interconnectivity, promotes the use of sparse models.
Structural Layout. Employing a specific layout can enhance neural network models with unique attributes. A typical example of such a model, specifically a single layer of a convolutional NN,47 is illustrated in Fig. 2(b). This architectural approach, widely employed in image recognition tasks due to its spatio-local feature extraction capabilities, promotes weight re-use and as such relaxes the computational requirements by applying the same weight kernel, i.e., a set of weight values, across the input data values (a minimal code sketch of this kernel re-use follows this list). Another typical NN layout, depicted in Fig. 2(c), is the NN autoencoder, a model associated with data encryption due to its data-compressing layout that effectively reduces the dimensionality of the input data in its central layers and one that enjoys wide employment in non-linearity compensation in optical communications.42 Finally, Fig. 2(d) illustrates the most common feed-back NN model, called recurrent, while Fig. 2(e) depicts a special type of recurrent NN typically denoted as reservoir computing,48 where a fixed-connectivity recurrent layer is placed between the input and the output layer. The relaxed training requirements, as only the output layer has to be trained, along with the ease of constructing time-delayed reservoir circuitry in silicon photonics platforms, have led to impressive demonstrations in optical channel equalization applications.
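As a brief illustration of the weight re-use mentioned above for convolutional layers, the following NumPy sketch (our own toy example, not tied to any reference implementation) slides a single shared 3 × 3 kernel over an 8 × 8 input, showing that only nine weight values are needed for the whole feature map.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the same (shared) kernel across every spatial position of the input,
    # illustrating the weight re-use that relaxes the parameter count of CNNs.
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(1).normal(size=(8, 8))
kernel = np.ones((3, 3)) / 9.0    # a single 3x3 weight kernel shared across the image
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)          # (6, 6): 9 shared weights vs 64 x 36 for an equivalent dense mapping
```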
C. MAC and MVM operations
Value representation and information density. Digital implementations use discrete values of physical variables, typically employing two discrete levels that are correlated with the upper and lower switching voltage of a transistor and are usually denoted as 0 and 1. On the other hand, analog computing employs values across the whole range of physical variables, allowing in this way for the representation of several equivalent bits of information within the same time unit. A direct consequence of this value representation form is the required noise robustness of the computational system, which will be discussed in more detail in Subsection II D, especially for optical implementations.
Computational primitives. While digital computing is solely based on the mathematics and respective deployments of Boolean logic-based circuitry, analog computing can employ the physical laws of the underlying hardware, e.g., capacitors and resistors,52 to implement a variety of mathematical operations, unlocking a quiver of functionalities described by the exploited physical phenomena.
Latency. Implementing a specific mathematical operation using Boolean logic requires a large number of devices; e.g., a digital computational building block implementing 8-bit parallel multiplication requires ∼3000 transistors.53 This forms a latency-critical computational path that is defined by the maximum register-to-register delay and effectively limits the maximum achievable operating frequency and, as such, the achievable latency.54 This has led to the adoption of multi-threading and multi-core setups for parallel processing in modern computing systems, investing in architectural innovations toward system acceleration. On the other hand, analog systems are inherently built as parallel computational systems, giving them a significant edge in latency-critical tasks while requiring, on average, ∼500× fewer components than digital electronic circuits for multiplication operations.53
These advantages, synergized with the primitives of photonic devices, have fueled the rise of optical MVM hardware, with an indicative example of an analog photonic dot product implementation given in Fig. 3(c). In this approach, the input and/or weight information is encoded in one of the underlying physical variables of the photonic system, i.e., the amplitude, phase, polarization, or wavelength of a light beam, while the physical primitives of optical phenomena are utilized for the mathematical operations: in this particular example, the loss experienced during the transmission of light through the weight-encoding physical system provides the multiplication operation, while interference of light waves provides the summation mechanism. Harnessing the advantages of light-based systems, i.e., multiple axes of freedom for encoding information in time, space, and wavelength, low propagation loss, low electromagnetic interference, and high-bandwidth operation, holds the credentials to surpass analog electronic deployments in large scale photonic accelerators.55 It is noteworthy, though, that both electronic and photonic analog compute engines necessitate the use of digital-to-analog converter (DAC) and analog-to-digital converter (ADC) modules for interfacing NN input and output modules with the digital world.
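The photonic dot product described above can be abstracted behaviorally in a few lines of code. The following NumPy sketch is a highly simplified model under assumed conventions (inputs encoded in the field amplitude, weight magnitude imprinted as transmission, weight sign as a 0/π phase shift, ideal lossless combining, and an idealized coherent readout); it ignores splitting losses, noise, and normalization factors and is not a simulation of any particular device.

```python
import numpy as np

def photonic_dot(x, w):
    # Inputs x_i are encoded in the optical field amplitude of each axon.
    e_in = x.astype(complex)
    # Multiplication: transmission through the weight element scales the field by |w_i|,
    # while a 0 / pi phase shift encodes the sign of w_i.
    t = np.abs(w) * np.exp(1j * np.pi * (w < 0))
    # Summation: interference in an ideal combiner adds the weighted fields coherently.
    e_out = np.sum(t * e_in)
    # An idealized coherent (homodyne-like) readout of the in-phase field component
    # recovers a value proportional to the signed dot product (scale factors omitted).
    return e_out.real

rng = np.random.default_rng(2)
x = rng.normal(size=8)
w = rng.normal(size=8)
print(np.allclose(photonic_dot(x, w), np.dot(w, x)))  # True
```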
D. Precision
Migrating MAC operations from digital circuitry, where high-precision (i.e., 16-, 32-, or 64-bit) floating point representations are utilized, to the analog domain, necessitates a basic understanding of the physical representation and energy-efficiency tradeoffs of analog photonic circuitry. Given the continuous nature of analog variables, as opposed to the usually two-level discretized variables in digital systems, representing high-precision numerical quantities in an analog system necessitates significantly higher signal-to-noise ratios (SNRs). This requirement shapes an optimal bit resolution/energy efficiency operational regime for analog photonic computing systems.56 In this subsection, we will discuss the precision limitations of analog photonic computing, outlining its optimized operational trade-offs in the shot-noise limited regime vs state-of-the-art digital MAC circuitry.
In this context, NNs are uniquely suited for analog computing, as empirical research has shown that they can operate effectively with both low-precision and fixed-point representations, with inference models working nearly as well with 4–8 bits of precision in both activations and weights, and sometimes even down to 1–2 bits.62 On top of that, bit precision in analog compute engines can be improved by incorporating into the NN training the idiosyncrasies and noise sources of the underlying photonic hardware, investing in this way in the so-called hardware-aware training or optics-informed DL models.31 Employing this approach, researchers have already showcased robust networks that can secure almost the same accuracy as noise-free digital platforms,63 while a more detailed discussion is included in Sec. V.
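As a back-of-the-envelope illustration of the precision/energy interdependence discussed above, the following sketch estimates the shot-noise-limited optical energy per detected sample at 1550 nm, using the common approximation that b bits of precision require an SNR of about 2^(2b) and that the shot-noise-limited SNR roughly equals the number of detected photons; it neglects receiver noise, modulation format, and the splitting-loss factors introduced in Subsection II E.

```python
import numpy as np

H = 6.626e-34          # Planck constant (J*s)
C = 2.998e8            # speed of light (m/s)
WAVELENGTH = 1550e-9   # telecom wavelength assumed throughout this Tutorial (m)
E_PHOTON = H * C / WAVELENGTH

def shot_noise_limited_energy(bits):
    # Shot-noise-limited detection gives SNR ~ number of detected photons;
    # resolving b bits requires SNR ~ 2**(2*b), hence ~2**(2*b) photons per sample.
    photons = 2.0 ** (2 * bits)
    return photons * E_PHOTON   # optical energy per detected sample (J)

for b in (2, 4, 8):
    print(b, "bits ->", shot_noise_limited_energy(b), "J per sample")
```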
E. Technology requirements for energy and area efficiency
Technology | Compute rate B (MAC/s) | Static consumption (W) | Efficiency (J/MAC)
---|---|---|---
TO PS66 | 10–50 × 10⁹ | 12 × 10⁻³ | 2.4–12 pJ/MAC
Insulated TO PS66 | 10–50 × 10⁹ | 4 × 10⁻³ | 0.4–2 pJ/MAC
EAMs65 | 10–50 × 10⁹ | 2–20 × 10⁻⁶ | 0.4–20 fJ/MAC
Non-volatile PCMs26 | 10–50 × 10⁹ | 0 | 0 pJ/MAC
- Higher than the accelerator’s noise energy. In this context, following the analysis of Subsection II D for the shot-noise limited optical power and considering an N × N neural layer, with (a) a power splitting ratio of N², implying that the output power has to be multiplied by N² to compensate for the input and column splitting stages, and (b) a digital precision loss of aprec = N, the shot-noise limited optical power can be calculated through Eq. (14), which forms a more detailed representation of the laser power calculated in Eq. (5) in which, however, the digital precision loss and the compensation loss factor are also taken into account; this makes the corresponding constituent term of Eq. (9) equal to Eq. (15), assuming a responsivity R = 1 A/W.
- Sufficient to generate the minimum required electrical charge at the receiver that can drive the subsequent node of the next NN layer.67 With the photonic accelerator operating at 1550 nm and assuming a photodetector with Cd = 1 fF,68 a Ci = 200 aF, 1 μm wire with an interconnect capacitance of 200 aF/μm,67 and a required output voltage of Vout = 0.5 V,69 the minimum required optical power can be calculated following the same convention of N² splitting loss and N digital precision loss, which concludes for the fourth term of the energy efficiency to Eq. (16). Here, it should be pointed out that this interconnect capacitance Ci suggests a monolithic integration approach or a very intimate proximity of the photonic chiplet to the respective electronic chiplet; more traditional integration approaches will enforce higher interconnect capacitances and significantly increase the required energy, with an interesting analysis provided in Ref. 70. Combining all the terms in a single efficiency equation, we can conclude to Eqs. (17) and (18). This highlights that energy efficiency improves with the following:
Increasing N, implying that the energy consumed for generating and receiving the input and output signals, respectively, is optimally utilized when the same input and output signals are shared among multiple matrix multiplication or, equivalently, neural operations. With current neuromorphic architectures being radix-limited by the maximum emitted laser power,71 loss-optimized architectures are required for allowing high circuit scalability and harnessing the advantages of photonic implementations.
Increasing B, which has a predominant effect in reducing energy consumption, especially when high-power-consumption weight nodes are used; e.g., the currently widely employed thermo-optic heaters dominate the energy efficiency, reaching up to ∼1 pJ/MAC.
Operating in an optimized bit-resolution/energy regime, as highlighted in the fourth constituent of Eq. (18). As we can observe, the order-of-magnitude difference between the shot-noise-limited and minimum-switching-energy contributions has a threshold point at around 4.5 bits, implying that a careful examination of the underlying technology blocks and an optimized operational regime can significantly improve the energy consumption.
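To give a feeling of how such a bit-resolution threshold arises, the following toy calculation compares the exponentially growing shot-noise term with a constant receiver-charge term, using the illustrative constants quoted above (Cd + Ci ≈ 1.2 fF, Vout = 0.5 V, R = 1 A/W, 1550 nm operation). The exact crossover value depends on the detailed loss and precision factors of Eqs. (14)–(18), so this sketch should be read only as a qualitative illustration of the trend.

```python
import numpy as np

E_PHOTON = 1.28e-19      # photon energy at 1550 nm (J)
C_TOTAL = 1.2e-15        # assumed Cd + Ci (F), values quoted above
V_OUT = 0.5              # required receiver output voltage (V)
RESPONSIVITY = 1.0       # A/W, as assumed above

def shot_noise_term(bits):
    # Optical energy needed so that shot noise allows b-bit resolution (~2**(2b) photons).
    return (2.0 ** (2 * bits)) * E_PHOTON

def receiver_charge_term():
    # Optical energy needed to deliver the charge C*V at the given responsivity (J).
    return C_TOTAL * V_OUT / RESPONSIVITY

for b in range(1, 10):
    # The bit resolution at which the first column overtakes the second marks the crossover.
    print(b, shot_noise_term(b), receiver_charge_term())
```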
III. INTEGRATED PHOTONIC MATRIX-VECTOR-MULTIPLY ARCHITECTURES
In this section, we solely focus on photonic matrix-vector-multiply architectures that could potentially be deployed in DL environments rather than in spiking or event-based computing paradigms. This has been motivated by the current challenges faced by SNNs, including difficulties in understanding underlying mechanisms and a lack of standardized benchmarks.18 In contrast, the established success of deep learning models results from years of research and the availability of extensive datasets and benchmarks, contributing to their widespread applicability and effectiveness.
A. Coherent MVM architectures
Herein, we initially investigate the architectural categories of integrated PNNs, and then we delve deeper into their individual building blocks, presenting the recent developments in photonic weight technologies as well as in non-linear activation function implementations. Depending on the mechanism of information encoding and the calculation of linear operations, integrated PNNs can be classified into three broad categories: coherent, incoherent, and spatial architectures.
Coherent architectures harness the effect of constructive and destructive interference for the linear combination of the inputs in the domain of electrical field amplitudes, requiring just a single wavelength for calculating the neural network linear operations. The principle of operation of coherent architectures is pictorially represented in Fig. 7(a), while Figs. 7(b)–7(d) illustrate indicative coherent layouts that have been proposed in the literature and will be comprehensively analyzed in this Tutorial. The first linear neuron realized in this manner has been proposed in Ref. 29, with its core relying on the optical interference unit realized through cascaded MZIs in a singular value decomposition (SVD) arrangement,72 as per Fig. 8(a). The SVD approach assumes decomposition of the arbitrary weight matrix W to W = USV†, where U and V denote unitary matrices, with V† being the conjugate transpose of V and S being a diagonal matrix that carries the singular values of W. Therefore, this scheme rests upon the factorization of unitary matrices, which in the photonic domain has mainly been based on U(2) factorization techniques employing 2 × 2 MZIs.73
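The SVD factorization that these coherent meshes implement can be verified numerically in a few lines. The following NumPy sketch (purely a numerical illustration, not a simulation of the MZI mesh itself) decomposes a random 4 × 4 weight matrix into W = USV† and checks the unitarity of U and V†.

```python
import numpy as np

# Decompose an arbitrary real weight matrix W into U * S * V^dagger, the factorization
# that SVD-based coherent meshes realize with two unitary MZI meshes and a diagonal
# attenuation/amplification stage.
rng = np.random.default_rng(3)
W = rng.normal(size=(4, 4))

U, s, Vh = np.linalg.svd(W)      # U, Vh unitary; s holds the singular values of W
S = np.diag(s)

print(np.allclose(U @ S @ Vh, W))                 # True: W = U S V^dagger
print(np.allclose(U.conj().T @ U, np.eye(4)))     # True: U is unitary
print(np.allclose(Vh @ Vh.conj().T, np.eye(4)))   # True: V^dagger is unitary
```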
In this regime, back in 1994, Reck et al.74 proposed the first optical unitary matrix decomposition scheme, the so-called triangular mesh shown in Fig. 8(b), using the 2 × 2 MZI as the elementary building block, illustrated in Fig. 8(c). Recently, this layout has been optimized by Clements et al.,75 introducing the rectangular mesh of 2 × 2 MZIs, depicted in Fig. 8(d), which is a more loss-balanced and error-tolerant design than Reck’s architecture. Both layouts necessitate N(N − 1)/2 variable beam splitters for implementing any N × N unitary matrix, requiring, also, the same number of programming steps for realizing the decomposition. Although these U(2)-based architectures rely on a simple library of photonic components that facilitates their fabrication, they suffer from several drawbacks, with the most important being fidelity degradation. Fidelity corresponds to the measurement of closeness between the experimentally obtained and the theoretically targeted matrix values, denoting a quantity that declares the accuracy in implementing a targeted matrix in the experimental domain. Fidelity degradation in the U(2)-based layouts originates from the differential path losses imposed by the non-ideal lossy optical components.76
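The elementary 2 × 2 MZI used in both the triangular and the rectangular meshes can also be captured compactly in code. The sketch below builds its transfer matrix from two ideal 50:50 couplers and two phase shifters (θ internal, φ at one input); the exact phase-shifter placement varies between conventions in the literature, so this is only one illustrative choice, and it verifies that the ideal lossless MZI is unitary with a θ-tunable splitting ratio.

```python
import numpy as np

def coupler_50_50():
    # Ideal lossless 2x2 50:50 coupler
    return np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)

def mzi(theta, phi):
    # 2x2 MZI: input phase shifter phi on the upper arm, two 50:50 couplers,
    # and an internal phase shifter theta between them.
    p_phi = np.diag([np.exp(1j * phi), 1.0])
    p_theta = np.diag([np.exp(1j * theta), 1.0])
    return coupler_50_50() @ p_theta @ coupler_50_50() @ p_phi

T = mzi(theta=0.7, phi=1.3)
print(np.allclose(T @ T.conj().T, np.eye(2)))   # True: the lossless MZI is unitary
# Sweeping theta tunes the splitting ratio between the two output ports
print(np.abs(T[:, 0]) ** 2)                      # output power split for input on port 0
```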
On top of that, U(2)-based layouts cannot support any fidelity restoration mechanism without altering their architectural structure or sacrificing their universality. When transferring these layouts into an SVD scheme toward implementing arbitrary matrices, the above effects are exacerbated as two concatenated unitary matrix layouts are required. In an attempt to counteract these issues, the authors in Ref. 77 proposed the universal generalized Mach–Zehnder interferometer (UGMZI)-based unitary architecture illustrated in Fig. 8(e) and introduced a novel U(N) unitary decomposition technique78 in the optical domain that migrates from the conventional U(2) factorization by employing N × N generalized MZIs (GMZIs) as the elementary building block. GMZIs serve as N × N beam splitters,79,80 followed by N PSs, with each N × N beam splitter comprising two N × N MMI couplers interconnected by N PSs, as depicted in Fig. 8(f). This scheme eliminates the differential path losses, and hence, it can yield 100% fidelity performance by applying a simple fidelity restoration mechanism, which incorporates N variable optical attenuators at the inputs of the UGMZI. Yet, this architecture heavily relies on MMI couplers with a high number of ports in order to perform transformations on large unitary matrices, a component that remains a rather immature integrated circuit technology still under development in current research fabrication attempts. Finally, the authors in Ref. 81 proposed the slimmed SVD-based PNN, where they traded universality for area and loss efficiency by eliminating one of the two unitary matrices, implying that they can implement only specific weight matrices.
Apart from SVD-based approaches, direct-element mapping architectures also comprise coherent layouts that employ a single wavelength and interference for calculating the linear operations. The mapping of the weight values to the underlying photonic fabric is bijective, meaning that each photonic node imprints a dedicated value of the targeted weight matrix without necessitating decomposition, minimizing in this way the programming complexity. Figure 9 illustrates the first coherent direct-element mapping architecture,76 implemented in a crossbar (Xbar) layout. In order to support both positive and negative weight values, this architecture requires the use of two devices per weight: an attenuator for imprinting the weight magnitude, proportional to $|W_i|$, and a PS for controlling the phase, i.e., the sign of the weight, sign(Wi), enforcing a 0 phase shift in the case of positive and a π phase shift in the case of negative weights, resulting in $W_i = \mathrm{sign}(W_i)\,|W_i|$. The weighted inputs are linearly combined in an N:1 combiner stage, constituted of cascaded Y-junction combiners, yielding an output electrical field proportional to $\sum_{i} \mathrm{sign}(W_i)\,|W_i|\,x_i$, which conceals the sign information in its phase. If compatibility with electrical non-linearities is needed, the sign information of the signal emerging from the Xbar output can be translated from its phase to its magnitude by introducing an optional bias branch, which sets a constant reference power level that allows for mapping the positive/negative output field above/below the bias, as experimentally demonstrated in Ref. 82. The Xbar architecture, thanks to its loss-balanced configuration, can yield 100% fidelity performance, while its non-cascaded and one-to-one mapping connectivity significantly improves the phase-induced fidelity performance since any error is restricted to a single matrix element. These benefits were experimentally verified in Refs. 83 and 84, employing a 4 × 4 silicon photonic Xbar with SiGe EAMs as computing cells, while the NN classification credentials of this architecture were experimentally validated in Refs. 24 and 25 using a 2:1 single-column Xbar layout that is capable of calculating the linear operations of the MNIST dataset at up to 50 GHz clock frequency with a classification accuracy of >95%. In an effort to exploit the full potential of the photonic platform, the Xbar architecture can be equipped with wavelength division multiplexing (WDM) technology to further boost the throughput, as has been proposed in Refs. 85 and 86, realizing multiple output vectors in a single timeslot. Although the Xbar layout currently seems to be the optimal architectural candidate for PNNs, it requires careful and precise effort during circuit design in order to synchronize the optical signals that travel through different paths and coherently recombine at the output. Hence, optimum performance of the Xbar necessitates the employment of equal-length optical paths whenever coherent recombination is required, suggesting that the path-length difference has to be compensated during the photonic chip layouting.
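The behavior of a single Xbar column, including the optional bias branch, can be summarized by the following hedged NumPy sketch (our own abstraction with combiner normalization factors omitted); note that with the bias branch the detected power is only approximately linear in the weighted sum when the sum is much smaller than the bias field.

```python
import numpy as np

def xbar_column(x, w_col, bias_field=None):
    # Direct-element mapping: each cell applies |w_i| via an attenuator and the sign of w_i
    # via a 0 / pi phase shift; a cascaded Y-junction tree combines the weighted fields.
    cell = np.abs(w_col) * np.exp(1j * np.pi * (w_col < 0))
    e_out = np.sum(cell * x)                 # combiner normalization factors omitted
    if bias_field is None:
        return e_out                         # sign information resides in the phase
    # Optional bias branch: adding a constant reference field maps positive / negative
    # weighted sums above / below the bias power level after square-law detection
    # (approximately linear only for |weighted sum| << bias_field).
    return np.abs(bias_field + e_out) ** 2

x = np.array([0.2, 0.7, 0.1, 0.9])
w_col = np.array([0.5, -0.3, 0.8, -0.6])
print(xbar_column(x, w_col).real)               # proportional to the signed weighted sum
print(xbar_column(x, w_col, bias_field=5.0))    # power above/below |bias|**2 = 25 encodes the sign
```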
Finally, a recent coherent demonstration in Ref. 87 exploits vertical-cavity surface-emitting lasers (VCSELs) for encoding, in i time steps, both the input vector and the weight matrix, as shown in Fig. 7(d). Using the injection locking mechanism between the deployed VCSELs, phase coherency is retained over the entire circuit, allowing for the realization of coherent amplitude addition at the interference stage of each time step. Matrix-vector products are realized by the photoelectric multiplication process in homodyne detectors, while a switched integrator charge amplifier is employed for the accumulation of the individual i products. Despite its simplicity, this architecture requires precise phase control over the individual VCSELs toward retaining phase coherency over the entire circuit, raising stability and scalability issues.
B. Incoherent MVM architectures
Demarcating from coherent architectures, incoherent PNNs encode the NN parameters into different wavelengths and calculate the network linear operations by employing WDM technology principles and power addition. A pictorial representation of how incoherent architectures operate is given in Fig. 7(e), while some incoherent layouts that have been suggested in the literature and will be thoroughly examined in this Tutorial are illustrated in Figs. 7(f)–7(h). The first implementation that follows this approach has been proposed in Ref. 88, where a team from Princeton initially demonstrated the so-called broadcast-and-weight architecture, which was then elaborated in more detail in Ref. 89. Each input xi is imprinted at a designated wavelength λi, essentially making each channel λi a virtual axon of a linear neuron, while all N inputs (λs) are typically multiplexed together into a single waveguide when arriving at the linear neuron, as shown in Fig. 10. The main building block of this architecture is the microring resonator (MRR) bank, consisting of N MRRs that are embraced by two parallel waveguides and are responsible for enforcing channel-selective weighting values. Each MRR filter is designed such that its transfer function can be continuously tuned, ideally between the values of 0 and 1, achieving controlled attenuation of the signal’s power at the corresponding λi. The sign is encoded by exploiting path diversity and balanced photodetection (BPD); assuming that an ai fraction of a signal at a certain wavelength exits via the THRU port of the respective MRR module and the remaining (1 − ai) part gets forwarded to the DROP port, the subtraction of the respective photocurrents at the BPD yields the weighting value wi = 2ai − 1 for this specific signal, which can range between −1 and 1, given that ai ranges between 0 and 1. With all different wavelengths leaving through the same DROP and the same THRU port and entering the same BPD unit, the BPD output provides the total weighted sum of the WDM inputs. This architecture allows for one-to-one mapping of the weighting values into the MRR weight bank, alleviating the programming complexity, yet it comprises a rather challenging solution since it necessitates the simultaneous operation and precise control of various resonant devices, raising issues regarding its scalability credentials. An alternative incoherent architecture is proposed by the authors in Ref. 26, demonstrating a PNN that follows the photonic in-memory computing paradigm, where the weighting cells are realized through PCM-based memories. This approach exploits the non-volatile characteristics of the PCM devices, consuming, in principle, zero power when inference operation is targeted, meaning that the weights of the NN do not have to be updated and, thus, are statically imprinted in the PCM weighting modules. This architecture utilizes an integrated frequency comb laser to imprint the multiple inputs of the NN, with each comb line corresponding to a dedicated NN input value. The multi-wavelength signals after the PCM-based weighting stage, which follows the layout depicted in Fig. 7(h), are incoherently combined at a photodiode (PD) in order to produce the linear summation. Although this architecture minimizes the memory movement bottleneck, it requires (i) a precise design to synchronize in time the multi-wavelength signals at each PD and (ii) a broad wavelength spectrum of the frequency comb laser for implementing large scale NNs. An additional incoherent architecture is proposed in Ref. 28 and illustrated in Fig. 11.
The authors employ WDM input signals for imprinting the NN input vector, while the weight matrices are realized via multiple semiconductor optical amplifiers (SOAs). They adopt the cross-connect switch principles used in optical communications for constructing the PNN and arrayed waveguide gratings (AWGs) for multiplexing/demultiplexing the signals as well as for reducing the out-of-band accumulated noise of the SOAs. Although it comprises a promising solution toward implementing large scale PNNs, the deployment of multiple SOAs as single-stage weighting elements trades the scalability credentials for increased power consumption. Finally, an alternative to the coherent/incoherent architectures has been proposed in Ref. 32, where the authors encode the N pixels of the classification image (NN input values) directly to the grating couplers through optical collimators, while the weight information of each NN input is imprinted through a dedicated PIN-based optical attenuator. Each weighted input is launched into a PD, and the resulting photocurrents are combined to generate the linear weighted sum of the neurons. As opposed to the coherent and incoherent layouts, there is no requirement for the encoded signals to be in phase or at different wavelengths, respectively, since every NN input is imprinted at a designated photonic waveguide/axon. This, however, necessitates multiple waveguides/axons for implementing a high-dimensional NN, imposing scalability limitations.
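Returning to the broadcast-and-weight scheme described above, the mapping wi = 2ai − 1 produced by balanced photodetection can be checked with the following minimal NumPy sketch (a behavioral model that assumes ideal MRR transfer and lossless routing).

```python
import numpy as np

def broadcast_and_weight(x, a):
    # x: WDM input powers, one value per wavelength (virtual axon)
    # a: per-MRR tuning, i.e., the fraction of each wavelength routed to the THRU port
    thru = a * x                      # powers collected at the THRU port photodiode
    drop = (1.0 - a) * x              # powers collected at the DROP port photodiode
    # Balanced photodetection subtracts the two photocurrents; incoherent power addition
    # across wavelengths yields the weighted sum with effective weights w_i = 2*a_i - 1.
    return np.sum(thru) - np.sum(drop)

x = np.array([0.3, 0.8, 0.5])
w = np.array([0.4, -0.9, 0.2])        # target weights in [-1, 1]
a = (w + 1.0) / 2.0                    # MRR settings implementing those weights
print(np.allclose(broadcast_and_weight(x, a), np.dot(w, x)))   # True
```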
From all previous implementations, it becomes evident that the main challenges and limitations of integrated PNNs relate to their scalability and hence to the hardware encoding of the vast amount of NN parameters into a photonic chip. In this direction, the authors in Refs. 38, 90, and 91 introduced the optical tiled matrix multiplication (TMM) technique, shown in Fig. 12, which follows the principles of the general matrix multiply (GeMM) method adopted by modern digital AI engines92,93 and attempts to virtually increase the size of the PNN without fabricating large photonic circuits. The rationale behind this concept is the following: the weight matrix and the input vector of an NN are divided into smaller tiles, whose dimension is dictated by the available hardware neurons. The remaining tiles are unrolled in the time domain via time division multiplexing (TDM) and are then sequentially imprinted into the photonic hardware, allowing in this way for the calculation of the matrix-multiplication operations of an NN layer whose dimension is higher than the one implemented on hardware. The resulting time-unfolded products, produced by the multiple tiles, need to be added together in order to form the final summation. For this reason, the authors in Refs. 44, 91, and 94 utilized a charge accumulation technique, either electro-optically using a low-bandwidth photodetector or electrically via a low-pass RC filter. Besides accumulation, this implementation allows for power efficient and low-cost ADCs since it relaxes their sampling rate and bandwidth requirements. However, the employment of optical TMM and charge accumulation techniques in a PNN engenders specific requirements that need to be addressed: (i) both the input-vector-imprinting and the weight-encoding modulators have to operate at the same data rate and (ii) the number of time-unfolded products that will be accumulated is dictated by the deployed capacitance of the RC filter or the bandwidth of the photodetector, implying that after a certain period, the capacitor voltage/photodetector power should be reset in order to store (e.g., in a local memory) the first set of accumulated summations. The same process is repeated until the calculation of the total linear operations of the PNN.
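The tiling and accumulation logic of optical TMM can be emulated in software as follows; the sketch below (our own illustrative code, with the charge-accumulation stage reduced to a simple running sum per output) reproduces the full matrix-vector product of a layer larger than the assumed hardware tile size.

```python
import numpy as np

def tiled_matvec(W, x, tile):
    # Emulate optical TMM: the (rows x cols) weight matrix is split into tile x tile blocks
    # that fit the available photonic hardware; blocks are imprinted sequentially (TDM)
    # and their partial products are accumulated (e.g., via charge accumulation) per output.
    rows, cols = W.shape
    y = np.zeros(rows)
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            W_tile = W[r:r + tile, c:c + tile]    # tile currently loaded on the hardware
            x_tile = x[c:c + tile]                # matching slice of the input vector
            y[r:r + tile] += W_tile @ x_tile      # accumulation of the time-unfolded products
    return y

rng = np.random.default_rng(4)
W = rng.normal(size=(6, 9))
x = rng.normal(size=9)
print(np.allclose(tiled_matvec(W, x, tile=3), W @ x))   # True
```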
Even when employing the proposed techniques, large NN language models, such as ChatGPT95 and Megatron,96 necessitate billions of trainable parameters, which are challenging to encode not only in silicon photonic hardware but also in current electronic computing engines. Therefore, these models are deployed on High Performance Computers (HPCs) incorporating thousands of interconnected GPUs and/or tensor processing units (TPUs), e.g., the Megatron language model deploys 4400 A100 GPUs,96 with each single accelerator comprising hundreds or thousands of nodes. This architectural paradigm has already been transferred to analog electronic accelerator prototypes, with recent multi-core systems already expanding to more than 50 cores,15 and it can act as a blueprint architectural approach for multi-core photonic accelerators. Interconnection of the constituent photonic cores can benefit from the recent breakthroughs in optical chip-to-chip communications, projected to offer significant energy and latency savings compared to electronic counterparts97 while also paving the way for reduced opto-electronic (OE) conversions and even on-switch-fabric workload accelerations.98 Finally, the development of commercially viable silicon photonic accelerators has to tackle both the well-documented packaging challenges of deploying very large scale integrated photonics99,100 and the photonic-accelerator-specific packaging and interconnect requirements.101 Fortunately, recent breakthroughs in large scale photonic circuitry packaging highlight a feasible developmental roadmap capable of addressing the challenges of (i) laser source integration, through employing either heterogeneous integration of III–V components via wafer bonding102 and micro transfer printing103 or photonic wire bonding,104 (ii) photonic/electronic system-in-package (SiP) development, with prominent approaches including monolithic integration in silicon photonics105 or mainstream electronic platforms97 and 3D integration,106 and (iii) photonic accelerator memory access, where either optical interconnectivity between memory and accelerator is promoted107 or the novel photonic in-memory computing paradigm is adopted,26 with the weight matrix being non-volatile and as such significantly alleviating the accelerator’s memory-access requirements.
IV. NEUROMORPHIC PHOTONIC HARDWARE TECHNOLOGY
A. Photonic weighting technologies
Delving deeper into the individual PNN building blocks, we provide an overview of photonic technologies that can be promising candidates toward the realization of NN weight imprinting on an integrated platform. As discussed previously, most PNN demonstrations have focused on the weight matrix implementation rather than on the NN input vector since the number of weight values constitutes the greatest contributing factor to the hardware encoding of the entire set of NN parameters. For example, assuming a fully connected NN with a topology of 10:10:5, the number of input values is 10, while the total number of weight values is 150, and this difference becomes more pronounced as the NN dimensions/layers increase. Hence, the selection of the photonic weight technology becomes crucial as it implicitly determines the size and energy efficiency of the PNN. Photonic weight technologies can be divided into two categories, depending on their volatility characteristics. Non-volatile devices can be used as memories by storing the NN weight values in a PNN, and this information can be retained by statically applying ultra-low or even zero electrical power. These devices can either use memristors heterogeneously integrated with photonic microring resonators108 or exploit physical phenomena such as phase change26 and ferroelectricity109 in order to store and retain the weight values. The employment of non-volatile memory elements is more suitable for equipping PNN inference engines, offering low-power weight encoding with high precision, but, in turn, they impose challenges related to reconfiguration time, fabrication maturity, compactness, and scalability. For example, PCMs, which are mostly based on GST compounds, exhibit up to 5-bit resolution,110 but, in turn, their reconfiguration time is restricted to the sub-MHz regime, while in most demonstrations they operate via optical absorption, limiting their deployment in large scale circuits. Ferroelectric materials, such as barium titanate (BTO), have already validated their non-volatile credentials, retaining their states over 10 h.109 However, to incorporate this device into a PNN, one aspect that still needs to be addressed and optimized is the footprint, since the required PS length for achieving a π phase shift is at least 1 mm,109 rendering the implementation of a large scale PNN rather challenging. On the other hand, when training applications are targeted or the TMM technique has to be applied for executing a high-dimension neural layer over limited PNN hardware, volatile devices take the lead over non-volatile materials since they offer dynamic weight update. Various TO MZI or MRR devices27,29–31,111–113 have been proposed for weight data encoding due to their well-established and mature fabrication process as well as their high bit precision (up to 9 bits113), yet their reconfiguration time is limited to ms values.
Electro-optic devices, such as micro-electro-mechanical systems (MEMS),114,115 EAMs,36 semiconductor optical amplifiers (SOAs),28 ITO-based modulators,116 graphene-based phase shifters,117 and silicon p-i-n diode Mach–Zehnder modulators (MZMs),118 have already been demonstrated and can potentially perform weighting functions with reconfiguration times in the GHz regime, trading, however, bit precision performance.41 Therefore, the selection of the photonic weight technology heavily depends on the targeted NN application (inference, training) and its bit resolution requirements. Figure 13 juxtaposes the power consumption and footprint of different photonic technology candidates for the realization of the weighting function in PNN implementations, highlighting also their speed capabilities/reconfiguration times.
B. Photonic activation functions
An indispensable part of the realization of an NN is the activation function, i.e., a non-linear function that is applied at the egress of the linear weighted summation. The non-linearity of the activation function allows the network to generalize better, converge faster, approximate non-linear relationships, and avoid local minima. Despite the relatively relaxed requirements on the properties of activation functions, i.e., a certain degree of non-linearity and differentiability across the employed range,119 DNN implementations have been dominated, due to their higher performance credentials, by the use of the ReLU,120 PReLU,121 and variations of the sigmoid transfer function, including tanh and the logistic sigmoid.122 This dominance has shaped the objectives of photonic NN activation function circuitry, namely to converge to the performance of these specific electrical baseline functions at the highest possible bit rate as well as to achieve a certain level of SNR at their output to safeguard the scalability of the neural circuitry. Previous implementations of non-linearity in photonic NNs have been streamlined across three basic axes: (i) The simplest approach relies on applying the non-linear activation in the electronic domain. This was achieved through offline implementation in a CPU, following the opto-electrical conversion of the vector-matrix-multiplication product,29 by chaining an ADC to a digital multiplier and finally to a DAC,123 or by introducing non-linearity at the neuron’s egress through a specially designed ADC.94 Despite the simplicity and effectiveness of digitally applying the non-linear activation function, the related unavoidable digital conversion induces, in the best case, a latency of several clock cycles for every layer of the NN that employs one.29,123 Transferring this induced latency to a photonic NN accelerator would significantly decrease the achieved computation capabilities and, as such, its total performance credentials. (ii) The hybrid electrical-optical approach relies on a cascade of active photonic and/or electronic components, i.e., photodiode, amplifier, modulator, and laser, with the non-linear behavior provided by opto-electrical synergies, such as a transimpedance amplifier (TIA), or by the non-linear behavior of the photonic components (e.g., modulators).124–129 The hybrid electrical-optical approaches provide a viable alternative to digitally applied activation functions, but, in turn, the induced noise and latency originating from the cascaded optical-to-electrical-to-optical conversions may still impose a non-negligible overhead on the performance of the photonic NN. (iii) The all-optical approach is based on engineering the non-linearities of optical components to arrive at practical photonic activation functions. In this context, different mechanisms and materials have been investigated, including, among others, gain saturation in SOAs,130,131 absorption saturation,132,133 reverse absorption saturation in films of buckyballs (C60),133 PCM non-linear characteristics,134,135 a SiGe hybrid structure in a microring resonator,136,137 and periodically poled thin-film lithium niobate (PPLN) nanophotonic waveguides.138 All-optical approaches seem to hit the sweet spot between applicability and function, allowing time-of-flight computation and negating the need for costly conversions.
Finally, a recent trend, and probably the most promising one for realizing a complete PNN, comprises the development of programmability features in both hybrid and all-optical approaches, where a single building block can realize multiple activation functions by modifying its operational conditions.126,128,129,133,135 These implementations have mainly relied on the different non-linear transfer functions obtained by the same component when altering its operational conditions through specific settings, e.g., the DC bias voltage for a modulator, the DC current for an SOA, the gain of a TIA, the input optical power and pulse duration for a PCM, etc. Therefore, enabling reconfigurability in PNNs can pave the way toward implementing different AI applications/tasks without requiring any modifications in the underlying hardware.
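As a simple illustration of such programmability, the following sketch models an electro-optic activation based on the cos²-shaped intensity transfer of a Mach–Zehnder modulator; the transfer function, parameter names, and operating points are our own illustrative assumptions rather than measured device characteristics, and they merely show how changing the DC bias and electrical gain reshapes the resulting non-linearity.

```python
import numpy as np

def mzm_activation(x, v_bias, v_pi=1.0, gain=1.0):
    # Intensity transfer of a Mach-Zehnder modulator driven by the (amplified) weighted sum.
    # The DC bias and electrical gain reshape the resulting non-linear activation function.
    phase = np.pi * (gain * x + v_bias) / (2.0 * v_pi)
    return np.cos(phase) ** 2

x = np.linspace(-1.0, 1.0, 5)
print(mzm_activation(x, v_bias=0.0, gain=1.0))   # bell-shaped response centred at zero
print(mzm_activation(x, v_bias=0.5, gain=0.5))   # monotonic, sigmoid-like response over the same range
```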
Yet, the programmability properties of the non-linear activation functions need to be combined with high-speed performance to comply with the frequency update rate of the execution of the linear part. Figure 14 provides an overview of the devices that have been proposed for the implementation of NN activation functions, classifying them according to their implementation (all-optical, electro-optical), their speed performance, and the number of activation functions that they can realize, while Table II summarizes the power consumption and area metrics of state-of-the-art activation function demonstrations.
References | Power consumption (mW) | Area (mm²)
---|---|---
TIA + MZM129 | 425 | 7.13
TIA + non-linear cond + MZM128 | 400 | 0.625
SOA130 | 1640 | 9.1
EAM124 | 17 | N.A.
Thin film LiNbO3138 | 135 × 10⁻³ | N.A.
Saturable absorber133 | 40 × 10³ | 11.76 × 10³
MRR126 | 0.1 | 25
V. OPTICS-INFORMED DEEP LEARNING MODELS
Despite the significant energy and footprint advantages of analog photonic neuromorphic circuitry, its use for DL applications necessitates a unique software-hardware NN co-design and co-development approach in order to account for various factors that are absent in digital hardware and, as such, ignored in current digital electronic DL models.125 These include, among others, fabrication variations, optical bandwidth, optical noise, optical crosstalk, limited extinction ratio (ER), and non-linear activation functions that deviate from the typical activation functions used in conventional DL models, with all of them effectively acting as performance degradation factors.139 In this context, significant research effort has been invested into incorporating the photonic-hardware idiosyncrasy in NN training models,140 engendering also a new photonic hardware-aware DL framework. This reality has shaped a new framework for PNNs, which should eventually be better defined as the NN field that combines neuromorphic photonic hardware with optics-informed DL training models, i.e., using light for carrying out all constituent computations while at the same time using DL training models that are optimally adapted to the properties of light and the characteristics of the photonic hardware technology. The research field of hardware-aware DL training models designed and deployed for neuromorphic photonic hardware has led to the introduction of optics-informed DL models,31,37,141,142 a term that has been recently coined in Ref. 42, revealing a strong potential in matching and even outperforming digital NN layouts in certain applications.42
Optics-informed DL models have to embed all relevant noise sources and physical quantities that stem from the analog nature of light and the optical properties of the computational elements into the training process. To ease the understanding of the noise sources and physical quantities that impact a photonic accelerator and the related NN implementation challenges, Fig. 15(a) illustrates the implementation of a single neuron axon over photonic hardware, along with the dominant signal quality degradation mechanisms. The input neuron data xi are quantized prior to being injected into a DAC, whose bit resolution, for the tens of GSa/s sampling rates required for photonic neuromorphic computing, ranges around 4–8 bits,143 i.e., significantly lower than the 32-bit floating point numbers utilized in digital counterparts.
This disparity is illustrated in Fig. 15(b), where the input NN data and the DAC have a bit resolution of 8 and 2 bits, respectively, resulting in quantization errors denoted as Qerror. Subsequently, the quantized electrical signal at the DAC's egress is used to drive an optical modulator in order to imprint the information onto the optical domain. In this case, the non-linearity and finite ER of the photonic modulator will modify the incoming signal, with Fig. 15(c) indicatively illustrating the effect of the limited ER on the signal representation. It should be pointed out that, in this simple analysis, we assume a weight-stationary layout and, as such, neglect the effect of weight noise. We also approximate the frequency response of the photonic axon, denoted as Tf, as a low-pass filter, which is a valid assumption when considering the convolution of the constituent frequency responses of the modulator and the photodiode that are typically limited to the GHz range. The effect of this low-pass behavior is schematically captured in Fig. 15(d), showcasing the effect of limited bandwidth on the calculated weighted sum. Several noise sources also degrade the SNR of the optical signal traversing the photonic neuron, including, among others, relative intensity noise (RIN), shot noise, and thermal noise. Under the generally valid assumption that the main noise contributions can be approximated as AWGN sources, we consolidate the noise profile of the photonic axon into a single noise factor, correlated with the standard deviation of the zero-mean AWGN added to the signal. Figure 15(e) illustrates the effect of random AWGN added to the neural data that have propagated through the photonic hardware. Finally, an ADC is utilized for interfacing the signal back to the digital domain, introducing again a quantization error, as depicted in Fig. 15(f). Comparing the digital signal received at the ADC output with the original digital NN input clearly indicates the significant differences that may translate into degraded NN performance when conventional DL training models are relied upon.
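The end-to-end degradation chain of Fig. 15 can be emulated with only a few lines of code. The following Python/NumPy sketch, with hypothetical helper names (quantize, modulator, low_pass) and arbitrarily chosen parameter values, strings together DAC quantization, a finite-ER modulator non-linearity, a first-order low-pass response standing in for Tf, AWGN, and ADC quantization; it is a toy model intended to mirror the qualitative behavior of Figs. 15(b)–15(f), not the exact models used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, bits, x_min=0.0, x_max=1.0):
    """Uniform quantizer emulating a DAC/ADC of the given bit resolution."""
    levels = 2 ** bits - 1
    x_clipped = np.clip(x, x_min, x_max)
    return np.round((x_clipped - x_min) / (x_max - x_min) * levels) / levels * (x_max - x_min) + x_min

def modulator(x, extinction_ratio_db=15.0):
    """Toy intensity-modulator model: a sine-shaped non-linearity plus a
    transmission floor set by the finite extinction ratio (ER)."""
    floor = 10 ** (-extinction_ratio_db / 10.0)
    return floor + (1.0 - floor) * np.sin(np.pi / 2.0 * x) ** 2

def low_pass(x, alpha=0.6):
    """First-order IIR smoothing standing in for the band-limited
    modulator/photodiode response Tf."""
    y = np.zeros_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc = alpha * v + (1.0 - alpha) * acc
        y[i] = acc
    return y

# One photonic-axon pass: DAC -> modulator -> bandwidth limit -> AWGN -> ADC.
x_in = rng.uniform(0.0, 1.0, size=256)                      # digital NN input samples
x_dac = quantize(x_in, bits=2)                              # coarse DAC resolution
x_mod = modulator(x_dac)                                    # non-linearity and limited ER
x_bw = low_pass(x_mod)                                      # limited bandwidth
x_noisy = x_bw + rng.normal(0.0, 0.02, size=x_bw.shape)     # consolidated AWGN noise factor
x_out = quantize(x_noisy, bits=4)                           # ADC back to the digital domain
```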
In this section, we begin by highlighting the challenges and opportunities of using photonic activation functions in NN implementations, followed by an in-depth analysis of the approach and related benefits of incorporating photonic noise, limited bandwidth, limited ER, and quantization in NN training. Finally, we provide a brief overview of related applications and discuss the potential of optics-informed DL models.
A. Training with photonic activation functions: Challenges and solutions
Motivated by the variance-preserving assumption,147 novel initialization approaches targeting photonic activation functions analytically compute the optimal variance during initialization.125 More advanced approaches propose activation-agnostic methods that apply auxiliary optimization tasks, allowing neural network parameters to be initialized by taking into account the actual data distribution and the limited activation range of the employed transfer functions.151
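As a rough illustration of the variance-preserving idea, the sketch below (a hypothetical Python helper, not the exact algorithms of Refs. 125 or 151) scales the weight-initialization variance by the local slope of an arbitrary photonic activation around its operating point, in the spirit of Xavier/He initialization generalized to non-standard transfer functions.

```python
import numpy as np

def photonic_aware_init(fan_in, activation, x_op=0.5, eps=1e-3, rng=None):
    """Draw a weight vector so that the pre-activation variance is roughly
    preserved, given the local slope of the (simulated) photonic activation
    around its operating point x_op. This generalizes Xavier/He-style
    initialization to arbitrary, possibly easily saturated, transfer functions."""
    rng = rng or np.random.default_rng()
    slope = (activation(x_op + eps) - activation(x_op - eps)) / (2.0 * eps)
    std = 1.0 / np.sqrt(fan_in * max(abs(slope), 1e-6) ** 2)
    return rng.normal(0.0, std, size=fan_in)

# Example with a saturating, sigmoid-like transfer function standing in for a photonic activation.
w = photonic_aware_init(64, activation=lambda x: 1.0 / (1.0 + np.exp(-4.0 * x)))
```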
B. Noise-aware training in optics-informed DL
C. Training with quantized and mixed precision neural networks
Other operations, such as DAC and ADC conversions, have also been shown to affect the accuracy of photonic neural networks. However, considering these phenomena during training results in more robust representations and, in turn, higher performance during deployment. More specifically, photonic computing involves DAC and ADC conversions along with parameter encoding, amplification, and processing devices, such as modulators, PDs, and amplifiers, which inevitably degrade the analog precision during inference, as each constituent introduces a relevant noise source that impacts the electro-optic link's bit resolution properties. Thus, the introduced noise increases when higher line rates are applied, translating to lower bit resolution. Furthermore, being able to operate with lower-precision networks during deployment can further improve the potential of analog computing by increasing the computational rate of the developed accelerators while keeping energy consumption low.53,153
Typically, the degradation introduced to the analog precision can be simulated through a quantization process that converts a continuous signal to a discrete one by mapping its continuous value set to a finite set of discrete values, achieved by rounding or truncating the values of the input signal. Despite the fact that quantization techniques are widely studied by the DL community,153–155 they generally target large CNNs containing a large number of surplus parameters with a minor contribution to the overall performance of the model.156,157 Furthermore, existing works in the DL community focus mainly on partially quantized models that ignore inputs and biases.154,158 These limitations, which are further exacerbated when high-slope photonic activations are used, dictate the use of different training paradigms that take into account the actual physical implementation.31 Indeed, neuromorphic photonics imposes new challenges on the quantization of the DL model, requiring the appropriate adaptation of existing methodologies to the unique limitations of photonic substrates. Furthermore, the quantization scheme applied in neuromorphic photonics typically follows a very simple uniform quantization,57,159 which differs from the approaches traditionally used in trainable quantization schemes for DL models160 as well as in mixed precision quantization.161
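A minimal sketch of such a uniform quantizer is given below (Python/NumPy, with the illustrative helper name uniform_quantize); it maps a continuous signal onto 2^b evenly spaced levels by either rounding or truncating, so that the resulting quantization error can be injected into simulations or training.

```python
import numpy as np

def uniform_quantize(x, bits, lo, hi, mode="round"):
    """Map a continuous signal onto 2**bits evenly spaced levels in [lo, hi],
    either by rounding to the nearest level or by truncating toward the
    lower level."""
    step = (hi - lo) / (2 ** bits - 1)
    idx = (np.clip(x, lo, hi) - lo) / step
    idx = np.round(idx) if mode == "round" else np.floor(idx)
    return lo + idx * step

x = np.linspace(0.0, 1.0, 9)
q_err = x - uniform_quantize(x, bits=3, lo=0.0, hi=1.0)   # the simulated precision loss
```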
To this end, several approaches have been proposed to deal with the limited precision before models are deployed to hardware. Some calibrate networks to the limited precision requirements after training, namely, post-training quantization methods, offering improvements compared to applying the model directly to hardware without taking the limited precision components into account.161 Other approaches take the limited precision requirements into account during training, namely, quantization-aware training methods.161,162 The latter significantly exceed the performance of post-training approaches, eliminating or restricting the performance degradation between the full-precision and limited-precision models.162
The authors in Ref. 142 proposed an activation-agnostic, photonic-compliant, and quantization-aware training framework that does not require additional hardware modifications during inference, significantly improving model performance at lower bit resolutions. More specifically, they proposed to train the networks with quantized parameters by applying uniform quantization to all parameters involved in the forward pass, so that the quantization error is accumulated, propagated through the network to the output, and reflected in the employed loss function. In this way, the network is adjusted to lower-precision signals, making it more robust to reduced bit resolution during inference and significantly improving the model performance. To this end, every signal involved in the response of the i-th layer is first quantized within a layer-specific floating range. Then, during the forward pass of the network, a quantization error ε is injected to simulate the effect of rounding, while during backpropagation, the rounding is ignored and approximated with an identity function. A comprehensive mathematical analysis of quantization-aware training can be found in Refs. 161 and 162. Finally, more advanced approaches targeting novel dynamic precision architectures41,163 propose stochastic schemes that gradually reduce the precision of layers within a model, exploiting their position and tolerance to noise, based on theoretical indications and empirical evidence.164 More specifically, the stochastic mixed precision quantization-aware training scheme proposed in Ref. 164 adjusts the bit resolutions among layers in a mixed precision manner based on the observed bit resolution distribution of the applied architectures and configurations. In this way, the authors are able to significantly reduce the inference execution times of the deployed NN.41
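The forward-quantize/backward-identity mechanism described above is commonly implemented with a straight-through estimator. The following PyTorch sketch (with the illustrative names FakeQuant and QuantizedPhotonicLayer, arbitrary value ranges, and a sigmoid stand-in for the simulated photonic activation) shows one way such a quantization-aware layer could look; it is a simplified rendition of the general technique, not the exact framework of Ref. 142.

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Uniform quantization in the forward pass; identity (straight-through)
    in the backward pass, so the rounding is ignored during backpropagation."""

    @staticmethod
    def forward(ctx, x, bits, lo, hi):
        step = (hi - lo) / (2 ** bits - 1)
        return torch.round((x.clamp(lo, hi) - lo) / step) * step + lo

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None, None, None


class QuantizedPhotonicLayer(torch.nn.Module):
    """Linear layer whose inputs, weights, and outputs are fake-quantized
    during the forward pass, so the quantization error propagates to the loss."""

    def __init__(self, in_features, out_features, bits=4):
        super().__init__()
        self.linear = torch.nn.Linear(in_features, out_features)
        self.bits = bits

    def forward(self, x):
        x_q = FakeQuant.apply(x, self.bits, 0.0, 1.0)
        w_q = FakeQuant.apply(self.linear.weight, self.bits, -1.0, 1.0)
        y = torch.nn.functional.linear(x_q, w_q, self.linear.bias)
        # Sigmoid used here only as a placeholder for a simulated photonic activation.
        return FakeQuant.apply(torch.sigmoid(y), self.bits, 0.0, 1.0)
```

During training, gradients flow through the quantizers as if they were identities, so the network learns parameters that remain accurate once rounded to the target bit resolution.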
D. Applications
Applying the aforementioned methods allows PNNs to be employed where high compute rates with minimum energy consumption are required, extending DL techniques to a whole new spectrum of applications. Such applications include network monitoring and optical signal transmission, where the required compute rates limit the applicability of existing accelerators. For example, neuromorphic photonics is capable of operating at very high frequencies and can be integrated in the backplane pipeline of a modern high-end switch, which makes it an excellent choice for challenging Distributed Denial of Service (DDoS) attack detection applications, where high-speed and low-energy inference is required. More specifically, the authors in Refs. 37 and 165 build on the concept of a neuromorphic lookaside accelerator, targeting real-time traffic inspection that searches for DDoS attack patterns during the reconnaissance phase, when the attacker tries to determine critical information about the target's configuration. Before deploying a DDoS attack, a port scanning procedure is carried out to identify open ports on a target machine. During this procedure, port scanning tools, such as Nmap, create characteristic traffic that can be captured and analyzed by the proposed network, which processes huge amounts of packets at the computation rates of modern high-end switches.
Another domain that can potentially benefit from neuromorphic hardware is communications. In recent years, there has been increasing interest in employing DL in the communication domain,166 ranging from wireless167 to optical fiber communications,42 exploiting the robustness of ANNs to noise, especially when noise is taken into account during the training process. Such approaches design the communication system by carrying out the optimization in a single end-to-end process that includes the transmitter, the receiver, and the communication channel, with the ultimate goal of achieving optimal end-to-end performance by acquiring a robust representation of the input message.42,168 Along these lines, an end-to-end deep learning fiber communication transceiver design has been introduced,42 emphasizing training with all-optical activation schemes and the respective limitations present in realistic demonstrations. The authors applied the data-driven noise-aware initialization method,169 which is capable of initializing PNNs by taking into account the actual data distribution, the noise sources, as well as the unique nature of photonic activation functions. They focused on training photonic architectures that employ all-optical activation schemes130 by simulating their given transfer functions. This allows for reducing the effect of vanishing gradient phenomena as well as improving the ability of networks coupled with communication systems to withstand noise, e.g., due to the optical transmission link. As experimentally demonstrated, this method is significantly tolerant to the degradation that occurs when easily saturated photonic activations are employed and significantly improves the signal reconstruction of the all-optical intensity modulation/direct detection (IM/DD) system.
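To convey the end-to-end training idea, the sketch below (PyTorch, with the hypothetical class name EndToEndLink, an AWGN channel standing in for the optical link, and a sigmoid in place of the simulated all-optical activation) jointly optimizes a toy transmitter/receiver pair through a noisy channel; it is a conceptual illustration rather than the transceiver of Ref. 42.

```python
import torch

class EndToEndLink(torch.nn.Module):
    """Toy end-to-end transceiver: an NN transmitter, a noisy (AWGN) channel
    standing in for the optical IM/DD link, and an NN receiver, trained
    jointly to reconstruct the input message."""

    def __init__(self, msg_dim=16, channel_dim=8, sigma=0.1):
        super().__init__()
        self.tx = torch.nn.Sequential(
            torch.nn.Linear(msg_dim, channel_dim), torch.nn.Sigmoid())
        self.rx = torch.nn.Linear(channel_dim, msg_dim)
        self.sigma = sigma

    def forward(self, msg):
        symbols = self.tx(msg)                                        # non-negative "intensities"
        received = symbols + self.sigma * torch.randn_like(symbols)  # channel noise
        return self.rx(received)

model = EndToEndLink()
msg = torch.eye(16)                                   # one-hot messages
loss = torch.nn.functional.mse_loss(model(msg), msg)  # joint tx/rx objective
loss.backward()
```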
VI. CONCLUSION
Conventional electronic computing architectures face many challenges due to the rapid growth of compute power, driven by the rise of AI and DNNs, calling for a new hardware computing paradigm that can overcome these limitations and sustain this ceaseless compute expansion. In this Tutorial, prompted by the ever-increasing maturity of silicon photonics, we examined the feasibility of PNNs and their potential embodiment in future DL environments. First, we discussed the essential concepts and criteria for NN hardware, examining the fundamental components of NNs and their core mathematical operations. Then, we investigated the interdependence of analog bit precision and energy efficiency in photonic circuits, highlighting the benefits and challenges of PNNs over conventional approaches. Moreover, we reviewed state-of-the-art PNN architectures, analyzing their perspectives with respect to MVM operation execution, weight technology selection, and activation function implementation. Finally, we presented the recently introduced optics-informed DL training framework, which comprises a novel software-hardware NN co-design approach that aims to significantly improve the NN accuracy by incorporating the photonic hardware idiosyncrasy into NN training.
ACKNOWLEDGMENTS
The work was, in part, funded by the EU Horizon projects PlasmoniAC (Grant No. 871391), SIPHO-G (Grant No. 101017194), and Gatepost (Grant No. 101120938).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
A.T. and M.M.-P. contributed equally to this work.
Apostolos Tsakyridis: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – original draft (equal). Miltiadis Moralis-Pegios: Conceptualization (lead); Investigation (equal); Methodology (equal); Validation (equal); Writing – original draft (equal). George Giamougiannis: Conceptualization (equal); Methodology (equal). Manos Kirtas: Data curation (equal); Investigation (equal); Software (equal). Nikolaos Passalis: Data curation (equal); Investigation (equal); Software (equal). Anastasios Tefas: Conceptualization (equal); Methodology (equal); Supervision (equal). Nikos Pleros: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX: DIGITAL–ANALOG-PRECISION LOSS
Summing up, we defined aprec as the analog–digital precision and illustrated two operational regimes:
(i) aprec = 1, when we increase the output power of the laser source by a factor of N to compensate for the splitting loss. In this case, the output bit resolution reaches the value derived in the main analysis, which, for integer values of bit resolution only, can be further simplified, and
(ii) aprec = N, where we keep the same bit precision at both the input and the output, trading the decreased bit precision, as opposed to the full digital precision case, for an input laser power that is lower by a factor of aprec = N.