Recent advances in neuromorphic computing have established a computational framework that removes the processor–memory bottleneck of traditional von Neumann computing. Moreover, contemporary photonic circuits address the limitations of electrical computing platforms by offering energy-efficient and parallel interconnects independent of distance. When employed as synaptic interconnects with reconfigurable photonic elements, they offer an analog platform capable of arbitrary linear matrix operations, including multiply–accumulate operations and convolution, at extremely high speed and energy efficiency. Both all-optical and optoelectronic nonlinear transfer functions have been investigated for realizing neurons with photonic signals. A number of research efforts have estimated orders-of-magnitude improvements in computational throughput and energy efficiency. Compared to biological neural systems, however, such photonic neuromorphic systems still face challenges in achieving high scalability and density. Recently developed tensor-train decomposition methods and three-dimensional photonic integration technologies can potentially address both algorithmic and architectural scalability. This tutorial covers architectures, technologies, learning algorithms, and benchmarking for photonic and optoelectronic neuromorphic computers.

## I. INTRODUCTION

Artificial Intelligence (AI) and Machine Learning (ML) have transformed our everyday lives—everything from scientific computing to shopping and entertainment. The intelligence of such artificial systems resides primarily in data centers or warehouse-scale computing systems and has been shown to surpass the ability of human brains in some tasks, including the highly complex game of Go. However, today’s data centers consume megawatts of power [Google’s AlphaGo utilized 1202 central processing units (CPUs) and 176 graphical processing units (GPUs)^{1}], and current deep neural network algorithms require labor-intensive hand labeling of large datasets. Furthermore, Turing’s early conceptualization of the “a-machine” in 1936^{2} (now called the Turing machine) proved fundamental limitations on the power of mechanical computation, even while providing the powerful mathematical model of computation, based on a processing unit (e.g., a CPU), that underlies today’s machines. In addition, modern computers utilize random-access memory instead of the infinite memory tape divided into discrete cells of Turing’s model.

In his “First draft of a report on the EDVAC,” in 1945,^{3} John von Neumann articulated what is considered the first general-purpose computing architecture based on memory, processing units, and networks (interconnects). Fascinatingly, von Neumann utilized synapses, neurons, and neural networks in this 1945 report to explain his proposed architecture and then predicted its limitations—now called *the von Neumann bottleneck*^{3}—by stating that “*the main bottleneck of an automatic very high-speed computing device lies: At the memory*.” Because of this limitation, relatively simple tasks, such as learning and pattern recognition, require a large amount of data movement (including moving the weight values) between the processor and the memory (across the bottleneck). Thus, the energy efficiency and the throughput of such computing tasks are fundamentally limited in von Neumann computing as was already predicted in 1945.^{4}

Despite increases in computing speed and the development of memory hierarchies, a fundamental separation between memory and computation remains, limiting data processing speeds regardless of the total availability of memory resources. Neuromorphic computers, in contrast, perform computation through directed graphs that are much better suited for the collocation of computing units and memory. Such a model has persistent or non-volatile memory in the form of synaptic weights uniquely associated with each pair of nodes in the graph. This locality of information allows neuromorphic architectures to avoid the bottleneck between processing and memory entirely. Each node is an individual computing unit with its own dedicated memory such that multiple pieces of information can be processed completely asynchronously and in parallel much like the human brain.

A human brain recognizes features from partial and conflicting information at ∼20 W power levels.^{5} At each moment, the brain is bombarded with a vast amount of sensory information, yet it makes sense of this data stream, even when it contains imperfect and inconsistent elements, by extracting the spatiotemporal structure embedded in it. From this, it builds meaningful representations of objects, sounds, surface textures, and so forth through parallel distributed processing. In a human brain, each neuron may be connected to up to ∼10 000 other neurons, passing signals via as many as 164 × 10^{12} synaptic connections,^{6} equivalent by some estimates to a computer with a 1 × 10^{12} bit per second processor. The neurons communicate with each other with extremely high energy efficiency. For example, in Ref. 7, Attwell and Laughlin observed that the energetic cost of information transmission through synapses is extremely low at ∼20 500 ATP/bit, corresponding to 1.04 fJ/bit at 32 bit/s. In the nervous system, firing a spike costs a neuron 10^{6}–10^{7} ATP/bit^{7} or 50–500 fJ/bit, an amount of energy proportional to how far the spike must travel down the axon because the transmission medium is dispersive and lossy. Furthermore, the dendrites of neurons contribute immensely to the energy efficiency and density of computing in the brain by providing nano-/micro-scale neural networks inside the neuron itself, which is in turn part of a larger neural network. The massively parallel yet hierarchical nature of learning and inference in the brain remains intriguing but not fully understood.

Is it possible to bring such brain-inspired capabilities to artificial machines with similar energy efficiency and scalability? Can we replicate the brain’s remarkable capabilities by constructing synapses and neurons using artificial materials and devices? There have been decades of effort in this area of neuromorphic computing, and none has come close to demonstrating the full capability of the brain. Turing in 1950 proposed a test (now known as the Turing test) to replace the question “Can machines think?”^{8,9} Even if a machine could come close to passing the Turing test, achieving the brain’s energy efficiency and computing capacity in so small a volume and weight remains extremely unlikely, or at least challenging. The seminal work carried out by Mead at Caltech in the late 1980s^{10} emphasized a million-fold improvement in power efficiency. The subsequent work of Boahen’s Neurogrid,^{11} Heidelberg’s BrainScaleS,^{12} IBM’s TrueNorth,^{13} Intel’s Loihi,^{14} Manchester’s SpiNNaker machine,^{15} Cauwenberghs’ Hierarchical Address Event Representation (HiAER) communications fabric,^{16} and Mitra and Wong’s N3XT^{17} all achieved far better energy efficiency than conventional von Neumann computing for relatively simple example tasks.

There are challenges in scaling these electronic neuromorphic computing platforms to very large scales. Electronic solutions typically include long electrical wires with large capacitance values, leading to high interconnect energy consumption. Their interconnect topologies typically link only the four nearest neighbors and require many repeaters for multi-hop connections to non-neighboring nodes. For instance, the TrueNorth chip runs at a slow clock speed of 1 kHz, communicates with an energy efficiency of 2.3 pJ/bit plus an additional 3 pJ/bit for every cm of transmission, and requires a 256 × 256 cross-bar network that selectively connects incoming neural spike events to outgoing neurons.^{13} Recently emerging nanoelectronic neuromorphic computing systems suffer from similar communication challenges in achieving appreciable repeaterless distances, especially at high speeds.^{18} In the 1980s, optical neural networks became a very active area of study for achieving massively parallel brain-like computing at the speed of light.^{19–24} However, the pioneer himself, Psaltis, declared in the 1990s that he was abandoning optical neuromorphic computing for two reasons: (1) the lack of practical devices that could be integrated and (2) insufficient knowledge of complex neural networks. Fast forward to 2021, three decades later, and we are witnessing three major changes countering the two reasons for the abandonment. First, machine learning algorithms utilizing deep neural networks have advanced so much that an artificial machine with a single night of training can beat the human world champion of 33 years in the game of Go^{1}—Lee Sedol referred to AI as “*an entity that cannot be defeated*.” Second, the rate of increase in component integration in silicon photonics^{25–27} is now twice as fast as that of electronic integration (the electronic Moore’s law^{28}).
Thus, we now find silicon photonic integrated circuits with ten thousand photonic components on a die, manufactured on 300-mm silicon photonic wafers at several foundries. Third, while Moore’s law barely maintains its trend of continuing increases in transistor density—going from 5, 4, and 3 nm and possibly down to 2 nm and below—the slowing of this trend is evident, and Dennard scaling,^{29,30} which governed energy-efficiency improvements, has been stalled since 2005. Hence, electronics alone cannot sustain the exponential increases in data processing, especially given that von Neumann computing architectures must move data across a bottleneck. The natural conclusion from these three major changes points to photonic neuromorphic computing as a key solution for future computing.

Nonetheless, there are significant challenges in scaling analog photonic networks while maintaining high accuracy. Wetzstein *et al.*^{31} attributed this to three main reasons: (1) the advantages (power and speed) of analog accelerators are realized only for very large networks, (2) the technology for optoelectronic implementation of the nonlinear activation function remains immature, and (3) analog weights are difficult to control reliably in large optical networks. Hybrid or heterogeneously integrated photonic–electronic neural networks offer practical solutions to these challenges.

Many other reviews have been written on the topic of photonic neuromorphic computing systems. Some place heavier emphasis on a specific aspect of the system, such as material choice^{32,33} or nonlinearity,^{34} on specific structures such as reservoir computers,^{35,36} or on the relationship between photonic neuromorphic neural networks and machine learning,^{37} deep learning,^{38} or artificial intelligence.^{39} Others more broadly discuss interconnect technology, network topology, neuron design, and algorithm choices at differing levels of depth.^{40–44} This tutorial aims to concisely and comprehensively unify each of the aforementioned aspects of photonic neuromorphic design and cover them at their most fundamental level before describing how they relate to the computational abilities of the system; references to other reviews are given for implementation and other details not fully addressed here. The tutorial is structured as follows: Sec. II A argues the rationale for photonic and optoelectronic neuromorphic computing. Section II B covers the general system architecture of the neuromorphic computer. Section II C details the individual building blocks, followed by the learning models in Sec. II D. Section III addresses the critical topic of achieving scalability in algorithms and physical systems. Section IV surveys benchmarking and discusses the challenges of benchmarking such a nascent area of computing. Finally, Sec. V summarizes the tutorial and addresses future directions.

## II. TOWARD REALIZATION OF OPTOELECTRONIC AND PHOTONIC NEUROMORPHIC COMPUTING

### A. Rationale for optoelectronic and photonic neuromorphic computing

Figure 2(a) illustrates a biological neural network^{45} depicting the complex network of neurons and synapses. Each neuron consists of thousands of dendrites, a soma, and an axon with thousands of axon terminals. Each neuron connects with thousands (∼7000 on average^{6}) of other neurons at the synapses interfacing the axon terminals of the upstream neurons (presynaptic neurons) to the downstream neurons (postsynaptic neurons). The plasticity of the synapses through the history of experiences allows learning (or training), discussed more in Sec. II D. Hence, our efforts to construct a bio-derived or a bio-inspired (close-imitation or inspired-reconstruction of the brain functionality) neuromorphic computer start from the simple mathematical diagram of Fig. 2(b) of a (deep) neural network consisting of an array of neurons in each layer and weighted synaptic interconnections between the layers. The output of neurons at each layer [e.g., $x_i^{l-1}$ at layer ($l-1$)] is related to the output of the neurons at the next layer [$x_j^l$ at layer $l$] via the relationship

$$x_j^l = \theta\left(s_j^l\right), \qquad s_j^l = \sum_i w_{ij}^l\, x_i^{l-1},$$

where *θ* is the nonlinear transfer function. Figure 2(c) depicts this again from the perspective of a single neuron, where each dendrite receives synaptically weighted inputs that are summed at the soma, and the neuron generates an output according to a nonlinear transfer function, like the sigmoid activation function $\theta(s_j^l)$ shown in Fig. 2(d).
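This weighted-sum-and-threshold relationship can be sketched in a few lines of NumPy; the sigmoid here stands in for the generic nonlinear transfer function *θ*, and the weights and inputs are random placeholders rather than values from any particular network:

```python
import numpy as np

def sigmoid(s):
    # Sigmoid activation, playing the role of the nonlinear transfer function theta
    return 1.0 / (1.0 + np.exp(-s))

def layer_forward(x_prev, W, b):
    # s_j = sum_i W[j, i] * x_prev[i] + b[j]  (weighted synaptic sum at the soma)
    s = W @ x_prev + b
    # x_j = theta(s_j)  (output of neuron j in layer l)
    return sigmoid(s)

rng = np.random.default_rng(0)
x_prev = rng.random(4)            # outputs of layer l-1
W = rng.normal(size=(3, 4))       # synaptic weight matrix
b = np.zeros(3)                   # neuron biases
x_next = layer_forward(x_prev, W, b)
```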

It is not a coincidence that the biological system chose to utilize spiking neural networks (SNNs) instead of non-spiking ones after millions of years of evolution. The energy efficiency of the brain is crucial; although a human brain represents only ∼2% of the body weight, it consumes ∼20% of the oxygen and calories. Information transfer and processing utilizing spikes—based on event-driven communication and processing in a massively parallel system—are orders of magnitude more energy-efficient than non-spiking counterparts that require constant energy consumption even when communication or computation is unnecessary.

Another important challenge is the implementation of high-throughput and scalable neuromorphic computers that maintain this energy efficiency. Even for bio-derived neuromorphic computing, it is difficult to exactly replicate the wet-electro-chemical systems of ion channels and ATP/ADP conversions. Commonly used electrical wires are too power-hungry and noisy due to electromagnetic impedance, electromagnetic interference, and Johnson thermal noise. For instance, IBM’s TrueNorth system^{13} included repeaters consuming 3 pJ/bit for every cm to overcome dispersion limitations and assure signal integrity. Similarly, Intel’s first iteration of the Loihi chip is organized into a grid of 128 neurocores communicating through a network-on-chip (NoC) with an energy cost of 3–4 pJ for each hop.^{14}

Even more serious energy limitations arise when large-scale synaptic networks must be deployed over electrical mesh networks, whose capacitance and energy consumption scale quadratically. Photonics, on the other hand, does not suffer from the same limitations of impedance, interference, thermal noise, and RC latency. Photonic meshes can achieve the matrix multiplications of Figs. 2(b) and 2(c) simply by propagating light through the mesh, which can be made lossless in a unitary photonic mesh configuration. Photonic interconnects can achieve low-energy (∼1 fJ/b),^{47} low-loss (<0.1 dB/cm),^{48,49} wavelength-parallel (many wavelengths), and high-speed (>10 Gb/s) transmission independent of distance.

Admittedly, it is difficult to fully compare photonic and electronic approaches without the availability of equivalently functional systems; the same challenge has been identified previously for comparisons between two or more fully electrical solutions.^{50} Nonetheless, this tutorial hopes to convince the reader that photonic approaches to neuromorphic computing can offer the following unique advantages over their electronic counterparts:

- Massive photonic parallelism achievable in the wavelength, time, and space domains,
- absence of electromagnetic interference,
- extremely low noise (negligible Johnson thermal noise and possible shot-noise-limited performance),
- information transmission at ∼1 fJ/b independently of the distance,
- matrix multiplication achievable by simple propagation of light through the photonic mesh, in principle, with zero energy loss,
- bidirectional photonic synapses and meshes achievable for forward/backward propagation training,
- fast optical barrier synchronization, and
- sparse processing overcoming the poor locality of data and information far beyond the reach of electronic neural networks.

On the other hand, photonic neuromorphic computing faces the following challenges:

- Photonic components are relatively large (typical dimensions of wavelength/refractive index, on the order of ∼1 *μ*m) compared to electronic components (∼10 nm). Therefore, wavelength- and time-domain multiplexing is necessary to achieve a density comparable to electronic and biological neural systems.
- All-optical nonlinear transfer functions are difficult to achieve without relatively high optical power. For this reason, optoelectronic neural networks incorporating optical–electrical–optical (O/E/O) conversion have been proposed and demonstrated.^{51}
- Photonic neuromorphic computing almost always needs electronics for power distribution, control, and signaling.

Section II B details the high-level architectural decisions of designing photonic neuromorphic computers and details existing devices and methods for building optoelectronic and photonic neuromorphic hardware while highlighting these advantages and addressing the challenges.

### B. System architectures for photonic neural networks

#### 1. Spiking vs non-spiking photonic neural networks

As previously discussed, event-driven spiking neural networks are far more energy-efficient compared to non-spiking counterparts. Additionally, spiking units offer new ways to represent information that may be more natural for specific classes of computation, such as graph algorithms,^{52} quadratic programming,^{53} and other parallelizable computation algorithms.^{54} However, spiking networks have additional complexity in that they require mechanisms for integration over time. Neurons must integrate their inputs over a time window some order of magnitude greater than the received pulse widths to meaningfully process new aggregate information from their upstream neurons; otherwise, the activity of deeper layers in the network can merely encode a rough thresholding of activity from previous layers. Additional complexity in the inclusion of dendritic delays—which simulates the effect of spatially distributed networks—allows downstream neurons to encode and process information about the timing patterns of their upstream inputs and thus provides another dimension of processing to the network.
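A minimal sketch of this temporal integration, using a textbook leaky integrate-and-fire neuron; the time constant, threshold, and drive below are illustrative choices, not parameters of any specific photonic device:

```python
import numpy as np

def lif_neuron(input_current, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: integrates its input over a time window
    (set by tau) much longer than an individual input pulse and emits a spike
    whenever the membrane potential crosses threshold."""
    v = 0.0
    spike_times = []
    for t, i_in in enumerate(input_current):
        # Leaky integration: dv/dt = (-v + i_in) / tau
        v += dt * (-v + i_in) / tau
        if v >= v_th:
            spike_times.append(t)
            v = v_reset  # reset after firing (crude refractory behavior)
    return spike_times

# A constant drive above threshold produces a periodic train of output spikes
drive = np.full(200, 2.0)
out_spikes = lif_neuron(drive)
```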

On the other hand, mathematical and experimental implementations of non-spiking artificial neural networks (ANNs) are much more accessible than those of spiking neural networks. Moreover, the choice of a non-spiking model can more easily leverage the many developments of traditional deep learning from traditional computing platforms and more naturally apply gradient-based learning rules (discussed more in Sec. II D) that have been well proven in many application spaces. Many non-spiking photonic matrix multipliers^{55–61} and photonic neural networks (PNNs)^{62–65} have been proposed or demonstrated. See Ref. 66 for a taxonomy of photonic neural network approaches and a more in-depth review of existing approaches.

#### 2. Out-of-plane vs in-plane photonic neural networks

*Out-of-Plane Photonic Neural Networks.* As introduced in Sec. I, the first optical neural networks were developed in the 1980s incorporating optical planes (pixels) with photonic signals propagating vertically. Such out-of-plane implementations of photonic neural networks remain in active research (a) because they can utilize many optically resolvable elements (pixels) simultaneously in parallel and (b) because Fourier optics and optical convolution can be implemented easily by incorporating lenses. The electronic architecture demonstrated in 1985 has been commercialized in the Optalysys system, where a PC provides the necessary gain, feedback, and thresholding indicated in Ref. 67. Interestingly, the optical feedback scheme utilizes O/E/O conversion consisting of arrays of photodetectors (PDs) and light-emitting diodes (LEDs). Both schemes utilize electronic amplification to overcome optical diffractive losses and electronically incorporate the nonlinear transfer function (no all-optical neural transfer function).

More recently, out-of-plane photonic neural networks have been implemented in the form of Diffractive Deep Neural Networks (D^{2}NNs), which use a cascade of passive diffractive media to implement a synaptic strength matrix between layers in the network.^{68} A thin optical element is designed with variable thicknesses at the “resolution” of the network (number of neurons in the layer) and controls the complex-valued transmission and reflection coefficients at each point. Mathematically, each point is considered a secondary source for the incoming coherent light signal that acts as a neuron in a fully connected neural network layer. Weight matrices implemented by the network are fixed, with the diffractive medium being fabricated as a passive optical element by 3D printing or photolithography. Inputs to the network can be encoded on the amplitude or phase of incoming coherent light before the network output is measured by an array of photodetectors at the output plane [as in Fig. 3(a)]. The demonstration of the above D^{2}NN did not experimentally incorporate nonlinear optical neuronal transfer functions or synaptic reconfiguration. The optical losses per layer (51% average power attenuation per layer reported in Ref. 68) and relatively high optical intensity levels required to drive optical nonlinear transfer functions may limit the scalability and practicality of this method. However, this does not represent a fundamental limit of the technique and may be reduced in the future by improved diffractive surface design.
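The layer-by-layer physics of a D^{2}NN can be sketched numerically: each layer multiplies the incident field by a complex (here phase-only) transmission mask, after which every pixel acts as a secondary source for propagation to the next plane. The sketch below uses the standard angular-spectrum method; the wavelength, pixel pitch, layer spacing, and random masks are arbitrary illustrative choices, and the design (training) of the masks is omitted:

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    # Free-space propagation of a sampled complex field by distance z
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))  # evanescent components dropped
    H = np.exp(1j * kz * z)                          # pure-phase (lossless) transfer function
    return np.fft.ifft2(np.fft.fft2(field) * H)

def diffractive_layer(field, phase_mask, wavelength, dx, z):
    # Each pixel applies a learned complex transmission (phase-only here),
    # then acts as a secondary source feeding the next layer
    return angular_spectrum_propagate(field * np.exp(1j * phase_mask), wavelength, dx, z)

rng = np.random.default_rng(1)
n = 64
field = np.ones((n, n), dtype=complex)               # plane-wave input
masks = [rng.uniform(0, 2 * np.pi, (n, n)) for _ in range(3)]
for m in masks:                                      # cascade of passive diffractive layers
    field = diffractive_layer(field, m, wavelength=1.55e-6, dx=5e-6, z=1e-3)
intensity = np.abs(field) ** 2                       # photodetector array reads intensity
```

Because the masks are phase-only and the propagation kernel is unitary for these parameters, total optical power is conserved through the cascade; real devices add the per-layer losses noted above.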

*In-Plane Photonic Neural Networks.* As opposed to out-of-plane PNNs, in-plane photonic neural networks implement all interconnected photonic synapses and photonic (or optoelectronic) neurons on planar photonic integrated circuits and offer a more robust realization, especially when utilizing silicon photonic technologies. Despite lacking the lenses of the out-of-plane approach, unitary photonic mesh networks consisting of many (unitary) 2 × 2 optical couplers can perform arbitrary matrix operations, including convolution and Fourier transforms. Furthermore, the photonic mesh can be reconfigured through the individual 2 × 2 optical couplers, each of which can be considered a photonic synapse in the synaptic network.

Miller proposed a method to implement arbitrary weighted connections from optical inputs to a set of optical outputs. This method relies on a “universal linear optical component”^{69} comprising a network of 2 × 2 Mach–Zehnder interferometer blocks—connected in a mesh as illustrated in Fig. 3(b)—which was proven capable of implementing any linear transformation from its inputs to its outputs (i.e., from preceding neurons to subsequent layers).^{69,70} In addition, other reconfigurable photonic structures can also implement matrix transformations, such as crossbar networks and micro-ring resonator banks, discussed in Sec. II C. By incorporating nonlinearity in the form of photonic or optoelectronic neurons between each layer (as shown in Fig. 4), a multi-layer neural network can be constructed.
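A small numerical sketch of this idea: each Mach–Zehnder interferometer is a 2 × 2 unitary built from two 50:50 couplers and phase shifters, and embedding such blocks on adjacent mode pairs in a triangular (Reck-style) order composes a larger mesh. The phase settings below are random placeholders rather than a decomposition of any target matrix; the point is that the composed mesh is always unitary, i.e., lossless in principle:

```python
import numpy as np

def mzi(theta, phi):
    # 2x2 transfer matrix of a Mach-Zehnder interferometer: an input phase
    # shifter phi followed by two 50:50 couplers enclosing an internal phase theta
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50:50 directional coupler
    inner = np.diag([np.exp(1j * theta), 1.0])
    outer = np.diag([np.exp(1j * phi), 1.0])
    return bs @ inner @ bs @ outer

def embed(u2, i, n):
    # Place a 2x2 block on adjacent modes (i, i+1) of an n-mode mesh
    U = np.eye(n, dtype=complex)
    U[i:i + 2, i:i + 2] = u2
    return U

rng = np.random.default_rng(2)
n = 4
U = np.eye(n, dtype=complex)
# Triangular (Reck-style) arrangement of n(n-1)/2 = 6 MZIs with random settings
for i in [0, 1, 2, 0, 1, 0]:
    U = embed(mzi(rng.uniform(0, 2 * np.pi), rng.uniform(0, 2 * np.pi)), i, n) @ U
```

With the phases chosen by the Reck or Clements nulling procedures instead of at random, the same mesh realizes any desired 4 × 4 unitary.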

Aside from in-plane and out-of-plane networks, it is worth mentioning that there are examples of optical neural networks designed using a combination of optical fibers, large-scale laser sources, and various other off-the-shelf modulators and components that can be purchased from telecommunication component providers.^{71–76} Given that such fiber-based or free-space components can be easily moved, swapped, and otherwise manipulated, it is far easier to develop and prototype neural networks using these technologies and architectures. In fairness—and not to diminish work in this space as merely prototype networks—it is conceivable that such neural networks could be used to simultaneously communicate and process information collected over broad distances much faster than would be possible for any exclusively localized system, photonic or electronic. Examples of such fiber-based optical networks include optical reservoir computers^{71–75} (discussed more in Subsection II B 3), which have long exceeded a data processing speed of >1 GB/s^{75} for tasks such as chaotic time series prediction (over 10^{7} points per second) with an error rate of about 10% (compared to contemporary electronic approaches at 1%). Rafayelyan *et al.*,^{73} in more recent work, reported processing speeds on the order of 10^{14} operations per second for multi-dimensional chaotic system prediction—compared to the 10^{15}–10^{17} operations per second possible on supercomputers for similar tasks.^{73} In these examples, laser and amplifier feedback is manipulated to substantially increase parallelization, with neurons communicating on various parallel wavelengths or modes in the system.

Despite the computational efficiency and promise of out-of-plane and fiber-based approaches, the remainder of this tutorial focuses on the construction of in-plane, integrated photonic neural networks that more closely approximate the physical scales of biological systems.

#### 3. Network topology

The physical structure or topology of the neural network can determine the possible transformations of data between successive layers or ensembles of neurons. Various neural network structures exist, with some derived from biological connectivity patterns and others deduced from the functions intended to be computed. In the design of a neuromorphic computer, the topology of networks supported depends on the capabilities of the hardware structures employed (discussed more in Sec. II C) and the learning rules most suitable for a given application. Neural network topologies are often divided between feedforward and recurrent approaches though brain-inspired structures typically fall into the latter category. Figure 5 summarizes the common topological structures found in neuromorphic computing.

**Feedforward neural networks** are the simplest network topology to implement and are useful in situations with clear mappings between input and output data as in the prevalent MNIST handwriting classification task. As shown in Fig. 5(a), feedforward refers to the flow of information exclusively from input to output. In general, restricting the flow of information allows the network to be trained more easily by backpropagation and equivalent supervised learning algorithms (discussed more in Sec. II D). Often, synaptic interconnections are fully or densely connected, meaning that each sending neuron is connected to each receiving neuron. This all-to-all pattern of connectivity provides the most flexibility for learnable patterns though at the cost of increased parameters (synaptic weight strengths) and computation for training those parameters. Various photonic structures can be used to implement these weighted connections based on the passive propagation of light—for example, phase-change materials (PCMs), Mach-Zehnder Interferometers (MZIs), or micro-ring resonators (MRRs) (discussed more in Sec. II C 1).

Fully connected feedforward networks require many parameters, so multiple strategies have been developed in traditional ANNs to reduce the number of parameters. **Convolutional neural networks** (CNNs) are the most popular remedy, which combines weight sharing and sparse connectivity to perform pattern recognition with far fewer weights than fully connected networks. As shown in Fig. 5(b), a small weight pattern (called a kernel) is swept across different positions of the input layer such that the subsequent layer reflects the strength of that pattern at each position. Photonic neural networks can take advantage of wavelength-division multiplexing (WDM)^{60,65,77} and time-division multiplexing (TDM)^{78} in addition to space-division-multiplexing (SDM) and other forms of parallelism^{58} to repeatedly apply the same kernel over multiple positions of an input vector.
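Weight sharing is easy to see in a one-dimensional sketch: the same two-element kernel (two parameters, versus dozens for a fully connected layer of the same size) is swept across every input position, so the output at each position reflects the strength of that pattern there. As in most CNN frameworks, the "convolution" is implemented as cross-correlation; the input and kernel below are illustrative:

```python
import numpy as np

def conv1d_valid(x, kernel):
    # Sweep the same kernel (shared weights) across every position of the input
    k = len(kernel)
    return np.array([np.dot(kernel, x[i:i + k]) for i in range(len(x) - k + 1)])

x = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
edge_kernel = np.array([1.0, -1.0])   # positive at falling edges, negative at rising edges
y = conv1d_valid(x, edge_kernel)
```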

**Recurrent neural networks** deviate from this exclusively “forward” propagation of information and incorporate cyclical and lateral pathways—see Fig. 5(c)—with varying degrees of connectivity. The broader range of topologies provides additional mechanisms for information processing in addition to the input–output mappings of feedforward networks. There are various forms of recurrent networks that appear in neuromorphic computing that use recurrent connections to add a persistent state (as in working memory) and dynamical properties to the behavior of the network.

In **reservoir computing**, a fixed (non-reconfigurable) recurrent neural network is sandwiched between two feedforward layers, as shown in Fig. 5(d), and only the output feedforward layers are reconfigured. The recurrent part of the network is called the “reservoir” and is defined with lateral connections (between neurons in the same layer or ensemble) of random strengths. The first feedforward layer also contains randomly selected weights and provides input to the reservoir, while the final feedforward layer is trained to “read out” the activity of the reservoir.^{79} The random flow of activity through the reservoir behaves like a dynamical system and enables the learning of temporal dynamics without the complicated training schemes required to train an entire network in other recurrent structures.^{80} To summarize, the reservoir is often said to transform the input from a temporal representation into a higher dimensional spatial representation that a linear classifier or predictor can more easily interpret; this behavior has been useful in applications such as digital signal processing, speech recognition, and general modeling of dynamical systems.^{81}
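The reservoir idea can be sketched as a small echo state network: the input and lateral (reservoir) weights are fixed and random, with the spectral radius scaled below 1 so activity fades, and only the linear readout is trained, here by ridge regression on next-step prediction of a sine wave. All sizes and constants below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n_res, washout, T = 100, 50, 500

# Fixed random input and reservoir (lateral) weights; only the readout is trained
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius below 1

u = np.sin(np.arange(T + 1) * 0.2)[:, None]        # input series; task: predict next step
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    # Reservoir update: high-dimensional nonlinear expansion of the input history
    x = np.tanh(W @ x + W_in @ u[t])
    states[t] = x

# Train the linear "read out" layer by ridge regression on post-washout states
X, y = states[washout:], u[washout + 1:T + 1, 0]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
pred = X @ W_out
mse = np.mean((pred - y) ** 2)
```

The readout sees the temporal input re-expressed as a high-dimensional spatial state, so a simple linear fit suffices, exactly the transformation described above.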

Photonic and optical implementations of reservoir computing vary in architecture, modulation, and signal generation. For example, Duport *et al.*^{82} reported a fully analog variant of a popular optical reservoir computing system; it represents the neural activity of the reservoir as the modulated intensity of an external laser. Each neuron's activity is time-multiplexed on a long delay line (optical fiber spool) with a period roughly equal to the round-trip time of the line (∼8.4 *µ*s). Other groups have demonstrated reservoir computing on CMOS-compatible integrated silicon photonics chips. Vandoorne *et al.*,^{83} for example, experimentally demonstrated a passive chip that uses only waveguides, splitters, and combiners, while the nonlinearity of the network is handled at signal detection and in the “read out” layer of the network. This approach was evaluated with multiple 2-bit binary logical tasks, with an error rate as low as 10^{−4} for the exclusive or (XOR) logical operation.

**Winner-Take-All (WTA) networks** are a biologically inspired topology in which recurrent or lateral inhibitory connections between excitatory neurons and inhibitory neurons—depicted in Fig. 5(e)—enforce a limit on the total activity of the layer. Strengths of incoming excitatory connections, lateral inhibitory connections, and bias currents are balanced to create the desired selectivity (softmax-like or hardmax-like transformation) of incoming information, much like that seen in the feature maps of the cerebral cortex.^{84}
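A toy numerical sketch of this balance: each excitatory unit receives its own input while inhibition proportional to the total activity of its rivals is subtracted. With the moderate inhibition strength chosen here (an arbitrary illustrative value), the iteration settles into a soft winner-take-all state; stronger inhibition pushes the transformation toward a hardmax:

```python
import numpy as np

def winner_take_all(inputs, inhibition=0.5, steps=50):
    # Excitatory units receive their inputs while lateral inhibition, driven by
    # the total activity of the other units, suppresses each of them
    a = np.array(inputs, dtype=float)
    r = np.maximum(a, 0.0)
    for _ in range(steps):
        total = r.sum()
        # Subtract inhibition proportional to the rivals' activity; rectify at zero
        r = np.maximum(a - inhibition * (total - r), 0.0)
    return r

r = winner_take_all([1.0, 0.8, 0.3])   # the strongest input wins
winner = int(np.argmax(r))
```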

Zhang *et al.*^{85} recently demonstrated a simulated WTA mechanism using the inhibitory dynamics of vertical-cavity surface-emitting lasers with saturable absorption (VCSEL-SA). Each VCSEL-SA in the circuit acts as a spiking neuron whose output is read out as an intensity spike in the X-polarization (XP) mode; the Y-polarization (YP) mode is sent to the competing neuron (VCSEL-SA) and induces a YP spike that pushes that neuron into a refractory period, temporarily preventing XP spikes from being generated. Zhang *et al.* also used simulations to show that this bio-inspired mechanism can implement the max-pooling operation required by many CNNs found in traditional machine learning. Max-pooling layers have been shown to increase the accuracy of deep CNNs by a factor of nearly three^{86} without the need for additional learnable parameters (synaptic weights).
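The WTA principle itself can be illustrated with a small numerical sketch (the lateral-inhibition update rule and constants below are illustrative and are not a model of the VCSEL-SA dynamics):

```python
import numpy as np

def winner_take_all(inputs, inhibition=0.5, steps=50):
    """Iterate a simple lateral-inhibition circuit until one unit wins.

    Each unit is driven by its external input plus self-excitation and
    is inhibited by the summed activity of the other units (a
    hardmax-like selection).
    """
    b = np.asarray(inputs, dtype=float)
    a = b.copy()
    for _ in range(steps):
        a = np.maximum(0.0, b + a - inhibition * (a.sum() - a))
    return a / a.sum()  # normalized activity pattern

act = winner_take_all([1.0, 1.2, 0.9])
print("winner:", int(np.argmax(act)))
```

After the iteration settles, only the unit with the strongest input retains activity; the others are suppressed to zero.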

**Hopfield networks** are another common recurrent network topology with well-defined dynamics; they are formed by selecting excitatory and inhibitory weights between neurons such that specific states (patterns of firing) are stable, while others are unstable—see Fig. 5(f). This connectivity pattern forms a simple content-based memory such that partial or distorted input patterns similar to a “memorized” state will evolve toward the memorized stable state—effectively completing the memory.^{87}

Marquez *et al.*^{88} used a thermo-optically tuned micro-ring resonator bank (described in Sec. II C 1) to experimentally demonstrate the pattern reconstruction capability of the Hopfield network for three small 4 × 4 patterns. A flattened memory pattern, *x*, was stored (in an off-chip computer) in the network by calculating the recurrent weights according to the outer-product rule, $W = x x^{T}$ (summed over the stored patterns), where the constraint $W_{ii} = 0$ is imposed to prevent neurons from exciting themselves and firing all the time. The micro-ring resonators were tuned to apply the modulation of *x* according to *W* one column and row at a time. Components of the vector *x* were represented by the intensity of light at multiple wavelength channels on the same input waveguide. An offline computer was used to interpret the results (as opposed to on-chip neurons), though the demonstration provides a proof-of-concept for Hopfield networks implemented on micro-ring resonator banks.
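The storage-and-recall procedure can be reproduced numerically. The sketch below (NumPy; the specific 4 × 4 bipolar pattern and corruption are illustrative) stores one flattened pattern with the outer-product rule, zeroes the diagonal, and recovers the pattern from a corrupted probe:

```python
import numpy as np

# One flattened 4x4 bipolar pattern (values +/-1); illustrative choice.
x = np.array([1, -1, 1, -1,
              -1, 1, -1, 1,
              1, -1, 1, -1,
              -1, 1, -1, 1])

# Outer-product storage rule with zeroed diagonal (W_ii = 0).
W = np.outer(x, x).astype(float)
np.fill_diagonal(W, 0.0)

# Corrupt three pixels, then let the network settle.
probe = x.copy()
probe[[0, 5, 10]] *= -1
state = probe.astype(float)
for _ in range(5):            # synchronous sign updates converge quickly
    state = np.sign(W @ state)

print("recovered:", np.array_equal(state, x))
```

For a single stored pattern, one update step already projects the corrupted probe back onto the memorized state.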

### C. Building blocks of photonic neuromorphic computing systems

As discussed in Sec. II B 2, PNNs consist of an interconnection network analogous to the axonal and dendritic projections of the neuron and a photonic or optoelectronic nonlinearity that corresponds to the excitability behavior of the neuron. Table I summarizes biological neural network components and examples of equivalent optical and electro-optic device components that emulate the biological component mechanism.

| Biological component | Biological constituent | Biological function | Equivalent photonic component | Equivalent photonic function |
|---|---|---|---|---|
| Synapse | Presynaptic terminal | Conversion of the electrical signal (action potential) into a chemical signal (neurotransmitter release) | Photonic synapses or photonic mesh | Photonic couplers with variable coupling strengths to reflect synaptic weights |
| | Postsynaptic terminal | Receives neurotransmitters at the receptors | | |
| Neuron | Dendrites | Spatiotemporal summation (dendritic computing) | All-optical or optoelectronic neurons | Photonic dendrites or input optical couplers (with fan-in) |
| | Soma | Integration and spike generation | | Photonic somas: nonlinear transfer function achieved by all-optical or optoelectronic devices |
| | Axon and axon terminals | Signal transmission | | Photonic axons and axon terminals: optical waveguides and output optical couplers (with fan-out) |


Because biological neurons connect to thousands of other neurons via thousands of synapses, dendrites, and axons, it is difficult for bio-derived or bio-inspired artificial neural networks to achieve equivalent levels of connectivity. Biological systems achieve such high degrees of interconnection more easily because synapses are extremely small (∼10 nm) and neurotransmitters and receptors are smaller still. In addition, their unique electro-chemical dynamics allow tree-structured branching of dendrites and axon terminals to achieve broadcast (at the axon terminals) and summation (at the dendrites) of the neurotransmitter signals. While it is, in principle, possible to construct such a synaptic network for each neuron, it is simpler to construct a mathematically equivalent mesh of synaptic interconnects, such as a crossbar, a mesh of 2 × 2 couplers, or other parallelized structures.

Electronic neuromorphic computers typically consist of electronic crossbars with memristive synapses at each crosspoint (*N*^{2} synapses for an *N* × *N* crossbar) and electronic neurons at each end. Likewise, photonic neural networks comprise photonic synaptic meshes consisting of photonic couplers and photonic or optoelectronic neurons that provide nonlinearity. In both cases, the number of synaptic elements scales as O(*N*^{2}) for an *N* × *N* neural network. Unlike biological synaptic interconnections, the individual couplers in these meshes do not map one-to-one onto weight values. Instead, the collection of photonic synaptic coupling coefficients collectively applies the weight matrix values *w*_{ij} between the *i*th presynaptic neuron and the *j*th post-synaptic neuron.

#### 1. Forming reconfigurable optical synapses

Photonic matrix multipliers passively couple light from a set of *N* input ports to a set of *M* output ports according to a unitary weight matrix *U*. It is possible to remove the unitary restriction by including active gain media. However, given the emphasis on energy efficiency in neuromorphic computing, the remainder of this section assumes that passive weighting is sufficient for computation and that any signal gain is handled by the neuron nonlinearity. The amount of light coupled from one input to one output is determined by one or more reconfigurable photonic elements that form the optical synapse. Manipulation of one or more material properties allows for the reconfiguration of these elements.

Commonly exploited mechanisms include the thermo-optic effect, electro-optic effects, photo-ionic effects, and structural phase changes,^{89} each of which produces a change in the optical path length and thus modulates the phase shift of a signal propagating through the material. The former two are considered volatile reconfiguration mechanisms and require constant external biasing or power supply, while the latter two are non-volatile and persist after the power supply is removed. The application of these effects results in a signal phase change, Δ*ϕ*, that is proportional to the refractive index change Δ*n* and the propagation length, *L*,

$$\Delta\phi = \frac{2\pi}{\lambda_0}\,\Delta n\, L,$$

where *λ*_{0} is the free-space wavelength of the signal.

Thermo-optic tuning exploits the temperature dependence of the refractive index, typically by placing Ohmic micro-heaters near the waveguide.^{90–93} The change in the refractive index can then be calculated as

$$\Delta n = \frac{dn}{dT}\,\Delta T,$$

where the thermo-optic coefficient is often written as $\frac{dn}{dT}$ and given in units of inverse Kelvin (1/K). In integrated photonics, the behavior of structures can be simplified in terms of an effective index that depends on the refractive indices of two or more materials. In such a case, the thermo-optic coefficient can be written as a sum of contributions from each index. For example, in a waveguide where the effective index depends on the core index *n*_{core} and cladding index *n*_{clad}, the thermo-optic coefficient for the effective index can be written as^{94}

$$\frac{dn_{\text{eff}}}{dT} = \frac{\partial n_{\text{eff}}}{\partial n_{\text{core}}}\frac{dn_{\text{core}}}{dT} + \frac{\partial n_{\text{eff}}}{\partial n_{\text{clad}}}\frac{dn_{\text{clad}}}{dT}.$$
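As a back-of-envelope check of the thermo-optic phase relation, the following sketch estimates the temperature change needed for a π phase shift in a silicon waveguide (the coefficient and lengths are typical literature values, not taken from the cited references):

```python
import math

# Phase shift from a thermo-optic index change: dphi = 2*pi*dn*L/lambda0.
dn_dT = 1.8e-4   # 1/K, typical for silicon near 1550 nm, room temperature
L = 100e-6       # 100 um heated waveguide length (illustrative)
lam0 = 1.55e-6   # free-space wavelength

def phase_shift(delta_T):
    """Phase shift (rad) produced by a temperature change delta_T (K)."""
    dn = dn_dT * delta_T
    return 2 * math.pi * dn * L / lam0

# Temperature change needed for a pi phase shift with these numbers:
dT_pi = lam0 / (2 * L * dn_dT)
print(f"dT for pi shift: {dT_pi:.1f} K")
```

For these illustrative values the required temperature swing is a few tens of Kelvin, which is why thermo-optic phase shifters draw static heater power.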

The electro-optical tuning of the refractive index requires materials with significant electric-field-induced optical index changes through the Pockels effect, the Kerr effect, field-induced carrier density changes, or other mechanisms. The magnitude of these index changes varies considerably among materials.^{95} The relationship between the electric field and changes in the index is more complex than for thermo-optic effects, ranging from a simple linear relationship in the case of the Pockels effect to more complicated dependencies on the charge-carrier distribution through the material in the case of carrier effects. Photo-ionic effects are similar to electro-optical effects except that applied electrical fields can physically displace ions to drive them in or out of waveguide materials (e.g., polymers) to semi-permanently change the optical index.^{96,97}

Optical Phase Change Materials (PCMs) switch from one material structure to another (e.g., crystalline to amorphous and vice versa) often by external heating—rapid heating to high temperature and quenching vs slow heating and cooling—to induce changes in the refractive index and loss. Heating is often achieved electrically by incorporating pulsed Ohmic heaters as described previously, though it is also possible to achieve such heating with optical pulses themselves, thus realizing “all-optical” reconfiguration. However, due to the typical optical power levels required for such an optical reconfiguration of PCMs, optically tuning the PCM can be restrictive compared to other PCM reconfiguration approaches, such as electronic or thermal reconfiguration.^{98} The non-volatile synaptic reconfiguration achievable with PCMs and photo-ionic materials^{89} is attractive because no static power consumption is required to maintain the induced changes in material properties.

**Directional couplers** transfer light between two adjacent waveguides through the overlap of their evanescent fields. The strength of this interaction is described by a coupling coefficient, *κ*, given in units of inverse length, as the amount of coupling between the waveguides depends on the length of the interaction. Mode coupling between waveguides can be derived using coupled-mode theory; for two rectangular waveguides of the same core index, *n*_{1}, and shared cladding index, *n*_{0}, the coupling coefficient decays exponentially with the separation between the waveguides and depends on the index contrast and on *a*, the width of the waveguides in the plane of coupling [see Fig. 6(a)]. Assuming no loss, we can describe coupling in a two-arm coupler—as shown in Fig. 6(b)—from a single incident wave in one arm to the two output arms according to (see Ref. 99 for more details on the derivation and nature of waveguide coupling)

$$B_1 = A_1\cos(\kappa l), \qquad B_2 = -j\,A_1\sin(\kappa l),$$

where *l* is the length of interaction between the two waveguides and *A*_{1} is the normalized incident field amplitude such that the input power is $P = A_1^2$, and likewise for the output fields *B*_{1} and *B*_{2}. It is important to note that the coupling coefficient is wavelength dependent; if multiple wavelengths are used, an appropriate coupling length must be chosen so that each wavelength follows the desired coupling between ports.
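The lossless two-arm coupler behavior can be checked numerically. In this sketch (the value of κ is illustrative), an interaction length giving κl = π/4 yields a 50:50 split and κl = π/2 a full cross-over:

```python
import numpy as np

def coupler_outputs(A1, kappa, length):
    """Lossless two-arm directional coupler: field amplitudes in the
    through arm (B1) and cross arm (B2) after an interaction length l:
    B1 = A1*cos(kappa*l), B2 = -1j*A1*sin(kappa*l)."""
    B1 = A1 * np.cos(kappa * length)
    B2 = -1j * A1 * np.sin(kappa * length)
    return B1, B2

kappa = np.pi / 2 * 1e3  # rad/m, illustrative coupling coefficient
for l_um, label in [(500, "50:50 point"), (1000, "full cross-over")]:
    B1, B2 = coupler_outputs(1.0, kappa, l_um * 1e-6)
    print(label, abs(B1) ** 2, abs(B2) ** 2)
```

Because the coupler is lossless, |B1|² + |B2|² = |A1|² at every interaction length.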

**Mach–Zehnder Interferometers (MZIs)**, as depicted in the dashed rectangle of Fig. 7(b), are 2 × 2 reconfigurable photonic couplers that use a pair of phase shifters and a pair of directional couplers to implement a 2 × 2 unitary weight matrix, *U*. If we represent the input as a vector, $\vec{A}$, whose elements correspond to the normalized incident field amplitudes, then the normalized field amplitudes at the output are given by $\vec{B} = U\vec{A}$. The pair of phase shifters can be arranged on any two arms (straight waveguide regions) of the MZI, though the configuration shown in Fig. 7(b) allows one of the phase shifters to control the relative phase of the two input signal components in each output. Assuming coherent inputs, perfect 50:50 couplers, and two phase shifters, *φ* and *θ*, arranged as shown in Fig. 7(b), the output amplitudes can be described by the unitary matrix multiplication

$$\vec{B} = j e^{j\theta/2}\begin{pmatrix} e^{j\varphi}\sin(\theta/2) & \cos(\theta/2) \\ e^{j\varphi}\cos(\theta/2) & -\sin(\theta/2) \end{pmatrix}\vec{A}.$$
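A quick numerical check of the MZI transfer matrix (written here in one common convention; the exact form depends on where the phase shifters sit) confirms unitarity and power conservation:

```python
import numpy as np

def mzi_unitary(theta, phi):
    """2x2 transfer matrix of an MZI with two ideal 50:50 couplers,
    an internal phase theta and an input phase phi (one common
    convention; other phase-shifter placements give equivalent forms)."""
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * np.sin(theta / 2), np.cos(theta / 2)],
         [np.exp(1j * phi) * np.cos(theta / 2), -np.sin(theta / 2)]])

U = mzi_unitary(theta=1.1, phi=0.7)
print("unitary:", np.allclose(U.conj().T @ U, np.eye(2)))
# Output amplitudes for a single input in the top arm:
B = U @ np.array([1.0, 0.0])
print("output powers:", np.abs(B) ** 2)  # the two powers sum to 1
```

Sweeping θ from 0 to π steers the input continuously between the two output ports, which is how the mesh architectures below realize arbitrary couplings.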

**Micro-Ring Resonators (MRRs)** are another reconfigurable technology that can be arranged to compute a matrix multiplication. In contrast to the MZI unit, MRR units are often used with wavelength-division multiplexing schemes to implement a “broadcast and weight” architecture.^{100} Under this scheme, input vectors are encoded as the modulated light intensities of multiple wavelength channels, while each MRR unit acts as a tunable filter to selectively apply attenuation to a specific input wavelength. The micro-ring itself is a waveguide in the shape of a circle placed within an evanescent coupling distance of one or more straight waveguides [see Fig. 7(c)]. For matrix multiplication, it is typical to use two waveguides such that the intensity at one of the two possible output ports—often called the through and drop ports—can be modulated according to the desired multiplication by attenuation. These rings form a resonant cavity, though other closed-loop waveguide paths can also form the MRR cavity. The length of the path (e.g., the circumference in the case of a ring) determines the resonant condition and is tuned by incorporating a phase shifter. Rather than defining the usual coupling coefficient per unit length, the response of a ring resonator is usually analyzed in terms of the power splitting ratios, $k_t^2$ and $r_t^2$, known as the cross-coupling and self-coupling coefficients, which correspond to the fractions of input power coupled into the ring and into the through port, respectively. Equivalent coefficients $k_d^2$ and $r_d^2$ describe the coupling between the resonant cavity and the opposing drop waveguide. Assuming no coupling loss (where $k^2 + r^2 = 1$ for both coupling regions) or attenuation in the waveguide, we can calculate the power transmission to each port as

$$T_{\text{through}} = \frac{r_t^2 - 2 r_t r_d \cos\phi + r_d^2}{1 - 2 r_t r_d \cos\phi + (r_t r_d)^2}, \qquad T_{\text{drop}} = \frac{k_t^2\, k_d^2}{1 - 2 r_t r_d \cos\phi + (r_t r_d)^2},$$

where $\phi = 2\pi n_{\text{eff}} L/\lambda_0$ is the round-trip phase, *L* is the round-trip length, *n*_{eff} is the effective index, and *λ*_{0} is the free-space wavelength of the incoming signal. We can see that a resonant condition occurs for any input where an integer number of wavelengths fits in the round-trip optical path (see Ref. 101 for the derivation and a review of other MRR properties),

$$n_{\text{eff}}\, L = m\,\lambda_0, \quad m \in \mathbb{Z}.$$

Tuning the resonant frequency and coupling coefficients allows the modulation of the transmission to each port.^{102} It should be noted that, in contrast to the MZI unit, there will be a loss in transmission to the unused output port of the MRR (except for the case of total transmission at the desired wavelength channel). As such, the multiplication is not unitary as in the case of the MZI. However, as discussed in Sec. II C 2, a matrix multiplier can still be constructed from this unit by assembly into banks.
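The add-drop transmission behavior can be sketched numerically. In the following (ring parameters are illustrative), a lossless ring with symmetric coupling routes all power to the drop port on resonance, and the through and drop powers always sum to one:

```python
import numpy as np

def ring_transmission(lam0, n_eff, L, r_t, r_d):
    """Lossless add-drop micro-ring: power transmission to the through
    and drop ports vs the round-trip phase phi = 2*pi*n_eff*L/lam0
    (self-coupling r, cross-coupling k, with k^2 + r^2 = 1)."""
    phi = 2 * np.pi * n_eff * L / lam0
    k_t2, k_d2 = 1 - r_t ** 2, 1 - r_d ** 2
    denom = 1 - 2 * r_t * r_d * np.cos(phi) + (r_t * r_d) ** 2
    T_through = (r_t ** 2 - 2 * r_t * r_d * np.cos(phi) + r_d ** 2) / denom
    T_drop = k_t2 * k_d2 / denom
    return T_through, T_drop

# A round trip holding an integer number of wavelengths is on resonance.
n_eff, L = 2.4, 100e-6
lam_res = n_eff * L / 155        # m = 155 gives a wavelength near 1.55 um
T_t, T_d = ring_transmission(lam_res, n_eff, L, r_t=0.9, r_d=0.9)
print(f"on resonance: through={T_t:.3f}, drop={T_d:.3f}")
```

Detuning the wavelength (or thermally tuning the ring) moves power back to the through port, which is exactly the attenuation mechanism used for weighting.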

#### 2. Assembling photonic synaptic meshes

Using the above reconfigurable photonic elements provides many ways to construct a photonic matrix multiplier of a given number of input and output dimensions. Each method has a differing number of tunable elements required and different design considerations. These methods can be categorized into three prevalent technologies: cross-bar networks, MZI meshes, and MRR banks.

**Cross-bar networks**, as shown in Fig. 7(a), are the simplest approach, typically aligning incoming signals along one direction (e.g., east-west) and outgoing signals along the other (north-south). Reconfigurable materials, such as PCMs or optical memristive materials, allow incoming light to be coupled into the output waveguides according to synaptic strength. MRRs can also be used to couple light from input to output ports; however, to our knowledge, this has not been demonstrated for matrix multiplication. Feldmann *et al.*^{60} demonstrated a PCM crossbar for parallel matrix multiplication in a convolutional network and reported 10^{12} MAC operations per second with a CNN accuracy of 95.3%, compared to 96.1% for the equivalent CNN on a traditional computer. A crossbar network consists of *N* inputs and *M* outputs with *N* × *M* connections, allowing all-to-all connectivity at the cost of *N* × *M* reconfigurable coupling elements. Crossbar networks can implement rectangular or square matrices but require careful design to ensure that crossings farther from the input receive sufficient optical power.

**MZI meshes** are another example of a photonic matrix multiplier; they use collections of MZIs as optical linear units (OLUs) whose computation is defined by their respective transfer matrices, as shown in Fig. 7(b). Utilizing this structure for the OLU, one can build an arbitrary *N* × *N* unitary matrix from MZIs arranged in various mesh topologies, of which the most common are triangular,^{69} rectangular,^{103} and diamond,^{104} as depicted in Figs. 8(a)–8(c), respectively. Gu *et al.* also presented a more complex butterfly topology that utilizes waveguide crossings and reduces the total number of MZI units compared to the former three topologies; see Ref. 105 for more details. For the triangular and rectangular topologies, the total number of MZI units for an *N* × *N* matrix is exactly *N*(*N* − 1)/2, though each MZI comprises two reconfigurable elements for a total of *N*(*N* − 1) controllable parameters.

The rectangular mesh, simply put, connects MZI units side by side, as shown in Fig. 8(b): the upper arm of one MZI unit connects to the lower arm of the next. The advantage of the rectangular mesh is that its compact arrangement has the minimum optical depth among the configurations mentioned.^{103} A non-square matrix must be formed by constructing a square matrix of the largest dimension and leaving the additional input or output ports unused. The butterfly mesh is based on the structure of the rectangular mesh with some MZI units pruned to reduce the number of units needed at the cost of some reconfigurability (i.e., not all unitary matrices can be represented).^{105} Triangular meshes follow a similar connection rule to the rectangular mesh but start with only the two bottom ports and increase the coupling to additional ports along a diagonal line, as depicted in Fig. 8(a). Triangular meshes require a higher optical depth and more chip space but support self-configuration mechanisms, as demonstrated in Ref. 69, with the same number of parameters as the rectangular mesh. Diamond meshes are a modified version of the triangular mesh, adding (*N* − 1)(*N* − 2)/2 MZI units that vertically mirror the shape of a triangular mesh, as seen in Fig. 8(c). Shokraneh *et al.*^{104} showed that this symmetric topology can provide additional degrees of freedom for weight matrix optimization in backpropagation training while also improving tolerance to fabrication errors.
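The *N*(*N* − 1)/2 count for the rectangular arrangement can be verified by composing 2 × 2 MZI blocks into a full matrix (the MZI convention below is one common choice, and the alternating placement follows the rectangular layout; this is a sketch, not a calibrated mesh model):

```python
import numpy as np

def mzi(theta, phi):
    """2x2 MZI block (ideal 50:50 couplers; convention illustrative)."""
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * np.sin(theta / 2), np.cos(theta / 2)],
         [np.exp(1j * phi) * np.cos(theta / 2), -np.sin(theta / 2)]])

def rectangular_mesh(N, rng):
    """Compose a random N x N unitary from MZIs placed in the
    rectangular (Clements-style) arrangement; return matrix and count."""
    U = np.eye(N, dtype=complex)
    count = 0
    for layer in range(N):                    # N columns of MZIs
        for i in range(layer % 2, N - 1, 2):  # alternating placement
            block = np.eye(N, dtype=complex)
            block[i:i + 2, i:i + 2] = mzi(*rng.uniform(0, 2 * np.pi, 2))
            U = block @ U
            count += 1
    return U, count

rng = np.random.default_rng(1)
U, count = rectangular_mesh(6, rng)
print("MZIs used:", count, "(N(N-1)/2 =", 6 * 5 // 2, ")")
print("unitary:", np.allclose(U.conj().T @ U, np.eye(6)))
```

Because every block is unitary, the composed matrix is unitary regardless of the random phase settings; training such a mesh amounts to choosing the *N*(*N* − 1) phases.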

**MRR banks**, as previously mentioned, take advantage of WDM to broadcast spiking signals widely and weight them with selective filters at the receiving neuron.^{106} Banks of MRRs are formed as shown in Fig. 7(c) by aligning them along a shared pair of waveguides while varying the radius of each ring sufficiently to avoid wavelength collision. Each receiving neuron has its own dedicated MRR bank to implement the incoming synaptic strengths of each wavelength before a balanced detector can measure the overall incoming signal intensity. The number of MRRs in each bank matches the number of sending neurons in the network, while the number of banks matches the number of receiving neurons. This means that a total of *N* × *M* MRRs will be needed to implement a fully connected network between a layer of *N* sending neurons and *M* receiving neurons.

#### 3. Photonic and optoelectronic nonlinear neurons

After receiving sufficiently strong stimuli, biological neurons emit electrical pulses known as *action potentials* or *spikes*. Encoding of information in the form of spike timing (*temporal coding*) or the spike rate (*rate coding*) has been a subject of active research. In designing nanophotonic spiking neural networks, the three essential elements—the neuron, the synapses, and the coding scheme—should be designed together to have the following attributes:^{51}

- *weighted addition*: the ability to sum weighted inputs,
- *integration*: the ability to integrate the weighted sum over time,
- *thresholding*: the ability to decide whether or not to send a spike (all-or-none),
- *reset*: the ability to have a refractory period, immediately after a spike is released, during which no firing can occur, and
- *pulse generation*: the ability to generate new pulses.

Biological neurons consist of three primary structures: dendrites, soma, and axon.^{80,81} The neuron body, or soma, forms the thresholding function, accumulating input currents from dendritic trees until the internal voltage meets the condition for spike generation. Exact mechanisms for this spike generation vary in biological realism and complexity and can be reviewed in Refs. 107 and 108. Photonic implementation of this function can be generally classified between all-optical (or photonic) and electro-optic approaches.

**All-optical neurons** tend to use simplified neuron models due to the difficulty of implementing optical nonlinearity. One of the simplest approaches uses traditional ANN activation functions, such as the sigmoid function, and maps them onto spiking hardware as in a rate-encoded ANN translation.^{109} Another common choice is the leaky-integrate-and-fire (LIF) model,^{110} in which an internal state variable—representing the membrane potential in biology—constantly decays exponentially toward some equilibrium value. Incoming spikes increment the membrane potential according to synaptic strength, and the neuron fires if the potential surpasses a threshold before decaying back to its resting value. Choices of nonlinearity to implement these models include semiconductor optical amplifiers (SOAs),^{109,110} vertical-cavity surface-emitting lasers (VCSELs),^{111–113} saturable absorption,^{43,114–117} and, more recently, passive micro-resonators.^{118} These optical neurons have not yet been demonstrated in large-scale networks, and more work is needed to establish their computational abilities.
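The LIF model referenced above reduces to a few lines of code. This dimensionless sketch (all constants illustrative, not tied to any photonic device) shows the three ingredients: leaky integration, thresholding, and reset:

```python
import numpy as np

def lif_spikes(input_current, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0):
    """Minimal leaky-integrate-and-fire neuron: the membrane potential
    decays toward zero, integrates the input, and on crossing v_th the
    neuron emits a spike and resets (a crude refractory effect)."""
    v, spikes = 0.0, []
    for I in input_current:
        v += dt * (-v / tau + I)   # leaky integration (Euler step)
        if v >= v_th:
            spikes.append(True)
            v = v_reset            # reset after the spike
        else:
            spikes.append(False)
    return np.array(spikes)

# A constant drive above threshold produces a regular spike train.
spikes = lif_spikes(np.full(200, 80.0))
print("spike count over 200 steps:", int(spikes.sum()))
```

Sub-threshold drive (here, any constant input below `v_th / tau`) produces no spikes at all, which is the all-or-none behavior listed above.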

**Optoelectronic neurons**, the alternative, combine the advantage of well-studied electronic nonlinearities with the fast, nearly lossless transmission^{119} and zero-energy weighting provided by the photonic devices discussed in Sec. II C 2. O/E/O conversion uses photodetectors to generate an electrical current in proportion to the received optical power, thus converting the aggregated optical inputs from the synaptic mesh into electrical currents. The electrical circuit, in turn, can use any nonlinear circuit element to implement the spiking function and generate an optical output using semiconductor lasers. Nozaki *et al.*^{120} demonstrated that close integration between the photodetector and the modulator reduced the integrated capacitance to 2 fF and that non-spiking neural nonlinearity can be achieved at an extremely low energy consumption of 4.8 fJ/bit at a speed of 10 Gbit/s. However, such a neuron architecture requires a constantly powered laser source; the continuous currents supplying the lasers described in Ref. 120 consume a significant amount of energy even when the neurons are idle. As discussed in Sec. II B 1, spiking neural networks offer far better overall energy efficiency due to the sparse nature of communication in event-driven neuromorphic computing, and spiking dynamics can be implemented by closely integrating CMOS transistors with photodiodes and electro-optic modulators. Recently, a low-power, six-transistor soma design has been demonstrated for spiking optical neural networks;^{121} it consumes 21.09 fJ/spike with a maximum spiking rate of 10 GHz in a 90 nm CMOS process.

Similar O/E/O circuit designs can potentially allow the use of more biologically accurate neuron models by replacing the transistor circuit with any desirable analog electrical neuron nonlinearity, such as that presented in the study by Farquhar and Hasler.^{122} Unlike all-optical neurons, O/E/O conversion limits the response speed of a single optoelectronic neuron because of the analog bandwidth limitation of the electronics. However, network-level throughput can still benefit from optical parallelism in the wavelength and spatial domains. Miscuglio *et al.*^{123} argued that CMOS-compatible integrated neuromorphic devices can operate above 25 GHz with a relatively low energy consumption below 10 fJ/bit. Careful benchmarking of system-wide throughput, energy consumption, and latency for given workloads is necessary to correctly compare neuromorphic computing systems across the various optical, electrical, and optoelectronic technologies.

### D. Learning

**Gradient descent algorithms**, such as backpropagation and its many variants, are often used to train ANNs and have recently been applied to PNNs. A cost (or loss) function is associated with the neural network that numerically penalizes differences between the outputs of the ANN and their respective target values. The matrix elements representing the weights of the synaptic interconnections within the network are tuned to minimize the cost function over the input space, based on gradients computed with respect to each weight. Backpropagation is a class of gradient descent algorithms that extends this procedure by sequentially applying the chain rule to calculate gradients from the output layer back to the input layer. Recurrent layers can be handled with backpropagation through time by unrolling the network activity into time steps. Various other forms of backpropagation can be found in the literature; see Ref. 124 for a recent review of these techniques in the context of deep neural networks.
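The weight-update loop described above can be made concrete with a from-scratch sketch (NumPy; the XOR task, layer sizes, learning rate, and iteration count are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # XOR inputs
y = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)
lr = 0.2
for _ in range(10000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))   # sigmoid output
    # Backward pass: apply the chain rule from output toward input.
    d_out = out - y                          # cross-entropy + sigmoid grad
    d_h = (d_out @ W2.T) * (1 - h ** 2)      # through the tanh layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)
print("predictions:", out.ravel().round(2))
```

The `d_h` line is the essence of backpropagation: the output-layer error is pushed backward through `W2` and scaled by the local derivative of the hidden activation.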

Recent work^{125} used photonics and adjoint variable methods (AVMs) to derive a photonic implementation of backpropagation for training PNNs. AVMs utilize the field solutions of two electromagnetic simulations, referred to as the adjoint field and the original field, to derive the cost function of the photonic neural network. This cost function is more compact in terms of measurable quantities within the network and allows for the parallel gradient computation for all phase shifters within an MZI network. Gradient computation can then be completed using intensity measurements of the adjoint field, which physically corresponds to a backward propagating waveguide mode sent into the system through the output ports. Such a backpropagation implementation is robust under noise and scaling while allowing *in situ* computation for photonic neural networks.

**Hebbian learning** is a more bio-derived class of algorithms based on Hebb’s original postulate that, in general, synaptic connections are strengthened between neurons whose activity is correlated.^{126} The exact weight update rule varies from one mathematical formulation to another and is often based either on the activity of neurons—such as the average firing rate of spiking neurons or the activation function of non-spiking neurons—or on the timing differences between pre- and post-synaptic firing in explicitly spiking neuron models.^{127} Two- and three-factor learning rules^{128} are an example of the former set of algorithms; spike trains are smoothed by convolution with a kernel (such as an exponential decay filter) and used to numerically represent the activity of pre- and post-synaptic neurons; see Ref. 127 for a deeper review of the mathematical formulations of Hebbian learning. The latter subclass of Hebbian learning is called Spike-Timing-Dependent Plasticity (STDP), in which weight updates are calculated as a nonlinear function of the time interval, Δ*T*, between pre- and post-synaptic spikes.^{69}

Various photonic synapses have demonstrated STDP behavior by modulating the gain of SOAs,^{129,130} by changing the transmission levels of phase-change memory materials,^{131,132} and through other photoelectric device mechanisms.^{131} STDP supports both supervised^{133} and unsupervised^{134} learning rules, which have been demonstrated in simulation by Xiang and colleagues. In Ref. 133, a VCSEL-SA structure is simulated to implement a supervised XOR learning task with an STDP learning rule; the VCSEL-SA is chosen because its inhibitory dynamics can serve as inhibitory weights, as seen in Ref. 85. The XOR task was solved with supervised training in which the weight update from a photonic STDP learning rule, derived in Ref. 135, has its sign flipped when the output does not match the target value. Training was shown to converge completely after 40 simulated epochs. Xiang and colleagues also demonstrated an unsupervised pattern recognition task with similar components. In both works, a controller circuit calculates the STDP weight updates used to program the synaptic weights.
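A pair-based STDP window of the kind referenced above is straightforward to express in code (the amplitudes and time constant below are illustrative, not taken from the cited devices):

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.1, a_minus=0.12, tau=20e-3):
    """Pair-based STDP window: weight change as a function of the
    spike-time difference delta_t = t_post - t_pre.  Pre-before-post
    (delta_t > 0) potentiates; post-before-pre depresses; the effect
    decays exponentially with the interval."""
    delta_t = np.asarray(delta_t, dtype=float)
    return np.where(delta_t >= 0,
                    a_plus * np.exp(-delta_t / tau),
                    -a_minus * np.exp(delta_t / tau))

# Pre fires 5 ms before post -> potentiation; 5 ms after -> depression.
print(stdp_dw(5e-3), stdp_dw(-5e-3))
```

Because the update depends only on the two spike times at that synapse, it is local in exactly the sense discussed below for parallel, online learning.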

Amato *et al.*^{136} compared multiple hybrid networks composed of Hebbian-trained and gradient-descent-trained layers to isolate the respective advantages of each technique in a classification task using feedforward convolutional neural networks (on a traditional computing platform). Their analysis showed empirically that Hebbian learning falls behind gradient descent for deep networks, with evidence that intermediate layers are not trained as effectively as layers near the beginning or end of the network. They also found that Hebbian learning requires far fewer epochs (iterations over the training set) than gradient descent, converging after 2 epochs vs 20, respectively.

Hebbian learning rules have the advantage of requiring only the information that is local to a given synapse. For example, two-factor learning rules compute weight updates from traces (filtered spike trains), representing average pre- and post-synaptic activities over a given time scale. As such, it is conceivable to implement an architecture that calculates all weight updates entirely in parallel, as opposed to gradient descent algorithms, which must sequentially propagate the credit of error backward through the network layer by layer and thus fundamentally require some amount of serial computation. This advantage, in turn, also allows Hebbian learning to be easily applied in online learning tasks wherein learning is performed whenever new information is made available. Gradient descent algorithms, in contrast, are more often used with batch training where the error is minimized across multiple samples simultaneously. As such, Hebbian learning is advantageous in settings with real-time learning requirements, while gradient descent is more amenable to applications in which many novel input samples are generated nearly simultaneously. See Ref. 137 for more on the advantages and formulations for online learning in Hebbian and spiking neural networks.

### E. Review of existing approaches

Table II tabulates and compares existing architectures for photonic neuromorphic processing from some well-known institutions and organizations in the field. The table lists the technologies used for synaptic meshes, the neural nonlinearities, and any reported performance metrics. Synaptic meshes are distinguished by topology, learning mechanism, demonstrated network size (experimental or simulated), and the reconfigurable elements employed. Neuron models are distinguished by the functional form of the nonlinearity and by the use of optoelectronic or fully photonic devices. Finally, the performance of each architecture is reported as presented, given the lack of a single established performance metric agreed upon within the field (discussed further in Sec. IV). As such, this table aims to summarize and contextualize existing approaches to photonic neuromorphic computing.

TABLE II. Existing approaches to photonic neuromorphic computing.^{a}

| Research group | Synaptic network architecture | Synaptic reconfiguration mechanism | Neural network topology | Demonstrated network size | Learning | Reported performance | Multiplexing | Neurons/nonlinearity | References |
|---|---|---|---|---|---|---|---|---|---|
| UC Davis | Rectangular and triangular MZI meshes | Thermo-optic (SOI) | Feed-forward TNN | 1024 × 1024 (sim) | EC standard BP and TT decomposition | MAC/J | WDM and SDM | Optoelectronic Izhikevich spiking neuron | 121, 138, 139 |
| Princeton | MRR banks | Thermo-optic (SOI with Ti/Q heaters) | Feed-forward | 2 × 3 w/0–250 broadcast nodes (sim/exp) | NR | Extinction ratio > 13 dB | WDM | Optoelectronic leaky integrate-and-fire spiking neuron | 100, 106, 140 |
| Monash University | Rectangular MZI mesh | Electro-optical phase shifter | Convolutional (3 × 3) | 900 × 10 (sim/exp) | EC standard BP | 11.321 TOPS | WDM | EC sigmoid non-spiking neuron | 58, 64 |
| Aristotle University of Thessaloniki | Rectangular MZI mesh | Electro-optical (SOA) | Recurrent | 32 × 3 (sim), 4 × 4 (exp) | EC standard BP | SOA1 = 180 pJ/symbol and SOA2 = 300 pJ/symbol | WDM | SOA-based sigmoid non-spiking nonlinearity | 141–143 |
| Stanford | Rectangular MZI mesh | Electro-optical phase shifter | Feedforward (FT pre-processed) | 16 × 10 (sim) | EC custom BP | 94% classification accuracy (MNIST); 7.7 × 10^{12} MAC/s | WDM | Custom optoelectronic non-spiking nonlinearity | 144–146 |
| McGill University | Diamond MZI mesh | Electro-optical phase shifter | Feedforward | 4 × 4 mesh (sim/exp) | EC custom algorithm | 98.9% accuracy (0 dB MZI loss) and 75% accuracy (0.5 dB MZI loss) | WDM | NA | 104, 147–149 |
| MIT | Diamond MZI mesh | Thermo-optic (SiPh + PCM) | Feedforward | 4 × 4 mesh (sim) | EC standard BP | ∼100 pJ/FLOP | TDM | NA (sug. bistable nonlinear photonic crystals) | 150 |
| Ghent University | Crossbar network | Electro-optical (PCM) | Passive reservoir computing | 4 × 4 (sim) | EC complex-valued ridge regression | Minimum error rate = 10^{−3} | TDM | Photodetector non-spiking nonlinearity | 151–156 |
| NTT | Crossbar network | Electro-optical (cross-gain modulated SOA) | Reservoir computing (RNN) | Single device | NR | 43 mW consumed; normalized mean square error (NMSE) ∼ 0.112 | TDM | Custom SOA non-spiking nonlinearity | 74 |
| NIST | Rectangular MZI mesh | Electro-optical (superconducting-nanowire single-photon detectors) | Spiking feedforward | 49 SNSPDs (exp) | NR | NR | TDM | Optoelectronic integrate-and-fire spiking neuron | 157–160 |
| University of Washington | Crossbar network | All-optical (PCM [GST]) | Convolutional (2 × 2) | 256 × 256 input (sim/exp) | EC standard BP | 25 TOPS/mm^{2} | WDM | NA | 78, 161–163 |
| George Washington University | Rectangular MZI mesh | Thermo-optic (PCM [GSST-Si]) | Feedforward (FF); convolutional | FF: 784 × 100 × 10 (sim); CNN: NR | EC standard BP | 93% inference accuracy (MNIST) | WDM | Custom sigmoidal non-spiking nonlinearity | 77, 164 |
| University of Paris-Saclay | MRR banks (arranged in crossbar-like topology) | All-optical (SOI) | Swirl reservoir computing | 4 × 4 (sim) | EC ridge regression | XOR task at 20 Gb/s with BER < 10^{−3} and injection power < 2.5 mW | WDM | Custom MRR non-spiking nonlinearity | 83, 165, 166 |


^{a}NA: not applicable; NR: not reported; EC: externally computed; BP: backpropagation; FOM: figure of merit; sim: simulated; exp: experimentally demonstrated; SOI: silicon on insulator; SiPh: silicon photonics; SOA: semiconductor optical amplifier; PCM: phase-change material; GST: GeSbTe; GSST: GeSeSbTe.

## III. TOWARD SCALABLE PHOTONIC AND OPTOELECTRONIC NEUROMORPHIC COMPUTING

One of the major remaining challenges of both electronic and photonic neuromorphic computing is the physical composition of large-scale neural networks. The photonic (and general) neuromorphic technologies and methods discussed thus far have addressed scalability in other forms by reducing algorithmic complexity,^{52,53} by increasing parallelism through multiplexing (WDM,^{60,65,77} TDM,^{78} and SDM^{58}), and in the general sense of decreased energy consumption of photonic components.^{47,58,121} Nonetheless, as the availability of data increases and the demand for computing resources rises to match, so will the demand for large-scale neural networks with high neural and synaptic densities. In such cases, the number of distinguishable modes or wavelength channels in photonic networks may become a barrier to further increases in parallelism. As such, other methods must be developed to increase neural and synaptic densities to match the needs for large-scale networks in reasonably sized form factors.

The remainder of this section introduces and discusses two promising photonic technologies under active research that may improve the physical scalability of future integrated photonic and optoelectronic neural network architectures. The first, a recently developed technique called tensor-train decomposition,^{167} simplifies the structure of the neural network into only the fundamental elements (called “tensor cores”) required for fast, accurate training. The second uses new fabrication techniques^{168,169} to reorganize the floor plan of fabricated circuits to exploit the third dimension more efficiently, thus realizing a 3D neuromorphic system much like the biological brain. Future work is needed to apply these algorithmic and manufacturing innovations to novel photonic neuromorphic computers.

### A. Tensor-train decomposition

Tensor-train decomposition^{167} is a decomposition algorithm that works by representing the elements of a *d*-dimensional tensor as the product of *d* three-dimensional tensor cores,

$$A(i_1, i_2, \ldots, i_d) = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d], \qquad (3.1)$$

where each slice $G_k[i_k]$ is an $r_{k-1} \times r_k$ matrix. The product results in an $r_0 \times r_d$ matrix; hence, the condition that $r_0 = r_d = 1$ is imposed to match the input and output dimensionality. This representation of rank differs from the canonical representation of a tensor rank in that the ranks $r_k$ can be calculated as the ranks of known auxiliary matrices. The matrix tensor cores $G_k[i_k]$ from Eq. (3.1) are three-dimensional arrays and can be written as $G_k(\alpha_{k-1}, i_k, \alpha_k)$, which can be treated as $r_{k-1} \times n_k \times r_k$ arrays with $G_k(\alpha_{k-1}, i_k, \alpha_k) = \big(G_k[i_k]\big)_{\alpha_{k-1} \alpha_k}$.^{167}
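The core-product construction of Eq. (3.1) can be checked numerically. In the sketch below, the mode sizes and TT-ranks are arbitrary illustrative choices; it verifies that a tensor element equals the product of its $r_{k-1} \times r_k$ core slices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3-dimensional tensor with mode sizes (4, 5, 6) and TT-ranks
# r = (1, 2, 3, 1); core G_k has shape (r_{k-1}, n_k, r_k).
ranks = [1, 2, 3, 1]
modes = [4, 5, 6]
cores = [rng.standard_normal((ranks[k], modes[k], ranks[k + 1]))
         for k in range(3)]

def tt_element(cores, idx):
    """Evaluate A(i1, ..., id) as a product of r_{k-1} x r_k core slices."""
    m = np.eye(1)
    for G, i in zip(cores, idx):
        m = m @ G[:, i, :]  # slice G_k[i_k] is an r_{k-1} x r_k matrix
    return m[0, 0]          # r_0 = r_d = 1, so the product is a scalar

# Contract all cores to recover the full (4, 5, 6) tensor for comparison.
full = np.einsum('aib,bjc,ckd->ijk', cores[0], cores[1], cores[2])
```

Storing the cores requires only $\sum_k r_{k-1} n_k r_k$ numbers (56 here) rather than $\prod_k n_k$ (120), which is the source of the parameter savings exploited by tensorized PNNs.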

Shallow networks with large fully connected layers achieve almost the same accuracy as an ensemble of CNNs.^{170} Therefore, it is highly desirable to implement high-radix (e.g., 1024 × 1024) photonic synaptic interconnections. However, an *N* × *N* MZI mesh representing a unitary weight matrix requires a minimum of *N*(*N* − 1) reconfigurable elements and *N* cascaded stages,^{103} limiting mesh scalability for high-radix interconnections. Tensor-train decomposition offers increased scalability to PNNs by reducing the number of elements in the network (i.e., fewer MZI units). High-radix meshes are formed by cascading smaller-radix meshes called photonic tensor-train cores.^{138} Other benefits include the resulting reduction in optical insertion loss and decreased chip size for a given network.
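For concreteness, the quadratic growth in reconfigurable elements can be tabulated. The cascade factorization below is purely illustrative (a hypothetical decomposition into equal-size cores, not the specific tensor-train construction of Ref. 138), but it shows the scale of the savings.

```python
def mzi_mesh_elements(n):
    """Reconfigurable phase shifters in an N x N unitary MZI mesh:
    N(N - 1)/2 MZIs with two phase shifters each -> N(N - 1) elements."""
    return n * (n - 1)

def cascaded_core_elements(core_sizes):
    """Element count if a high-radix interconnect is replaced by a cascade
    of smaller-radix core meshes (hypothetical factorization for scale)."""
    return sum(m * (m - 1) for m in core_sizes)

full = mzi_mesh_elements(1024)             # 1 047 552 elements
cascade = cascaded_core_elements([32] * 10)  # 9 920 elements for ten 32 x 32 cores
```

Even this toy factorization cuts the element count by roughly two orders of magnitude, with a corresponding reduction in cascaded insertion loss.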

Tensorized PNNs were proposed in Ref. 138 for deep feed-forward neural networks with a rectangular MZI mesh. In principle, however, the architecture can be applied to spiking neural networks and recurrent neural networks. The proposed network utilizes a discrete-time representation of input signals in which each input sample is amplitude modulated onto a continuous-wave optical carrier at specific time intervals. A diagram of the architecture for a conventional PNN and a tensorized PNN is shown in Fig. 9, detailing the difference in the architecture that arises from utilizing tensor-train decomposition in the training process. By adding parallelism in both the wavelength domain using WDM technology and the space domain using 3D photonics, the proposed tensorized PNNs maintain all the benefits of conventional PNNs while reducing the insertion loss by 171.8 dB and the number of MZIs by a factor of 582×.

A simulation demonstration has been completed that accounts for hardware implementation challenges, such as phase-shifter variations and beam splitter power imbalances.^{171} The simulation uses cross-entropy loss to benchmark the tensorized PNNs against conventional PNNs and Fourier-transform-preprocessed PNNs. In particular, the accuracy of the different network models was studied for handwritten digit recognition with the MNIST dataset. The simulation results demonstrate that the TNNs are robust against phase-shifter variations and beam splitter power imbalances in terms of the overall accuracy of the trained model on MNIST. Furthermore, the implementation of the photonic TNN can achieve >90% classification accuracy while using 33.6× fewer MZIs than a conventional ANN, which can only achieve 71.6% accuracy under the practical hardware imprecisions studied. Further implications of the architecture for scalability, accuracy, learning, and hardware implementation are under active study.^{138,139,171}
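The kind of hardware imprecision studied can be mimicked at the single-device level. The sketch below perturbs the phases of an idealized 2 × 2 MZI (a common beam splitter-phase-beam splitter model; the 0.05 rad error spread is an assumed value for illustration, not one from Ref. 171) and measures the unitary overlap with the ideal setting.

```python
import numpy as np

BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)  # ideal 50:50 coupler

def mzi(theta, phi):
    """2x2 MZI transfer matrix: coupler, internal phase, coupler, output phase."""
    inner = np.diag([np.exp(1j * theta), 1.0])
    outer = np.diag([np.exp(1j * phi), 1.0])
    return outer @ BS @ inner @ BS

def fidelity(u, v):
    """Normalized overlap |tr(U^H V)| / 2 between two 2x2 unitaries."""
    return abs(np.trace(u.conj().T @ v)) / 2

rng = np.random.default_rng(1)
sigma = 0.05  # assumed std. dev. of phase-shifter error (radians)
ideal = mzi(0.7, 0.3)
perturbed = mzi(0.7 + sigma * rng.standard_normal(),
                0.3 + sigma * rng.standard_normal())
```

Sweeping `sigma` and propagating such perturbed matrices through a full mesh is, in essence, the robustness study performed in simulation.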

### B. 3D electronic and photonic integration

3D integration is essential for practical photonic neuromorphic computing since typical photonic devices are many wavelengths in size. Together with control electronics, 3D electronic-photonic integrated circuits (3D EPICs) must be considered. At the heart of the 3D EPIC is a through-silicon optical via (TSOV) with silicon photonic vertical reflectors. Recently, a UC Davis team experimentally demonstrated a 90° vertical coupler, illustrated in Fig. 10, which consists of a silicon photonic vertical via and a 45° reflector attached to a waveguide end.^{168,169} The interlayer connection loss is 1.3 dB (or 0.65 dB per coupling)^{169,172} and is limited by the mode matching of the lateral and vertical waveguides for the 220 nm thick silicon rib waveguides used.^{169,172} For thicker silicon rib waveguides (e.g., 500 nm thick), the loss can be reduced to 0.5 dB per interlayer connection. This vertical coupler can also be used for interlayer coupling in a multi-layer silicon photonic 3D integrated circuit by placing a matching vertical coupler face-to-face. For coupling between the silicon photonic waveguide layer and a silicon nitride layer, inverse-taper couplers can be utilized; UC Davis and other groups have already demonstrated interlayer coupling loss of ∼0.01 dB.^{173,174}
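These per-coupling figures translate directly into a link budget. The helpers below are hypothetical illustrations using the ∼0.65 dB/coupling reported for 220 nm SOI; they show how quickly vertical hops accumulate loss and what fraction of optical power survives.

```python
def link_loss_db(n_interlayer_hops, loss_per_hop_db=0.65, extra_db=0.0):
    """Cumulative insertion loss for a vertical link with n interlayer
    couplings at the reported ~0.65 dB per coupling, plus any extra loss."""
    return n_interlayer_hops * loss_per_hop_db + extra_db

def power_fraction(loss_db):
    """Fraction of optical power surviving a given loss in dB."""
    return 10 ** (-loss_db / 10)

two_hop = link_loss_db(2)          # 1.3 dB, the demonstrated interlayer link
surviving = power_fraction(two_hop)
```

At 0.5 dB per coupling for thicker waveguides, the same two-hop link would cost only 1.0 dB, which is why coupler loss is a first-order design parameter for deep 3D stacks.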

## IV. BENCHMARKING METRICS

Successfully comparing any two things requires a system that can meaningfully establish their value. At the lowest level in the field of computing, value is placed on computational ability as measured by such things as latency, throughput, accuracy, and energy efficiency. A complication arises when any of these metrics may change in the context of a specific application. This challenge is already present in the case of conventional general-purpose computing architectures, resulting in the availability of competing standards for benchmarking CPUs.^{175,176} It is further exacerbated by the more open-ended and varied goals of neuromorphic computing architectures and by the ambiguity of what a biological brain “does” and, consequently, what a brain-like or brain-inspired architecture should also “do.” In traditional computing, different architectures may emphasize the optimization of the aforementioned values for different computational units based on the needs of the design—integer operations vs floating-point operations, for example. In the neuromorphic case, these operations may take the form of individual neural state updates, processing of spike traffic, or synaptic weight updates, each of which may be broken down into further suboperations. In summary, a good choice of benchmark addresses the following two questions: (1) How to establish the value? (2) How to establish fairness? The remainder of this section describes commonly referenced metrics of comparison for photonic neuromorphic devices before describing a more general approach for benchmarking that can be applied to neuromorphic computers of various electronic and photonic architectures.

Multiple photonic devices of interest^{58,60,121,177} have reported their value in terms of the energy efficiency and throughput of multiply–accumulate (MAC) operations. Since MAC operations require many parallel memory accesses for large networks, they tend to form the bottleneck for network-based computation on von Neumann architectures. While the MAC operation is undoubtedly a significant component of network computation, a singular focus on this operation would be far too myopic, as the behavior of biological networks and real-time interactions may drastically affect performance. For example, the performance of the 11 tera-operations per second (TOPS)^{58} photonic convolutional accelerator is only available to workloads that actually need that many operations in a given amount of time. In contrast, the real-time applications for which biological brains are best suited may not produce enough data for these processing speeds to be relevant.
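The MAC-based accounting typically works as follows: an N × N analog matrix-vector product performed once per symbol period counts as N² MACs, so throughput scales as N² times the symbol rate, and energy per MAC is total power divided by throughput. The sketch below uses an illustrative 64 × 64 mesh at 10 GS/s and 1 W; these are assumed example numbers, not figures from the cited works.

```python
def mac_throughput(n, symbol_rate_hz):
    """MACs/s for an N x N analog matrix-vector multiply performed once
    per symbol period (the accounting commonly used in the literature)."""
    return n * n * symbol_rate_hz

def energy_per_mac(power_w, macs_per_s):
    """Joules per MAC from total power draw and MAC throughput."""
    return power_w / macs_per_s

tput = mac_throughput(64, 10e9)   # ~4.1e13 MAC/s for a 64 x 64 mesh at 10 GS/s
e_mac = energy_per_mac(1.0, tput)  # ~24 fJ/MAC at 1 W total power
```

The quadratic dependence on N is exactly why headline MAC/s figures favor large meshes, and why such figures say little about workloads that cannot keep the mesh full.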

Some have attempted to account for the device footprint in relation to the improvements in MAC throughput and energy.^{177} While this approach more fairly compares the performance of matrix multipliers across photonic and electronic platforms, it does not address the other factors involved in neural network processing, which many contemporary photonic devices offload to post-processing on traditional computers. Additionally, the SNN architecture—considered favorable by many in the context of neuromorphics—relies on an accumulate-and-fire operation in which the membrane potential is a continuous state variable undergoing continuous update in the ideal (analog) case, at which point the definition of a single MAC operation may not be clear.

Furthermore, Cole^{178} suggested that when programmability and data transfer are considered, the energy consumption of computational elements is negligible to both electronic and photonic approaches. Cole suggested that when considering the energy consumption of optical receivers, there is no advantage to photonic computation over a fully optimized electronic computer. Instead, Cole claimed that energy reduction efforts should be focused on the adoption of optical data transfer and not optical computation. It is important to note that the computation considered is binary and that representation in neuromorphic computation will not necessarily take this form. Nonetheless, this result demonstrates the importance of comparing neuromorphic processors in their entirety rather than considering the consumption of particular computational elements.

More mature neural network processors—for example, the electronic neuromorphic devices TrueNorth,^{13} Loihi,^{14} and Neurogrid^{11}—have instead reported their achievements in terms of energy consumption per spike or energy per bit. Such metrics can neglect the question of fairness, as changes in workload or architecture can drastically change these metrics. When reporting the energy per spiking event, for example, it is unclear whether the operations contributing to the membrane potential updates should be considered. For a different workload, the number of events before the neuron reaches threshold and fires may vary and result in an inconsistent metric. If subthreshold operations are included, a digital architecture with discrete timesteps might require more energy per spike for workloads with longer gaps between spikes. If subthreshold operations are not included, then an architecture that chooses the smallest possible spiking energy may appear more efficient than another architecture that compensates with nearly passive energy costs for incoming spike accumulation. In such an architecture, spike energy may be less important; after all, sparsity in time is a major advantage of spiking networks. Furthermore, metrics involving units of bits are specific to a given architecture in that architectural choices determine what role these bits play and whether the bit width is flexible, making it more difficult to establish fair comparisons between widely different architectures. It has even been argued that bit precision is not significant in neuromorphic computing, given that one of the goals of the field is to perform computation with low-precision elements.^{11}

Proper benchmarking of neuromorphic computers should take inspiration from solutions generated in traditional computing, where various standards of benchmarking have been proposed, such as SPEC^{175} and MLPerf,^{176} which attempt to fairly discriminate the advantages and disadvantages of various architectures in different contexts. Mike Davies, Director of Intel’s Neuromorphic Computing Lab, has suggested that the field of neuromorphics has not yet matured enough to establish the specific operations that a fully qualified neuromorphic computer should support, yet he proposes a benchmarking suite known as SpikeMark.^{50} In this suite, various workloads, such as spoken keyword classification or hand gesture recognition, would be used to determine an architecture’s feature set and flexibility in various contexts while providing a standard for comparing energy efficiency and performance. As the name implies, SpikeMark focuses on spiking network workloads, though it is important to note that spiking behavior may not be necessary for useful neuromorphic devices. In the book *How to Build a Brain*,^{179} Eliasmith describes a set of “Core Cognitive Criteria” that attempts to answer the question of what a brain “does” and can also act as a framework for designing neuromorphic benchmarks. The criteria are broken down into three categories—representational structure, performance concerns, and scientific merit—which are agnostic to the choice of spiking or non-spiking neuron model and have been applied to the design of a large-scale network model known as SPAUN.^{180}

Further work is still needed to resolve the ambiguity of what workloads should be considered fundamental to a neuromorphic processor and establish an official benchmarking standard. Regardless, the device footprint, energy efficiency, and relevant processing speeds should be considered jointly across multiple minimally overlapping tasks representing the desirable computational characteristics that researchers seek to borrow from biological brains.

## V. SUMMARY

Recent advances in photonic circuits have led to theoretical studies and experimental demonstrations of synaptic interconnects with reconfigurable photonic elements capable of arbitrary linear matrix operations—including MAC operation and convolution—at extremely high speed and energy efficiency. Both all-optical and optoelectronic neurons with nonlinear transfer functions have also been investigated. A number of research efforts have reported orders-of-magnitude estimated improvements in computational throughput and energy efficiency. While photonic technologies are relatively immature compared to their electronic counterparts, silicon photonics has emerged as a viable platform for integrated photonic neuromorphic circuits. However, substantial challenges remain in several areas: (a) cross-layer co-design of algorithms, architectures, circuits (photonic and electronic), training/inference, and benchmarking; (b) heterogeneous integration of dissimilar materials if non-volatile synaptic reconfigurability is to be incorporated; (c) realizing high-density 3D integration; and (d) implementing scalable learning and inference in the resulting system. Given the rapid and accelerating progress in this nascent area of research and development, we expect to see key advances in addressing each of the above challenges for viable photonic and optoelectronic neuromorphic computing in the future.

## ACKNOWLEDGMENTS

The authors acknowledge enlightening discussions with Professor David A. B. Miller. This work was supported, in part, by AFOSR under Grant No. FA9550-18-1-0186.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

## REFERENCES
