Recent advances in neuromorphic computing have established a computational framework that removes the processor–memory bottleneck of traditional von Neumann computing. Moreover, contemporary photonic circuits address the limitations of electrical computing platforms by offering energy-efficient and parallel interconnects independent of distance. When employed as synaptic interconnects with reconfigurable photonic elements, they offer an analog platform capable of arbitrary linear matrix operations, including multiply–accumulate operations and convolution, at extremely high speed and energy efficiency. Both all-optical and optoelectronic nonlinear transfer functions have been investigated for realizing neurons with photonic signals. A number of research efforts have estimated orders-of-magnitude improvements in computational throughput and energy efficiency. Compared to biological neural systems, however, such photonic neuromorphic systems still face challenges in achieving high scalability and density. Recently developed tensor-train decomposition methods and three-dimensional photonic integration technologies can potentially address both algorithmic and architectural scalability. This tutorial covers architectures, technologies, learning algorithms, and benchmarking for photonic and optoelectronic neuromorphic computers.

## I. INTRODUCTION

Artificial Intelligence (AI) and Machine Learning (ML) have transformed our everyday lives—everything from scientific computing to shopping and entertainment. The intelligence of such artificial systems resides primarily in data centers or warehouse-scale computing systems and has been shown to surpass the ability of human brains in some tasks, including the highly complex game of Go. However, today’s data centers consume megawatts of power [Google’s AlphaGo utilized 1202 central processing units (CPUs) and 176 graphical processing units (GPUs)^{1}], and current deep neural network algorithms require labor-intensive hand labeling of large datasets. Furthermore, Turing’s early conceptualization of the “a-machine” in 1936^{2} (now called the Turing machine) proved fundamental limitations on the power of mechanical computation, even while providing the powerful mathematical model of computation, based on a processing unit (e.g., a CPU), that underlies today’s machines. In addition, modern computers utilize random-access memory instead of the infinite memory tape divided into discrete cells of Turing’s model.

In his “First draft of a report on the EDVAC,” in 1945,^{3} John von Neumann articulated what is considered the first general-purpose computing architecture based on memory, processing units, and networks (interconnects). Fascinatingly, von Neumann utilized synapses, neurons, and neural networks in this 1945 report to explain his proposed architecture and then predicted its limitations—now called *the von Neumann bottleneck*^{3}—by stating that “*the main bottleneck of an automatic very high-speed computing device lies: At the memory*.” Because of this limitation, relatively simple tasks, such as learning and pattern recognition, require a large amount of data movement (including moving the weight values) between the processor and the memory (across the bottleneck). Thus, the energy efficiency and the throughput of such computing tasks are fundamentally limited in von Neumann computing as was already predicted in 1945.^{4}

Despite increases in computing speed and the development of memory hierarchies, a fundamental separation between memory and computation remains, limiting data processing speeds regardless of the total availability of memory resources. Neuromorphic computers, in contrast, perform computation through directed graphs that are much better suited for the collocation of computing units and memory. Such a model has persistent or non-volatile memory in the form of synaptic weights uniquely associated with each pair of nodes in the graph. This locality of information allows neuromorphic architectures to avoid the bottleneck between processing and memory entirely. Each node is an individual computing unit with its own dedicated memory such that multiple pieces of information can be processed completely asynchronously and in parallel much like the human brain.

A human brain recognizes features from partial and conflicting information at ∼20 W power levels.^{5} At each moment, the brain is bombarded with a vast amount of sensory information, yet it makes sense of this data stream, even when it contains imperfect and inconsistent elements, by extracting the spatiotemporal structure embedded in it. From this, it builds meaningful representations of objects, sounds, surface textures, and so forth through parallel distributed processing. In a human brain, each neuron may be connected to up to ∼10 000 other neurons, passing signals via as many as 164 × 10^{12} synaptic connections,^{6} equivalent by some estimates to a computer with a 1 × 10^{12} bit per second processor. The neurons communicate with each other with extremely high energy efficiency. For example, in Ref. 7, Attwell and Laughlin observed that the energetic cost of information transmission through synapses is extremely low at ∼20 500 ATP/bit, corresponding to 1.04 fJ/bit at 32 bit/s. In the nervous system, firing a spike costs a neuron 10^{6}–10^{7} ATP/bit^{7} or 50–500 fJ/bit, an amount of energy proportional to how far the spike must travel down the axon because the transmission medium is dispersive and lossy. Furthermore, the dendrites of neurons contribute immensely to the energy efficiency and density of computing in the brain by providing nano-/micro-scale neural networks inside the neuron itself, which is in turn part of a larger neural network. The massively parallel yet hierarchical nature of learning and inference in the brain remains intriguing but not fully understood.

Is it possible to bring such brain-inspired capabilities to artificial machines with similar energy efficiency and scalability? Can we replicate the brain’s remarkable capabilities by constructing synapses and neurons using artificial materials and devices? There have been decades of effort in this area of neuromorphic computing, and none has come close to demonstrating the full capability of the brain. Turing in 1950 proposed a test (now known as the Turing test) to replace the question “Can machines think?”^{8,9} Even if a machine could come close to passing the Turing test, achieving the brain’s energy efficiency and computing capacity in so small a volume and weight remains extremely unlikely, or at least challenging. The seminal work carried out by Mead at Caltech in the late 1980s^{10} emphasized a million-fold improvement in power efficiency. The subsequent work of Boahen’s Neurogrid,^{11} Heidelberg’s BrainScaleS,^{12} IBM’s TrueNorth,^{13} Intel’s Loihi,^{14} Manchester’s SpiNNaker machine,^{15} Cauwenberghs’ Hierarchical Address Event Representation (HiAER) communications fabric,^{16} and Mitra and Wong’s N3XT^{17} all achieved far better energy efficiency than conventional von Neumann computing for relatively simple example tasks.

There are challenges in scaling these electronic neuromorphic computing platforms to very large scales. Electronic solutions typically include long electrical wires with large capacitance values, leading to high interconnect energy consumption. Their interconnect topologies typically link only the four nearest neighbors and require many repeaters for multi-hop connections to non-neighboring nodes. For instance, the TrueNorth chip runs at a slow clock speed of 1 kHz, communicates with an energy efficiency of 2.3 pJ/bit plus an additional 3 pJ/bit for every cm of transmission, and requires a 256 × 256 cross-bar network that selectively connects incoming neural spike events to outgoing neurons.^{13} Recently emerging nanoelectronic neuromorphic computing systems suffer from similar communication challenges in achieving appreciable repeaterless distances, especially at high speeds.^{18} In the 1980s, optical neural networks became a very active area of study for achieving massively parallel brain-like computing at the speed of light.^{19–24} However, the pioneer himself, Psaltis, declared in the 1990s that he was abandoning optical neuromorphic computing for two reasons: (1) the lack of practical devices that could be integrated and (2) insufficient knowledge of complex neural networks. Fast forward to 2021, three decades later, and we are witnessing three major changes countering the two reasons for the abandonment. First, machine learning algorithms utilizing deep neural networks have advanced so much that an artificial machine with a single night of training can beat the human world champion of 33 years in the game of Go^{1}—Lee Sedol referred to AI as “*an entity that cannot be defeated*.” Second, the rate of increase in component integration in silicon photonics^{25–27} is now twice as fast as that of electronic integration (the electronic Moore’s law^{28}).
Thus, we now find silicon photonic integrated circuits with ten thousand photonic components on a die, manufactured on 300-mm silicon photonic wafers at several foundries. Third, while Moore’s law barely maintains its trend of continuing increases in transistor density—going from 5, 4, and 3 nm and possibly down to 2 nm and below—the slowing of this trend is evident, and Dennard scaling,^{29,30} which governed energy-efficiency improvements, has been stalled since 2005. Hence, electronics alone cannot sustain the exponential increases in data processing, especially given that von Neumann computing architectures must move data across a bottleneck. The natural conclusion from these three major changes points to photonic neuromorphic computing as a key solution for future computing.

Nonetheless, there are significant challenges in scaling analog photonic networks while maintaining high accuracy. Wetzstein *et al.*^{31} attributed this to three main reasons: (1) the advantages (power and speed) of analog accelerators are realized only for very large networks, (2) the technology for optoelectronic implementation of the nonlinear activation function remains immature, and (3) analog weights are difficult to control reliably in large optical networks. Hybrid or heterogeneously integrated photonic–electronic neural networks offer practical solutions to these challenges.

Many other reviews have been written on the topic of photonic neuromorphic computing systems. Some place heavier emphasis on a specific aspect of the system, such as material choice^{32,33} or nonlinearity,^{34} on specific structures such as reservoir computers,^{35,36} or on the relationship between photonic neuromorphic neural networks and machine learning,^{37} deep learning,^{38} or artificial intelligence.^{39} Others more broadly discuss interconnect technology, network topology, neuron design, and algorithm choices at differing levels of depth.^{40–44} This tutorial aims to concisely and comprehensively unify each of the aforementioned aspects of photonic neuromorphic design and cover them at their most fundamental level before describing how they relate to the computational abilities of the system; references to other reviews are given for implementation and other details not fully addressed here. The tutorial is structured as follows: Sec. II A argues the rationale for photonic and optoelectronic neuromorphic computing. Section II B covers the general system architecture of the neuromorphic computer. Section II C details the individual building blocks, followed by the learning models in Sec. II D. Section III addresses the critical topic of achieving scalability in algorithms and physical systems. Section IV surveys benchmarking and discusses the challenges of benchmarking such a nascent area of computing. Finally, Sec. V summarizes the tutorial and addresses future directions.

## II. TOWARD REALIZATION OF OPTOELECTRONIC AND PHOTONIC NEUROMORPHIC COMPUTING

### A. Rationale for optoelectronic and photonic neuromorphic computing

Figure 2(a) illustrates a biological neural network^{45} depicting the complex network of neurons and synapses. Each neuron consists of thousands of dendrites, a soma, and an axon with thousands of axon terminals. Each neuron connects with thousands (∼7000 on average^{6}) of other neurons at the synapses interfacing the axon terminals of the upstream neurons (presynaptic neurons) to the downstream neurons (postsynaptic neurons). The plasticity of the synapses through the history of experiences allows learning (or training), discussed more in Sec. II D. Hence, our efforts to construct a bio-derived or a bio-inspired (close-imitation or inspired-reconstruction of the brain functionality) neuromorphic computer start from the simple mathematical diagram of Fig. 2(b) of a (deep) neural network consisting of an array of neurons in each layer and weighted synaptic interconnections between the layers. The output of neurons at each layer [e.g., $x_i^{l-1}$ at layer ($l-1$)] is related to the output of the neurons at the next layer [$x_j^l$ at layer $l$] via the relationship

$$x_j^l = \theta\left(s_j^l\right), \qquad s_j^l = \sum_i w_{ij}^l\, x_i^{l-1},$$

where *θ* is the nonlinear transfer function. Figure 2(c) depicts this again from the perspective of a single neuron, where each dendrite receives synaptically weighted inputs that are summed at the soma, and the neuron generates an output according to a nonlinear transfer function, like the sigmoid activation function $\theta(s_j^l)$ shown in Fig. 2(d).
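This weighted-sum-and-threshold relationship can be sketched in a few lines of NumPy; the sigmoid here stands in for the generic nonlinear transfer function *θ*, and the weights and inputs are random placeholders rather than values from any particular network:

```python
import numpy as np

def sigmoid(s):
    # Sigmoid activation, playing the role of the nonlinear transfer function theta
    return 1.0 / (1.0 + np.exp(-s))

def layer_forward(x_prev, W, b):
    # s_j = sum_i W[j, i] * x_prev[i] + b[j]  (weighted synaptic sum at the soma)
    s = W @ x_prev + b
    # x_j = theta(s_j)  (output of neuron j in layer l)
    return sigmoid(s)

rng = np.random.default_rng(0)
x_prev = rng.random(4)            # outputs of layer l-1
W = rng.normal(size=(3, 4))       # synaptic weight matrix
b = np.zeros(3)                   # neuron biases
x_next = layer_forward(x_prev, W, b)
```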

It is not a coincidence that the biological system chose to utilize spiking neural networks (SNNs) instead of non-spiking ones after millions of years of evolution. The energy efficiency of the brain is crucial; although a human brain represents only ∼2% of the body weight, it consumes ∼20% of the oxygen and calories. Information transfer and processing utilizing spikes—based on event-driven communication and processing in a massively parallel system—are orders of magnitude more energy-efficient than non-spiking counterparts that require constant energy consumption even when communication or computation is unnecessary.

Another important challenge is the implementation of high-throughput and scalable neuromorphic computers that maintain this energy efficiency. Even for bio-derived neuromorphic computing, it is difficult to exactly replicate the wet-electro-chemical systems of ion channels and ATP/ADP conversions. Commonly used electrical wires are too power-hungry and noisy due to electromagnetic impedance, electromagnetic interference, and Johnson thermal noise. For instance, IBM’s TrueNorth system^{13} included repeaters consuming 3 pJ/bit for every cm to overcome dispersion limitations and assure signal integrity. Similarly, Intel’s first iteration of the Loihi chip is organized into a grid of 128 neurocores communicating through a network-on-chip (NoC) with an energy cost of 3–4 pJ for each hop.^{14}

Even more serious energy limitations arise when large-scale synaptic networks must be deployed over electrical mesh networks, whose capacitance and energy consumption scale quadratically. Photonics, on the other hand, does not suffer from the same limitations of impedance, interference, thermal noise, and RC latency. Photonic meshes can achieve the matrix multiplications of Figs. 2(b) and 2(c) simply by propagating light through the mesh, which can be made lossless in a unitary photonic mesh configuration. Photonic interconnects can achieve low-energy (∼1 fJ/b),^{47} low-loss (<0.1 dB/cm),^{48,49} wavelength-parallel (many wavelengths), and high-speed (>10 Gb/s) transmission independent of distance.

Admittedly, it is difficult to fully compare photonic and electronic approaches without the availability of equivalently functional systems; the same challenge has been identified previously for comparisons between two or more fully electrical solutions.^{50} Nonetheless, this tutorial hopes to convince the reader that photonic approaches to neuromorphic computing can offer the following unique advantages over their electronic counterparts:

- Massive photonic parallelism achievable in the wavelength, time, and space domains,
- absence of electromagnetic interference,
- extremely low noise (negligible Johnson thermal noise and possible shot-noise-limited performance),
- information transmission at ∼1 fJ/b independently of the distance,
- matrix multiplication achievable by simple propagation of light through the photonic mesh, in principle, with zero energy loss,
- bidirectional photonic synapses and meshes achievable for forward/backward propagation training,
- fast optical barrier synchronization, and
- sparse processing overcoming the poor locality of data and information far beyond the reach of electronic neural networks.

On the other hand, photonic neuromorphic computing faces the following challenges:

- Photonic components are relatively large (typical dimensions of wavelength/refractive index, on the order of ∼1 *μ*m) compared to electronic components (∼10 nm). Therefore, wavelength- and time-domain multiplexing is necessary to achieve a density comparable to electronic and biological neural systems.
- All-optical nonlinear transfer functions are difficult to achieve without relatively high optical power. For this reason, optoelectronic neural networks incorporating optical–electrical–optical (O/E/O) conversion have been proposed and demonstrated.^{51}
- Photonic neuromorphic computing almost always needs electronics for power distribution, control, and signaling.

Section II B details the high-level architectural decisions of designing photonic neuromorphic computers and details existing devices and methods for building optoelectronic and photonic neuromorphic hardware while highlighting these advantages and addressing the challenges.

### B. System architectures for photonic neural networks

#### 1. Spiking vs non-spiking photonic neural networks

As previously discussed, event-driven spiking neural networks are far more energy-efficient compared to non-spiking counterparts. Additionally, spiking units offer new ways to represent information that may be more natural for specific classes of computation, such as graph algorithms,^{52} quadratic programming,^{53} and other parallelizable computation algorithms.^{54} However, spiking networks have additional complexity in that they require mechanisms for integration over time. Neurons must integrate their inputs over a time window some order of magnitude greater than the received pulse widths to meaningfully process new aggregate information from their upstream neurons; otherwise, the activity of deeper layers in the network can merely encode a rough thresholding of activity from previous layers. Additional complexity in the inclusion of dendritic delays—which simulates the effect of spatially distributed networks—allows downstream neurons to encode and process information about the timing patterns of their upstream inputs and thus provides another dimension of processing to the network.
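A minimal sketch of this temporal integration, using a textbook leaky integrate-and-fire neuron; the time constant, threshold, and drive below are illustrative choices, not parameters of any specific photonic device:

```python
import numpy as np

def lif_neuron(input_current, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: integrates its input over a time window
    (set by tau) much longer than an individual input pulse and emits a spike
    whenever the membrane potential crosses threshold."""
    v = 0.0
    spike_times = []
    for t, i_in in enumerate(input_current):
        # Leaky integration: dv/dt = (-v + i_in) / tau
        v += dt * (-v + i_in) / tau
        if v >= v_th:
            spike_times.append(t)
            v = v_reset  # reset after firing (crude refractory behavior)
    return spike_times

# A constant drive above threshold produces a periodic train of output spikes
drive = np.full(200, 2.0)
out_spikes = lif_neuron(drive)
```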

On the other hand, mathematical and experimental implementations of non-spiking artificial neural networks (ANNs) are much more accessible than those of spiking neural networks. Moreover, the choice of a non-spiking model can more easily leverage the many developments of traditional deep learning from traditional computing platforms and more naturally apply gradient-based learning rules (discussed more in Sec. II D) that have been well proven in many application spaces. Many non-spiking photonic matrix multipliers^{55–61} and photonic neural networks (PNNs)^{62–65} have been proposed or demonstrated. See Ref. 66 for a taxonomy of photonic neural network approaches and a more in-depth review of existing approaches.

#### 2. Out-of-plane vs in-plane photonic neural networks

*Out-of-Plane Photonic Neural Networks.* As introduced in Sec. I, the first optical neural networks were developed in the 1980s incorporating optical planes (pixels) with photonic signals propagating vertically. Such out-of-plane implementations of photonic neural networks remain in active research (a) because they can utilize many optically resolvable elements (pixels) simultaneously in parallel and (b) because Fourier optics and optical convolution can be implemented easily by incorporating lenses. The electronic architecture demonstrated in 1985 has been commercialized in the Optalysys system, where a PC provides the necessary gain, feedback, and thresholding indicated in Ref. 67. Interestingly, the optical feedback scheme utilizes O/E/O conversion consisting of arrays of photodetectors (PDs) and light-emitting diodes (LEDs). Both schemes utilize electronic amplification to overcome optical diffractive losses and electronically incorporate the nonlinear transfer function (no all-optical neural transfer function).

More recently, out-of-plane photonic neural networks have been implemented in the form of Diffractive Deep Neural Networks (D^{2}NNs), which use a cascade of passive diffractive media to implement a synaptic strength matrix between layers in the network.^{68} A thin optical element is designed with variable thicknesses at the “resolution” of the network (number of neurons in the layer) and controls the complex-valued transmission and reflection coefficients at each point. Mathematically, each point is considered a secondary source for the incoming coherent light signal that acts as a neuron in a fully connected neural network layer. Weight matrices implemented by the network are fixed, with the diffractive medium being fabricated as a passive optical element by 3D printing or photolithography. Inputs to the network can be encoded on the amplitude or phase of incoming coherent light before the network output is measured by an array of photodetectors at the output plane [as in Fig. 3(a)]. The demonstration of the above D^{2}NN did not experimentally incorporate nonlinear optical neuronal transfer functions or synaptic reconfiguration. The optical losses per layer (51% average power attenuation per layer reported in Ref. 68) and relatively high optical intensity levels required to drive optical nonlinear transfer functions may limit the scalability and practicality of this method. However, this does not represent a fundamental limit of the technique and may be reduced in the future by improved diffractive surface design.
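The layer-by-layer physics of a D^{2}NN can be sketched numerically: each layer multiplies the incident field by a complex (here phase-only) transmission mask, after which every pixel acts as a secondary source for propagation to the next plane. The sketch below uses the standard angular-spectrum method; the wavelength, pixel pitch, layer spacing, and random masks are arbitrary illustrative choices, and the design (training) of the masks is omitted:

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    # Free-space propagation of a sampled complex field by distance z
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))  # evanescent components dropped
    H = np.exp(1j * kz * z)                          # pure-phase (lossless) transfer function
    return np.fft.ifft2(np.fft.fft2(field) * H)

def diffractive_layer(field, phase_mask, wavelength, dx, z):
    # Each pixel applies a learned complex transmission (phase-only here),
    # then acts as a secondary source feeding the next layer
    return angular_spectrum_propagate(field * np.exp(1j * phase_mask), wavelength, dx, z)

rng = np.random.default_rng(1)
n = 64
field = np.ones((n, n), dtype=complex)               # plane-wave input
masks = [rng.uniform(0, 2 * np.pi, (n, n)) for _ in range(3)]
for m in masks:                                      # cascade of passive diffractive layers
    field = diffractive_layer(field, m, wavelength=1.55e-6, dx=5e-6, z=1e-3)
intensity = np.abs(field) ** 2                       # photodetector array reads intensity
```

Because the masks are phase-only and the propagation kernel is unitary for these parameters, total optical power is conserved through the cascade; real devices add the per-layer losses noted above.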

*In-Plane Photonic Neural Networks.* As opposed to out-of-plane PNNs, in-plane photonic neural networks implement all interconnected photonic synapses and photonic (or optoelectronic) neurons on planar photonic integrated circuits and offer a more robust realization, especially when utilizing silicon photonic technologies. Despite lacking the lenses of the out-of-plane approach, unitary photonic mesh networks consisting of many (unitary) 2 × 2 optical couplers can perform arbitrary matrix operations, including convolution and Fourier transforms. Furthermore, the photonic mesh can be reconfigured through the individual 2 × 2 optical couplers, each of which can be considered a photonic synapse in the synaptic network.

Miller proposed a method to implement arbitrary weighted connections from optical inputs to a set of optical outputs. This method relies on a “universal linear optical component”^{69} comprising a network of 2 × 2 Mach–Zehnder interferometer blocks—connected in a mesh as illustrated in Fig. 3(b)—which was proven capable of implementing any linear transformation from its inputs to its outputs (i.e., from preceding neurons to subsequent layers).^{69,70} In addition, other reconfigurable photonic structures can also implement matrix transformations, such as crossbar networks and micro-ring resonator banks, discussed in Sec. II C. By incorporating nonlinearity in the form of photonic or optoelectronic neurons between each layer (as shown in Fig. 4), a multi-layer neural network can be constructed.
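A small numerical sketch of this idea: each Mach–Zehnder interferometer is a 2 × 2 unitary built from two 50:50 couplers and phase shifters, and embedding such blocks on adjacent mode pairs in a triangular (Reck-style) order composes a larger mesh. The phase settings below are random placeholders rather than a decomposition of any target matrix; the point is that the composed mesh is always unitary, i.e., lossless in principle:

```python
import numpy as np

def mzi(theta, phi):
    # 2x2 transfer matrix of a Mach-Zehnder interferometer: an input phase
    # shifter phi followed by two 50:50 couplers enclosing an internal phase theta
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50:50 directional coupler
    inner = np.diag([np.exp(1j * theta), 1.0])
    outer = np.diag([np.exp(1j * phi), 1.0])
    return bs @ inner @ bs @ outer

def embed(u2, i, n):
    # Place a 2x2 block on adjacent modes (i, i+1) of an n-mode mesh
    U = np.eye(n, dtype=complex)
    U[i:i + 2, i:i + 2] = u2
    return U

rng = np.random.default_rng(2)
n = 4
U = np.eye(n, dtype=complex)
# Triangular (Reck-style) arrangement of n(n-1)/2 = 6 MZIs with random settings
for i in [0, 1, 2, 0, 1, 0]:
    U = embed(mzi(rng.uniform(0, 2 * np.pi), rng.uniform(0, 2 * np.pi)), i, n) @ U
```

With the phases chosen by the Reck or Clements nulling procedures instead of at random, the same mesh realizes any desired 4 × 4 unitary.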

Aside from in-plane and out-of-plane networks, it is worth mentioning that there are examples of optical neural networks designed using a combination of optical fibers, large-scale laser sources, and various other off-the-shelf modulators and components that can be purchased from telecommunication component providers.^{71–76} Given that such fiber-based or free-space components can be easily moved, swapped, and otherwise manipulated, it is far easier to develop and prototype neural networks using these technologies and architectures. In fairness—and not to diminish work in this space as merely prototype networks—it is conceivable that such neural networks could be used to simultaneously communicate and process information collected over broad distances much faster than would be possible for any exclusively localized system, photonic or electronic. Examples of such fiber-based optical networks include optical reservoir computers^{71–75} (discussed more in Subsection II B 3), which have long exceeded a data processing speed of >1 GB/s^{75} for tasks such as chaotic time series prediction (over 10^{7} points per second) with an error rate of about 10% (compared to contemporary electronic approaches at 1%). Rafayelyan *et al.*,^{73} in more recent work, reported processing speeds on the order of 10^{14} operations per second for multi-dimensional chaotic system prediction—compared to the 10^{15}–10^{17} operations per second possible on supercomputers for similar tasks.^{73} In these examples, laser and amplifier feedback is manipulated to substantially increase parallelization, with neurons communicating on various parallel wavelengths or modes in the system.

Despite the computational efficiency and promise of out-of-plane and fiber-based approaches, the remainder of this tutorial focuses on the construction of in-plane, integrated photonic neural networks that more closely approximate the physical scales of biological systems.

#### 3. Network topology

The physical structure or topology of the neural network can determine the possible transformations of data between successive layers or ensembles of neurons. Various neural network structures exist, with some derived from biological connectivity patterns and others deduced from the functions intended to be computed. In the design of a neuromorphic computer, the topology of networks supported depends on the capabilities of the hardware structures employed (discussed more in Sec. II C) and the learning rules most suitable for a given application. Neural network topologies are often divided between feedforward and recurrent approaches though brain-inspired structures typically fall into the latter category. Figure 5 summarizes the common topological structures found in neuromorphic computing.

**Feedforward neural networks** are the simplest network topology to implement and are useful in situations with clear mappings between input and output data as in the prevalent MNIST handwriting classification task. As shown in Fig. 5(a), feedforward refers to the flow of information exclusively from input to output. In general, restricting the flow of information allows the network to be trained more easily by backpropagation and equivalent supervised learning algorithms (discussed more in Sec. II D). Often, synaptic interconnections are fully or densely connected, meaning that each sending neuron is connected to each receiving neuron. This all-to-all pattern of connectivity provides the most flexibility for learnable patterns though at the cost of increased parameters (synaptic weight strengths) and computation for training those parameters. Various photonic structures can be used to implement these weighted connections based on the passive propagation of light—for example, phase-change materials (PCMs), Mach-Zehnder Interferometers (MZIs), or micro-ring resonators (MRRs) (discussed more in Sec. II C 1).

Fully connected feedforward networks require many parameters, so multiple strategies have been developed in traditional ANNs to reduce the number of parameters. **Convolutional neural networks** (CNNs) are the most popular remedy, which combines weight sharing and sparse connectivity to perform pattern recognition with far fewer weights than fully connected networks. As shown in Fig. 5(b), a small weight pattern (called a kernel) is swept across different positions of the input layer such that the subsequent layer reflects the strength of that pattern at each position. Photonic neural networks can take advantage of wavelength-division multiplexing (WDM)^{60,65,77} and time-division multiplexing (TDM)^{78} in addition to space-division-multiplexing (SDM) and other forms of parallelism^{58} to repeatedly apply the same kernel over multiple positions of an input vector.
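Weight sharing is easy to see in a one-dimensional sketch: the same two-element kernel (two parameters, versus dozens for a fully connected layer of the same size) is swept across every input position, so the output at each position reflects the strength of that pattern there. As in most CNN frameworks, the "convolution" is implemented as cross-correlation; the input and kernel below are illustrative:

```python
import numpy as np

def conv1d_valid(x, kernel):
    # Sweep the same kernel (shared weights) across every position of the input
    k = len(kernel)
    return np.array([np.dot(kernel, x[i:i + k]) for i in range(len(x) - k + 1)])

x = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
edge_kernel = np.array([1.0, -1.0])   # positive at falling edges, negative at rising edges
y = conv1d_valid(x, edge_kernel)
```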

**Recurrent neural networks** deviate from this exclusively “forward” propagation of information and incorporate cyclical and lateral pathways—see Fig. 5(c)—with varying degrees of connectivity. The broader range of topologies provides additional mechanisms for information processing in addition to the input–output mappings of feedforward networks. There are various forms of recurrent networks that appear in neuromorphic computing that use recurrent connections to add a persistent state (as in working memory) and dynamical properties to the behavior of the network.

In **reservoir computing**, a fixed (non-reconfigurable) recurrent neural network is sandwiched between two feedforward layers, as shown in Fig. 5(d), and only the output feedforward layers are reconfigured. The recurrent part of the network is called the “reservoir” and is defined with lateral connections (between neurons in the same layer or ensemble) of random strengths. The first feedforward layer also contains randomly selected weights and provides input to the reservoir, while the final feedforward layer is trained to “read out” the activity of the reservoir.^{79} The random flow of activity through the reservoir behaves like a dynamical system and enables the learning of temporal dynamics without the complicated training schemes required to train an entire network in other recurrent structures.^{80} To summarize, the reservoir is often said to transform the input from a temporal representation into a higher dimensional spatial representation that a linear classifier or predictor can more easily interpret; this behavior has been useful in applications such as digital signal processing, speech recognition, and general modeling of dynamical systems.^{81}
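The reservoir idea can be sketched as a small echo state network: the input and lateral (reservoir) weights are fixed and random, with the spectral radius scaled below 1 so activity fades, and only the linear readout is trained, here by ridge regression on next-step prediction of a sine wave. All sizes and constants below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n_res, washout, T = 100, 50, 500

# Fixed random input and reservoir (lateral) weights; only the readout is trained
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius below 1

u = np.sin(np.arange(T + 1) * 0.2)[:, None]        # input series; task: predict next step
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    # Reservoir update: high-dimensional nonlinear expansion of the input history
    x = np.tanh(W @ x + W_in @ u[t])
    states[t] = x

# Train the linear "read out" layer by ridge regression on post-washout states
X, y = states[washout:], u[washout + 1:T + 1, 0]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
pred = X @ W_out
mse = np.mean((pred - y) ** 2)
```

The readout sees the temporal input re-expressed as a high-dimensional spatial state, so a simple linear fit suffices, exactly the transformation described above.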

Photonic and optical implementations of reservoir computing vary in architecture, modulation, and signal generation. For example, Duport *et al.*^{82} reported a fully analog variant of a popular optical reservoir computing system; it represents the neural activity of the reservoir as the modulated intensity of an external laser. Each neuron's activity is time-multiplexed on a long delay line (optical fiber spool) with a period roughly equal to the round-trip time of the line (∼8.4 *µ*s). Other groups have demonstrated reservoir computing on CMOS-compatible integrated silicon photonics chips. Vandoorne *et al.*,^{83} for example, experimentally demonstrated a passive chip that uses only waveguides, splitters, and combiners, while the nonlinearity of the network is handled at signal detection and in the “read out” layer of the network. This approach was evaluated with multiple 2-bit binary logical tasks, with an error rate as low as 10^{−4} for the exclusive or (XOR) logical operation.

**Winner-Take-All (WTA) networks** are a biologically inspired topology in which recurrent or lateral inhibitory connections between excitatory neurons and inhibitory neurons—depicted in Fig. 5(e)—enforce a limit on the total activity of the layer. Strengths of incoming excitatory connections, lateral inhibitory connections, and bias currents are balanced to create the desired selectivity (softmax-like or hardmax-like transformation) of incoming information, much like that seen in the feature maps of the cerebral cortex.^{84}
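A toy numerical sketch of this balance: each excitatory unit receives its own input while inhibition proportional to the total activity of its rivals is subtracted. With the moderate inhibition strength chosen here (an arbitrary illustrative value), the iteration settles into a soft winner-take-all state; stronger inhibition pushes the transformation toward a hardmax:

```python
import numpy as np

def winner_take_all(inputs, inhibition=0.5, steps=50):
    # Excitatory units receive their inputs while lateral inhibition, driven by
    # the total activity of the other units, suppresses each of them
    a = np.array(inputs, dtype=float)
    r = np.maximum(a, 0.0)
    for _ in range(steps):
        total = r.sum()
        # Subtract inhibition proportional to the rivals' activity; rectify at zero
        r = np.maximum(a - inhibition * (total - r), 0.0)
    return r

r = winner_take_all([1.0, 0.8, 0.3])   # the strongest input wins
winner = int(np.argmax(r))
```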

Zhang *et al.*^{85} recently demonstrated a simulated WTA mechanism using the inhibitory dynamics of vertical-cavity surface-emitting lasers with saturable absorption (VCSEL-SA). Each VCSEL-SA in the circuit acts as a spiking neuron whose output is read out as an intensity spike in the X-polarization (XP) mode; the Y-polarization (YP) mode is sent to the competing neuron (VCSEL-SA) and induces a YP spike that pushes that neuron into a refractory period, temporarily preventing XP spikes from being generated. Zhang *et al.* also used simulations to show that this bio-inspired mechanism can implement the max-pooling operation required by many CNNs found in traditional machine learning. Max-pooling layers have been shown to increase the accuracy of deep CNNs by a factor of nearly three^{86} without the need for additional learnable parameters (synaptic weights).
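The WTA principle itself can be illustrated with a small numerical sketch (the lateral-inhibition update rule and constants below are illustrative and are not a model of the VCSEL-SA dynamics):

```python
import numpy as np

def winner_take_all(inputs, inhibition=0.5, steps=50):
    """Iterate a simple lateral-inhibition circuit until one unit wins.

    Each unit is driven by its external input plus self-excitation and
    is inhibited by the summed activity of the other units (a
    hardmax-like selection).
    """
    b = np.asarray(inputs, dtype=float)
    a = b.copy()
    for _ in range(steps):
        a = np.maximum(0.0, b + a - inhibition * (a.sum() - a))
    return a / a.sum()  # normalized activity pattern

act = winner_take_all([1.0, 1.2, 0.9])
print("winner:", int(np.argmax(act)))
```

After the iteration settles, only the unit with the strongest input retains activity; the others are suppressed to zero.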

**Hopfield networks** are another common recurrent network topology with well-defined dynamics; they are formed by selecting excitatory and inhibitory weights between neurons such that specific states (patterns of firing) are stable, while others are unstable—see Fig. 5(f). This connectivity pattern forms a simple content-based memory such that partial or distorted input patterns similar to a “memorized” state will evolve toward the memorized stable state—effectively completing the memory.^{87}

Marquez *et al.*^{88} used a thermo-optically tuned micro-ring resonator bank (described in Sec. II C 1) to experimentally demonstrate the pattern reconstruction capability of the Hopfield network for three small 4 × 4 patterns. A flattened memory pattern, *x*, was stored (in an off-chip computer) in the network by calculating the recurrent weights according to the outer-product rule, $W = x x^{T}$ (summed over the stored patterns), where the constraint $W_{ii} = 0$ is imposed to prevent neurons from exciting themselves and firing all the time. The micro-ring resonators were tuned to apply the modulation of *x* according to *W* one column and row at a time. Components of the vector *x* were represented by the intensity of light at multiple wavelength channels on the same input waveguide. An offline computer was used to interpret the results (as opposed to on-chip neurons), though the demonstration provides a proof-of-concept for Hopfield networks implemented on micro-ring resonator banks.
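The storage-and-recall procedure can be reproduced numerically. The sketch below (NumPy; the specific 4 × 4 bipolar pattern and corruption are illustrative) stores one flattened pattern with the outer-product rule, zeroes the diagonal, and recovers the pattern from a corrupted probe:

```python
import numpy as np

# One flattened 4x4 bipolar pattern (values +/-1); illustrative choice.
x = np.array([1, -1, 1, -1,
              -1, 1, -1, 1,
              1, -1, 1, -1,
              -1, 1, -1, 1])

# Outer-product storage rule with zeroed diagonal (W_ii = 0).
W = np.outer(x, x).astype(float)
np.fill_diagonal(W, 0.0)

# Corrupt three pixels, then let the network settle.
probe = x.copy()
probe[[0, 5, 10]] *= -1
state = probe.astype(float)
for _ in range(5):            # synchronous sign updates converge quickly
    state = np.sign(W @ state)

print("recovered:", np.array_equal(state, x))
```

For a single stored pattern, one update step already projects the corrupted probe back onto the memorized state.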

### C. Building blocks of photonic neuromorphic computing systems

As discussed in Sec. II B 2, PNNs consist of an interconnection network analogous to the axonal and dendritic projections of the neuron and a photonic or optoelectronic nonlinearity that corresponds to the excitability behavior of the neuron. Table I summarizes biological neural network components and examples of equivalent optical and electro-optic device components that emulate the biological component mechanism.

| Biological component | Biological constituent | Biological function | Equivalent photonic component | Equivalent photonic function |
|---|---|---|---|---|
| Synapse | Presynaptic terminal | Conversion of the electrical signal (action potential) into a chemical signal (neurotransmitter release) | Photonic synapses or photonic mesh | Photonic couplers with variable coupling strengths to reflect synaptic weights |
| | Postsynaptic terminal | Receives neurotransmitters at the receptors | | |
| Neuron | Dendrites | Spatiotemporal summation (dendritic computing) | All-optical or optoelectronic neurons | Photonic dendrites or input optical couplers (with fan-in) |
| | Soma | Integration and spike generation | | Photonic somas: nonlinear transfer function achieved by all-optical or optoelectronic devices |
| | Axon and axon terminals | Signal transmission | | Photonic axons and axon terminals: optical waveguides and output optical couplers (with fan-out) |


Because biological neurons connect to thousands of other neurons via thousands of synapses, dendrites, and axons, it is difficult for bio-derived or bio-inspired artificial neural networks to achieve equivalent levels of connectivity. Biological systems achieve such high degrees of interconnection more easily because synapses are extremely small (∼10 nm) and neurotransmitters and receptors are smaller still. In addition, their unique electro-chemical dynamics allow tree-structured branching of dendrites and axon terminals to achieve broadcast (at the axon terminals) and summation (at the dendrites) of the neurotransmitter signals. While it is, in principle, possible to construct such a synaptic network for each neuron, it is simpler to construct a mathematically equivalent mesh of synaptic interconnects, such as a crossbar, a mesh of 2 × 2 couplers, or other parallelized structures.

Electronic neuromorphic computers typically consist of electronic crossbars with memristive synapses at each crosspoint (*N*^{2} synapses for an *N* × *N* crossbar) and electronic neurons at each end. Likewise, photonic neural networks comprise photonic synaptic meshes consisting of photonic couplers and photonic or optoelectronic neurons that provide nonlinearity. In both cases, the number of synaptic elements scales as O(*N*^{2}) for an *N* × *N* neural network. Unlike biological synaptic interconnections, the individual couplers in these meshes do not map one-to-one onto weight values. Instead, the collection of photonic synaptic coupling coefficients collectively applies the weight matrix values *w*_{ij} between the *i*th presynaptic neuron and the *j*th post-synaptic neuron.

#### 1. Forming reconfigurable optical synapses

Photonic matrix multipliers passively couple light from a set of *N* input ports to a set of *M* output ports according to a unitary weight matrix *U*. It is possible to remove the unitary restriction by including active gain media. However, given the emphasis on energy efficiency in neuromorphic computing, the remainder of this section assumes that passive weighting is sufficient for computation and that any signal gain is handled by the neuron nonlinearity. The amount of light coupled from one input to one output is determined by one or more reconfigurable photonic elements that form the optical synapse. Manipulation of one or more material properties allows for the reconfiguration of these elements.

Commonly exploited mechanisms include the thermo-optic effect, electro-optic effects, photo-ionic effects, and structural phase changes,^{89} each of which produces a change in the optical path length and thus modulates the phase shift of a signal propagating through the material. The former two are considered volatile reconfiguration mechanisms and require constant external biasing or power supply, while the latter two are non-volatile and persist after the power supply is removed. The application of these effects results in a signal phase change, Δ*ϕ*, that is proportional to the refractive index change Δ*n* and the propagation length, *L*,

$$\Delta\phi = \frac{2\pi}{\lambda_0}\,\Delta n\, L,$$

where *λ*_{0} is the free-space wavelength of the signal.

Thermo-optic tuning exploits the temperature dependence of the refractive index, typically by placing Ohmic micro-heaters near the waveguide.^{90–93} The change in the refractive index can then be calculated as

$$\Delta n = \frac{dn}{dT}\,\Delta T,$$

where the thermo-optic coefficient is often written as $\frac{dn}{dT}$ and given in units of inverse Kelvin (1/K). In integrated photonics, the behavior of structures can be simplified in terms of an effective index that depends on the refractive indices of two or more materials. In such a case, the thermo-optic coefficient can be written as a sum of contributions from each index. For example, in a waveguide where the effective index depends on the core index *n*_{core} and cladding index *n*_{clad}, the thermo-optic coefficient for the effective index can be written as^{94}

$$\frac{dn_{\text{eff}}}{dT} = \frac{\partial n_{\text{eff}}}{\partial n_{\text{core}}}\frac{dn_{\text{core}}}{dT} + \frac{\partial n_{\text{eff}}}{\partial n_{\text{clad}}}\frac{dn_{\text{clad}}}{dT}.$$
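As a back-of-envelope check of the thermo-optic phase relation, the following sketch estimates the temperature change needed for a π phase shift in a silicon waveguide (the coefficient and lengths are typical literature values, not taken from the cited references):

```python
import math

# Phase shift from a thermo-optic index change: dphi = 2*pi*dn*L/lambda0.
dn_dT = 1.8e-4   # 1/K, typical for silicon near 1550 nm, room temperature
L = 100e-6       # 100 um heated waveguide length (illustrative)
lam0 = 1.55e-6   # free-space wavelength

def phase_shift(delta_T):
    """Phase shift (rad) produced by a temperature change delta_T (K)."""
    dn = dn_dT * delta_T
    return 2 * math.pi * dn * L / lam0

# Temperature change needed for a pi phase shift with these numbers:
dT_pi = lam0 / (2 * L * dn_dT)
print(f"dT for pi shift: {dT_pi:.1f} K")
```

For these illustrative values the required temperature swing is a few tens of Kelvin, which is why thermo-optic phase shifters draw static heater power.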

The electro-optical tuning of the refractive index requires materials with significant electric-field-induced optical index changes through the Pockels effect, the Kerr effect, field-induced carrier density changes, or other mechanisms. The magnitude of these index changes varies considerably among materials.^{95} The relationship between the electric field and changes in the index is more complex than for thermo-optic effects, ranging from a simple linear relationship in the case of the Pockels effect to more complicated dependencies on the charge-carrier distribution through the material in the case of carrier effects. Photo-ionic effects are similar to electro-optical effects except that applied electrical fields can physically displace ions to drive them in or out of waveguide materials (e.g., polymers) to semi-permanently change the optical index.^{96,97}

Optical Phase Change Materials (PCMs) switch from one material structure to another (e.g., crystalline to amorphous and vice versa) often by external heating—rapid heating to high temperature and quenching vs slow heating and cooling—to induce changes in the refractive index and loss. Heating is often achieved electrically by incorporating pulsed Ohmic heaters as described previously, though it is also possible to achieve such heating with optical pulses themselves, thus realizing “all-optical” reconfiguration. However, due to the typical optical power levels required for such an optical reconfiguration of PCMs, optically tuning the PCM can be restrictive compared to other PCM reconfiguration approaches, such as electronic or thermal reconfiguration.^{98} The non-volatile synaptic reconfiguration achievable with PCMs and photo-ionic materials^{89} is attractive because no static power consumption is required to maintain the induced changes in material properties.

**Directional couplers** transfer light between two adjacent waveguides through the overlap of their evanescent fields. The strength of this interaction is described by a coupling coefficient, *κ*, given in units of inverse length, as the amount of coupling between the waveguides depends on the length of the interaction. Mode coupling between waveguides can be derived using coupled-mode theory; for two rectangular waveguides of the same core index, *n*_{1}, and shared cladding index, *n*_{0}, the coupling coefficient decays exponentially with the separation between the waveguides and depends on the index contrast and on *a*, the width of the waveguides in the plane of coupling [see Fig. 6(a)]. Assuming no loss, we can describe coupling in a two-arm coupler—as shown in Fig. 6(b)—from a single incident wave in one arm to the two output arms according to (see Ref. 99 for more details on the derivation and nature of waveguide coupling)

$$B_1 = A_1\cos(\kappa l), \qquad B_2 = -j\,A_1\sin(\kappa l),$$

where *l* is the length of interaction between the two waveguides and *A*_{1} is the normalized incident field amplitude such that the input power is $P = A_1^2$, and likewise for the output fields *B*_{1} and *B*_{2}. It is important to note that the coupling coefficient is wavelength dependent; if multiple wavelengths are used, an appropriate coupling length must be chosen so that each wavelength follows the desired coupling between ports.
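The lossless two-arm coupler behavior can be checked numerically. In this sketch (the value of κ is illustrative), an interaction length giving κl = π/4 yields a 50:50 split and κl = π/2 a full cross-over:

```python
import numpy as np

def coupler_outputs(A1, kappa, length):
    """Lossless two-arm directional coupler: field amplitudes in the
    through arm (B1) and cross arm (B2) after an interaction length l:
    B1 = A1*cos(kappa*l), B2 = -1j*A1*sin(kappa*l)."""
    B1 = A1 * np.cos(kappa * length)
    B2 = -1j * A1 * np.sin(kappa * length)
    return B1, B2

kappa = np.pi / 2 * 1e3  # rad/m, illustrative coupling coefficient
for l_um, label in [(500, "50:50 point"), (1000, "full cross-over")]:
    B1, B2 = coupler_outputs(1.0, kappa, l_um * 1e-6)
    print(label, abs(B1) ** 2, abs(B2) ** 2)
```

Because the coupler is lossless, |B1|² + |B2|² = |A1|² at every interaction length.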

**Mach–Zehnder Interferometers (MZIs)**, as depicted in the dashed rectangle of Fig. 7(b), are 2 × 2 reconfigurable photonic couplers that use a pair of phase shifters and a pair of directional couplers to implement a 2 × 2 unitary weight matrix, *U*. If we represent the input as a vector, $\vec{A}$, whose elements correspond to the normalized incident field amplitudes, then the normalized field amplitudes at the output are given by $\vec{B} = U\vec{A}$. The pair of phase shifters can be arranged on any two arms (straight waveguide regions) of the MZI, though the configuration shown in Fig. 7(b) allows one of the phase shifters to control the relative phase of the two input signal components in each output. Assuming coherent inputs, perfect 50:50 couplers, and two phase shifters, *φ* and *θ*, arranged as shown in Fig. 7(b), the output amplitudes can be described by the unitary matrix multiplication

$$\vec{B} = j e^{j\theta/2}\begin{pmatrix} e^{j\varphi}\sin(\theta/2) & \cos(\theta/2) \\ e^{j\varphi}\cos(\theta/2) & -\sin(\theta/2) \end{pmatrix}\vec{A}.$$
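A quick numerical check of the MZI transfer matrix (written here in one common convention; the exact form depends on where the phase shifters sit) confirms unitarity and power conservation:

```python
import numpy as np

def mzi_unitary(theta, phi):
    """2x2 transfer matrix of an MZI with two ideal 50:50 couplers,
    an internal phase theta and an input phase phi (one common
    convention; other phase-shifter placements give equivalent forms)."""
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * np.sin(theta / 2), np.cos(theta / 2)],
         [np.exp(1j * phi) * np.cos(theta / 2), -np.sin(theta / 2)]])

U = mzi_unitary(theta=1.1, phi=0.7)
print("unitary:", np.allclose(U.conj().T @ U, np.eye(2)))
# Output amplitudes for a single input in the top arm:
B = U @ np.array([1.0, 0.0])
print("output powers:", np.abs(B) ** 2)  # the two powers sum to 1
```

Sweeping θ from 0 to π steers the input continuously between the two output ports, which is how the mesh architectures below realize arbitrary couplings.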

**Micro-Ring Resonators (MRRs)** are another reconfigurable technology that can be arranged to compute a matrix multiplication. In contrast to the MZI unit, MRR units are often used with wavelength-division multiplexing schemes to implement a “broadcast and weight” architecture.^{100} Under this scheme, input vectors are encoded as the modulated light intensities of multiple wavelength channels, while each MRR unit acts as a tunable filter to selectively apply attenuation to a specific input wavelength. The micro-ring itself is a waveguide in the shape of a circle placed within an evanescent coupling distance of one or more straight waveguides [see Fig. 7(c)]. For matrix multiplication, it is typical to use two waveguides such that the intensity at one of the two possible output ports—often called the through and drop ports—can be modulated according to the desired multiplication by attenuation. These rings form a resonant cavity, though other closed-loop waveguide paths can also form the MRR cavity. The length of the path (e.g., the circumference in the case of a ring) determines the resonant condition and is tuned by incorporating a phase shifter. Rather than defining the usual coupling coefficient per unit length, the response of a ring resonator is usually analyzed in terms of the power splitting ratios, $k_t^2$ and $r_t^2$, known as the cross-coupling and self-coupling coefficients, which correspond to the fractions of input power coupled into the ring and into the through port, respectively. Equivalent coefficients $k_d^2$ and $r_d^2$ describe the coupling between the resonant cavity and the opposing drop waveguide. Assuming no coupling loss (where $k^2 + r^2 = 1$ for both coupling regions) or attenuation in the waveguide, we can calculate the power transmission to each port as

$$T_{\text{through}} = \frac{r_t^2 - 2 r_t r_d \cos\phi + r_d^2}{1 - 2 r_t r_d \cos\phi + (r_t r_d)^2}, \qquad T_{\text{drop}} = \frac{k_t^2\, k_d^2}{1 - 2 r_t r_d \cos\phi + (r_t r_d)^2},$$

where $\phi = 2\pi n_{\text{eff}} L/\lambda_0$ is the round-trip phase, *L* is the round-trip length, *n*_{eff} is the effective index, and *λ*_{0} is the free-space wavelength of the incoming signal. We can see that a resonant condition occurs for any input where an integer number of wavelengths fits in the round-trip optical path (see Ref. 101 for the derivation and a review of other MRR properties),

$$n_{\text{eff}}\, L = m\,\lambda_0, \quad m \in \mathbb{Z}.$$

Tuning the resonant frequency and coupling coefficients allows the modulation of the transmission to each port.^{102} It should be noted that, in contrast to the MZI unit, there will be a loss in transmission to the unused output port of the MRR (except for the case of total transmission at the desired wavelength channel). As such, the multiplication is not unitary as in the case of the MZI. However, as discussed in Sec. II C 2, a matrix multiplier can still be constructed from this unit by assembly into banks.
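The add-drop transmission behavior can be sketched numerically. In the following (ring parameters are illustrative), a lossless ring with symmetric coupling routes all power to the drop port on resonance, and the through and drop powers always sum to one:

```python
import numpy as np

def ring_transmission(lam0, n_eff, L, r_t, r_d):
    """Lossless add-drop micro-ring: power transmission to the through
    and drop ports vs the round-trip phase phi = 2*pi*n_eff*L/lam0
    (self-coupling r, cross-coupling k, with k^2 + r^2 = 1)."""
    phi = 2 * np.pi * n_eff * L / lam0
    k_t2, k_d2 = 1 - r_t ** 2, 1 - r_d ** 2
    denom = 1 - 2 * r_t * r_d * np.cos(phi) + (r_t * r_d) ** 2
    T_through = (r_t ** 2 - 2 * r_t * r_d * np.cos(phi) + r_d ** 2) / denom
    T_drop = k_t2 * k_d2 / denom
    return T_through, T_drop

# A round trip holding an integer number of wavelengths is on resonance.
n_eff, L = 2.4, 100e-6
lam_res = n_eff * L / 155        # m = 155 gives a wavelength near 1.55 um
T_t, T_d = ring_transmission(lam_res, n_eff, L, r_t=0.9, r_d=0.9)
print(f"on resonance: through={T_t:.3f}, drop={T_d:.3f}")
```

Detuning the wavelength (or thermally tuning the ring) moves power back to the through port, which is exactly the attenuation mechanism used for weighting.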

#### 2. Assembling photonic synaptic meshes

Using the above reconfigurable photonic elements provides many ways to construct a photonic matrix multiplier of a given number of input and output dimensions. Each method has a differing number of tunable elements required and different design considerations. These methods can be categorized into three prevalent technologies: cross-bar networks, MZI meshes, and MRR banks.

**Cross-bar networks**, as shown in Fig. 7(a), are the simplest approach, typically aligning incoming signals along one direction (e.g., east-west) and outgoing signals along the other (north-south). Reconfigurable materials, such as PCMs or optical memristive materials, allow incoming light to be coupled into the output waveguides according to synaptic strength. MRRs can also be used to couple light from input to output ports; however, to our knowledge, this has not been demonstrated for matrix multiplication. Feldmann *et al.*^{60} demonstrated a PCM crossbar for parallel matrix multiplication in a convolutional network and reported 10^{12} MAC operations per second with a CNN accuracy of 95.3%, compared to 96.1% for the equivalent CNN on a traditional computer. A crossbar network consists of *N* inputs and *M* outputs with *N* × *M* connections, allowing all-to-all connectivity at the cost of *N* × *M* reconfigurable coupling elements. Crossbar networks can implement rectangular or square matrices but require careful design to ensure that crossings farther from the input receive sufficient optical power.

**MZI meshes** are another example of a photonic matrix multiplier; they use collections of MZIs as optical linear units (OLUs) whose computation is defined by their respective transfer matrices, as shown in Fig. 7(b). Utilizing this structure for the OLU, one can build an arbitrary *N* × *N* unitary matrix from MZIs arranged in various mesh topologies, of which the most common are triangular,^{69} rectangular,^{103} and diamond,^{104} as depicted in Figs. 8(a)–8(c), respectively. Gu *et al.* also presented a more complex butterfly topology that utilizes waveguide crossings and reduces the total number of MZI units compared to the former three topologies; see Ref. 105 for more details. For the triangular and rectangular topologies, the total number of MZI units for an *N* × *N* matrix is exactly *N*(*N* − 1)/2, though each MZI comprises two reconfigurable elements for a total of *N*(*N* − 1) controllable parameters.

The rectangular mesh, simply put, connects MZI units side by side, as shown in Fig. 8(b): the upper arm of one MZI unit connects to the lower arm of the next. The advantage of the rectangular mesh is that its compact arrangement has the minimum optical depth among the configurations mentioned.^{103} A non-square matrix must be formed by constructing a square matrix of the largest dimension and leaving the additional input or output ports unused. The butterfly mesh is based on the structure of the rectangular mesh with some MZI units pruned to reduce the number of units needed at the cost of some reconfigurability (i.e., not all unitary matrices can be represented).^{105} Triangular meshes follow a similar connection rule to the rectangular mesh but start with only the two bottom ports and increase the coupling to additional ports along a diagonal line, as depicted in Fig. 8(a). Triangular meshes require a higher optical depth and more chip space but support self-configuration mechanisms, as demonstrated in Ref. 69, with the same number of parameters as the rectangular mesh. Diamond meshes are a modified version of the triangular mesh, adding (*N* − 1)(*N* − 2)/2 MZI units that vertically mirror the shape of a triangular mesh, as seen in Fig. 8(c). Shokraneh *et al.*^{104} showed that this symmetric topology can provide additional degrees of freedom for weight matrix optimization in backpropagation training while also improving tolerance to fabrication errors.
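The *N*(*N* − 1)/2 count for the rectangular arrangement can be verified by composing 2 × 2 MZI blocks into a full matrix (the MZI convention below is one common choice, and the alternating placement follows the rectangular layout; this is a sketch, not a calibrated mesh model):

```python
import numpy as np

def mzi(theta, phi):
    """2x2 MZI block (ideal 50:50 couplers; convention illustrative)."""
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * np.sin(theta / 2), np.cos(theta / 2)],
         [np.exp(1j * phi) * np.cos(theta / 2), -np.sin(theta / 2)]])

def rectangular_mesh(N, rng):
    """Compose a random N x N unitary from MZIs placed in the
    rectangular (Clements-style) arrangement; return matrix and count."""
    U = np.eye(N, dtype=complex)
    count = 0
    for layer in range(N):                    # N columns of MZIs
        for i in range(layer % 2, N - 1, 2):  # alternating placement
            block = np.eye(N, dtype=complex)
            block[i:i + 2, i:i + 2] = mzi(*rng.uniform(0, 2 * np.pi, 2))
            U = block @ U
            count += 1
    return U, count

rng = np.random.default_rng(1)
U, count = rectangular_mesh(6, rng)
print("MZIs used:", count, "(N(N-1)/2 =", 6 * 5 // 2, ")")
print("unitary:", np.allclose(U.conj().T @ U, np.eye(6)))
```

Because every block is unitary, the composed matrix is unitary regardless of the random phase settings; training such a mesh amounts to choosing the *N*(*N* − 1) phases.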

**MRR banks**, as previously mentioned, take advantage of WDM to broadcast spiking signals widely and weight them with selective filters at the receiving neuron.^{106} Banks of MRRs are formed as shown in Fig. 7(c) by aligning them along a shared pair of waveguides while varying the radius of each ring sufficiently to avoid wavelength collision. Each receiving neuron has its own dedicated MRR bank to implement the incoming synaptic strengths of each wavelength before a balanced detector can measure the overall incoming signal intensity. The number of MRRs in each bank matches the number of sending neurons in the network, while the number of banks matches the number of receiving neurons. This means that a total of *N* × *M* MRRs will be needed to implement a fully connected network between a layer of *N* sending neurons and *M* receiving neurons.

#### 3. Photonic and optoelectronic nonlinear neurons

After receiving sufficiently strong stimuli, biological neurons emit electrical pulses known as *action potentials* or *spikes*. Encoding of information in the form of spike timing (*temporal coding*) or the spike rate (*rate coding*) has been a subject of active research. In designing nanophotonic spiking neural networks, the three essential elements—the neuron, the synapses, and the coding scheme—should be designed together to have the following attributes:^{51}

- *weighted addition*: the ability to sum weighted inputs,
- *integration*: the ability to integrate the weighted sum over time,
- *thresholding*: the ability to decide whether or not to send a spike (all-or-none),
- *reset*: the ability to have a refractory period, immediately after a spike is released, during which no firing can occur, and
- *pulse generation*: the ability to generate new pulses.

Biological neurons consist of three primary structures: dendrites, soma, and axon.^{80,81} The neuron body, or soma, forms the thresholding function, accumulating input currents from dendritic trees until the internal voltage meets the condition for spike generation. Exact mechanisms for this spike generation vary in biological realism and complexity and can be reviewed in Refs. 107 and 108. Photonic implementation of this function can be generally classified between all-optical (or photonic) and electro-optic approaches.

**All-optical neurons** tend to use simplified neuron models due to the difficulty of implementing optical nonlinearity. One of the simplest approaches uses traditional ANN activation functions, such as the sigmoid function, and maps them onto spiking hardware as in a rate-encoded ANN translation.^{109} Another common choice is the leaky-integrate-and-fire (LIF) model,^{110} in which an internal state variable—representing the membrane potential in biology—constantly decays exponentially toward some equilibrium value. Incoming spikes increment the membrane potential according to synaptic strength, and the neuron fires if the potential surpasses a threshold before decaying back to its resting value. Choices of nonlinearity to implement these models include semiconductor optical amplifiers (SOAs),^{109,110} vertical-cavity surface-emitting lasers (VCSELs),^{111–113} saturable absorption,^{43,114–117} and, more recently, passive micro-resonators.^{118} These optical neurons have not yet been demonstrated in large-scale networks, and more work is needed to establish their computational abilities.
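The LIF model referenced above reduces to a few lines of code. This dimensionless sketch (all constants illustrative, not tied to any photonic device) shows the three ingredients: leaky integration, thresholding, and reset:

```python
import numpy as np

def lif_spikes(input_current, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0):
    """Minimal leaky-integrate-and-fire neuron: the membrane potential
    decays toward zero, integrates the input, and on crossing v_th the
    neuron emits a spike and resets (a crude refractory effect)."""
    v, spikes = 0.0, []
    for I in input_current:
        v += dt * (-v / tau + I)   # leaky integration (Euler step)
        if v >= v_th:
            spikes.append(True)
            v = v_reset            # reset after the spike
        else:
            spikes.append(False)
    return np.array(spikes)

# A constant drive above threshold produces a regular spike train.
spikes = lif_spikes(np.full(200, 80.0))
print("spike count over 200 steps:", int(spikes.sum()))
```

Sub-threshold drive (here, any constant input below `v_th / tau`) produces no spikes at all, which is the all-or-none behavior listed above.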

**Optoelectronic neurons**, the alternative, combine the advantage of well-studied electronic nonlinearities with the fast, nearly lossless transmission^{119} and zero-energy weighting provided by the photonic devices discussed in Sec. II C 2. O/E/O conversion uses photodetectors to generate an electrical current in proportion to the received optical power, thus converting the aggregated optical inputs from the synaptic mesh into electrical currents. The electrical circuit, in turn, can use any nonlinear circuit element to implement the spiking function and generate an optical output using semiconductor lasers. Nozaki *et al.*^{120} demonstrated that close integration between the photodetector and the modulator reduced the integrated capacitance to 2 fF and that non-spiking neural nonlinearity can be achieved at an extremely low energy consumption of 4.8 fJ/bit at a speed of 10 Gbit/s. However, such a neuron architecture requires a constantly powered laser source; the continuous currents supplying the lasers described in Ref. 120 consume a significant amount of energy even when the neurons are idle. As discussed in Sec. II B 1, spiking neural networks offer far better overall energy efficiency due to the sparse nature of communication in event-driven neuromorphic computing, and spiking dynamics can be implemented by closely integrating CMOS transistors with photodiodes and electro-optic modulators. Recently, a low-power, six-transistor soma design has been demonstrated for spiking optical neural networks;^{121} it consumes 21.09 fJ/spike with a maximum spiking rate of 10 GHz in a 90 nm CMOS process.

Similar O/E/O circuit designs can potentially allow the use of more biologically accurate neuron models by replacing the transistor circuit with any desirable analog electrical neuron nonlinearity, such as that presented in the study by Farquhar and Hasler.^{122} Unlike all-optical neurons, O/E/O conversion limits the response speed of a single optoelectronic neuron because of the analog bandwidth limitation of the electronics. However, network-level throughput can still benefit from optical parallelism in the wavelength and spatial domains. Miscuglio *et al.*^{123} argued that CMOS-compatible integrated neuromorphic devices can operate above 25 GHz with a relatively low energy consumption below 10 fJ/bit. Careful benchmarking of system-wide throughput, energy consumption, and latency for given workloads is necessary to correctly compare neuromorphic computing systems across the various optical, electrical, and optoelectronic technologies.

### D. Learning

**Gradient descent algorithms**, such as backpropagation and its many variants, are often used to train ANNs and have recently been applied to PNNs. A cost (or loss) function is associated with the neural network that numerically penalizes differences between the outputs of the ANN and their respective target values. The matrix elements representing the weights of the synaptic interconnections within the network are tuned to minimize the cost function over the input space, based on gradients computed with respect to each weight. Backpropagation is a class of gradient descent algorithms that extends this procedure by sequentially applying the chain rule to calculate gradients from the output layer back to the input layer. Recurrent layers can be handled with backpropagation through time by unrolling the network activity into time steps. Various other forms of backpropagation can be found in the literature; see Ref. 124 for a recent review of these techniques in the context of deep neural networks.
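The weight-update loop described above can be made concrete with a from-scratch sketch (NumPy; the XOR task, layer sizes, learning rate, and iteration count are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # XOR inputs
y = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)
lr = 0.2
for _ in range(10000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))   # sigmoid output
    # Backward pass: apply the chain rule from output toward input.
    d_out = out - y                          # cross-entropy + sigmoid grad
    d_h = (d_out @ W2.T) * (1 - h ** 2)      # through the tanh layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)
print("predictions:", out.ravel().round(2))
```

The `d_h` line is the essence of backpropagation: the output-layer error is pushed backward through `W2` and scaled by the local derivative of the hidden activation.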

Recent work^{125} used photonics and adjoint variable methods (AVMs) to derive a photonic implementation of backpropagation for training PNNs. AVMs utilize the field solutions of two electromagnetic simulations, referred to as the adjoint field and the original field, to derive the cost function of the photonic neural network. This cost function is more compact in terms of measurable quantities within the network and allows for the parallel gradient computation for all phase shifters within an MZI network. Gradient computation can then be completed using intensity measurements of the adjoint field, which physically corresponds to a backward propagating waveguide mode sent into the system through the output ports. Such a backpropagation implementation is robust under noise and scaling while allowing *in situ* computation for photonic neural networks.

**Hebbian learning** is a more bio-derived class of algorithms based on Hebb’s original postulate that, in general, synaptic connections are strengthened between neurons whose activity is correlated.^{126} The exact weight update rule varies from one mathematical formulation to another and is often based either on the activity of neurons—such as the average firing rate of spiking neurons or the activation function of non-spiking neurons—or on the timing differences between pre- and post-synaptic firing in explicitly spiking neuron models.^{127} Two- and three-factor learning rules^{128} are an example of the former set of algorithms; spike trains are smoothed by convolution with a kernel (such as an exponential decay filter) and used to numerically represent the activity of pre- and post-synaptic neurons; see Ref. 127 for a deeper review of the mathematical formulations of Hebbian learning. The latter subclass of Hebbian learning is called Spike-Timing-Dependent Plasticity (STDP), in which weight updates are calculated as a nonlinear function of the time interval, Δ*T*, between pre- and post-synaptic spikes.^{69}

Various photonic synapses have demonstrated STDP behavior by modulating the gain of SOAs,^{129,130} by changing the transmission levels of phase-change memory materials,^{131,132} and through other photoelectric device mechanisms.^{131} STDP supports both supervised^{133} and unsupervised^{134} learning rules, which have been demonstrated in simulation by Xiang and colleagues. In Ref. 133, a VCSEL-SA structure is simulated to implement a supervised XOR learning task with an STDP learning rule; the VCSEL-SA is chosen because its inhibitory dynamics can serve as inhibitory weights, as seen in Ref. 85. The XOR task was solved with supervised training in which the weight update from a photonic STDP learning rule, derived in Ref. 135, has its sign flipped when the output does not match the target value. Training was shown to converge completely after 40 simulated epochs. Xiang and colleagues also demonstrated an unsupervised pattern recognition task with similar components. In both works, a controller circuit calculates the STDP weight updates used to program the synaptic weights.
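A pair-based STDP window of the kind referenced above is straightforward to express in code (the amplitudes and time constant below are illustrative, not taken from the cited devices):

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.1, a_minus=0.12, tau=20e-3):
    """Pair-based STDP window: weight change as a function of the
    spike-time difference delta_t = t_post - t_pre.  Pre-before-post
    (delta_t > 0) potentiates; post-before-pre depresses; the effect
    decays exponentially with the interval."""
    delta_t = np.asarray(delta_t, dtype=float)
    return np.where(delta_t >= 0,
                    a_plus * np.exp(-delta_t / tau),
                    -a_minus * np.exp(delta_t / tau))

# Pre fires 5 ms before post -> potentiation; 5 ms after -> depression.
print(stdp_dw(5e-3), stdp_dw(-5e-3))
```

Because the update depends only on the two spike times at that synapse, it is local in exactly the sense discussed below for parallel, online learning.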

Amato *et al.*^{136} compared multiple hybrid networks composed of Hebbian-trained and gradient-descent-trained layers to isolate the respective advantages of each technique in a classification task using feedforward convolutional neural networks (on a traditional computing platform). Their analysis showed empirically that Hebbian learning falls behind gradient descent for deep networks, with evidence that intermediate layers are not trained as effectively as layers near the beginning or end of the network. They also found that Hebbian learning requires far fewer epochs (iterations over the training set) than gradient descent, converging after 2 epochs vs 20, respectively.

Hebbian learning rules have the advantage of requiring only the information that is local to a given synapse. For example, two-factor learning rules compute weight updates from traces (filtered spike trains), representing average pre- and post-synaptic activities over a given time scale. As such, it is conceivable to implement an architecture that calculates all weight updates entirely in parallel, as opposed to gradient descent algorithms, which must sequentially propagate the credit of error backward through the network layer by layer and thus fundamentally require some amount of serial computation. This advantage, in turn, also allows Hebbian learning to be easily applied in online learning tasks wherein learning is performed whenever new information is made available. Gradient descent algorithms, in contrast, are more often used with batch training where the error is minimized across multiple samples simultaneously. As such, Hebbian learning is advantageous in settings with real-time learning requirements, while gradient descent is more amenable to applications in which many novel input samples are generated nearly simultaneously. See Ref. 137 for more on the advantages and formulations for online learning in Hebbian and spiking neural networks.

### E. Review of existing approaches

Table II tabulates and compares existing architectures for photonic neuromorphic processing from some well-known institutions and organizations in the field. The table lists the technologies used for synaptic meshes, the neural nonlinearities, and any reported performance metrics. Synaptic meshes are distinguished by topology, learning mechanism, demonstrated network size (experimental or simulated), and the reconfigurable elements employed. Neuron models are distinguished by the functional form of the nonlinearity and by the use of optoelectronic or fully photonic devices. Finally, the performance of each architecture is reported as presented, given the lack of a single established performance metric agreed upon within the field (discussed further in Sec. IV). As such, this table aims to summarize and contextualize existing approaches to photonic neuromorphic computing.

TABLE II. Existing approaches to photonic neuromorphic computing.^{a}

| Research group | Synaptic network architecture | Synaptic reconfiguration mechanism | Neural network topology | Demonstrated network size | Learning | Reported performance | Multiplexing | Neurons/nonlinearity | References |
|---|---|---|---|---|---|---|---|---|---|
| UC Davis | Rectangular and triangular MZI meshes | Thermo-optic (SOI) | Feed-forward TNN | 1024 × 1024 (sim) | EC standard BP and TT decomposition | MAC/J | WDM and SDM | Optoelectronic Izhikevich spiking neuron | 121, 138, 139 |
| Princeton | MRR banks | Thermo-optic (SOI with Ti/Q heaters) | Feed-forward | 2 × 3 w/0–250 broadcast nodes (sim/exp) | NR | Extinction ratio > 13 dB | WDM | Optoelectronic leaky integrate-and-fire spiking neuron | 100, 106, 140 |
| Monash University | Rectangular MZI mesh | Electro-optical phase shifter | Convolutional (3 × 3) | 900 × 10 (sim/exp) | EC standard BP | 11.321 TOPS | WDM | EC sigmoid non-spiking neuron | 58, 64 |
| Aristotle University of Thessaloniki | Rectangular MZI mesh | Electro-optical (SOA) | Recurrent | 32 × 3 (sim), 4 × 4 (exp) | EC standard BP | SOA1 = 180 pJ/symbol and SOA2 = 300 pJ/symbol | WDM | SOA-based sigmoid non-spiking nonlinearity | 141–143 |
| Stanford | Rectangular MZI mesh | Electro-optical phase shifter | Feedforward (FT pre-processed) | 16 × 10 (sim) | EC custom BP | 94% classification accuracy (MNIST); 7.7 × 10^{12} MAC/s | WDM | Custom optoelectronic non-spiking nonlinearity | 144–146 |
| McGill University | Diamond MZI mesh | Electro-optical phase shifter | Feedforward | 4 × 4 mesh (sim/exp) | EC custom algorithm | 98.9% accuracy (0 dB MZI loss) and 75% accuracy (0.5 dB MZI loss) | WDM | NA | 104, 147–149 |
| MIT | Diamond MZI mesh | Thermo-optic (SiPh + PCM) | Feedforward | 4 × 4 mesh (sim) | EC standard BP | ∼100 pJ/FLOP | TDM | NA (sug. bistable nonlinear photonic crystals) | 150 |
| Ghent University | Crossbar network | Electro-optical (PCM) | Passive reservoir computing | 4 × 4 (sim) | EC complex-valued ridge regression | Minimum error rate = 10^{−3} | TDM | Photodetector non-spiking nonlinearity | 151–156 |
| NTT | Crossbar network | Electro-optical (cross-gain modulated SOA) | Reservoir computing (RNN) | Single device | NR | 43 mW consumed; normalized mean square error (NMSE) ∼ 0.112 | TDM | Custom SOA non-spiking nonlinearity | 74 |
| NIST | Rectangular MZI mesh | Electro-optical (superconducting-nanowire single-photon detectors) | Spiking feedforward | 49 SNSPDs (exp) | NR | NR | TDM | Optoelectronic integrate-and-fire spiking neuron | 157–160 |
| University of Washington | Crossbar network | All-optical (PCM [GST]) | Convolutional (2 × 2) | 256 × 256 input (sim/exp) | EC standard BP | 25 TOPS/mm^{2} | WDM | NA | 78, 161–163 |
| George Washington University | Rectangular MZI mesh | Thermo-optic (PCM [GSST-Si]) | Feedforward (FF); convolutional | FF: 784 × 100 × 10 (sim); CNN: NR | EC standard BP | 93% inference accuracy (MNIST) | WDM | Custom sigmoidal non-spiking nonlinearity | 77, 164 |
| University of Paris-Saclay | MRR banks (arranged in crossbar-like topology) | All-optical (SOI) | Swirl reservoir computing | 4 × 4 (sim) | EC ridge regression | XOR task at 20 Gb/s with BER < 10^{−3} and injection power < 2.5 mW | WDM | Custom MRR non-spiking nonlinearity | 83, 165, 166 |


^{a}NA: not applicable; NR: not reported; EC: externally computed; BP: backpropagation; FOM: figure of merit; sim: simulated; exp: experimentally demonstrated; SOI: silicon on insulator; SiPh: silicon photonics; SOA: semiconductor optical amplifier; PCM: phase-change material; GST: GeSbTe; GSST: GeSeSbTe.

## III. TOWARD SCALABLE PHOTONIC AND OPTOELECTRONIC NEUROMORPHIC COMPUTING

One of the major remaining challenges of both electronic and photonic neuromorphic computing is the physical composition of large-scale neural networks. The photonic (and general) neuromorphic technologies and methods discussed thus far have addressed scalability in other forms by reducing algorithmic complexity,^{52,53} by increasing parallelism through multiplexing (WDM,^{60,65,77} TDM,^{78} and SDM^{58}), and in the general sense of decreased energy consumption of photonic components.^{47,58,121} Nonetheless, as the availability of data increases and the demand for computing resources rises to match, so will the demand for large-scale neural networks with high neural and synaptic densities. In such cases, the number of distinguishable modes or wavelength channels in photonic networks may become a barrier to further increases in parallelism. As such, other methods must be developed to increase neural and synaptic densities to match the needs for large-scale networks in reasonably sized form factors.

The remainder of this section introduces and discusses two promising photonic technologies under active research that may improve the physical scalability of future integrated photonic and optoelectronic neural network architectures. The first, a recently developed technique called tensor-train decomposition,^{167} simplifies the structure of the neural network into only the fundamental elements (called “tensor cores”) required for fast, accurate training. The second uses new fabrication techniques^{168,169} to reorganize the floor plan of fabricated circuits to exploit the third dimension more efficiently, thus realizing a 3D neuromorphic system much like the biological brain. Future work is needed to apply these algorithmic and manufacturing innovations to novel photonic neuromorphic computers.

### A. Tensor-train decomposition

Tensor-train decomposition^{167} is a decomposition algorithm that works by representing the elements of a *d*-dimensional tensor as the product of *d* three-dimensional tensor cores,

$$A(i_1, i_2, \ldots, i_d) = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d], \qquad (3.1)$$

where each slice $G_k[i_k]$ is an $r_{k-1} \times r_k$ matrix. The product results in an $r_0 \times r_d$ matrix; hence, the condition that $r_0 = r_d = 1$ is imposed to match the input and output dimensionality. This representation of rank differs from the canonical representation of a tensor rank in that the ranks $r_k$ can be calculated as the ranks of known auxiliary matrices. The matrix tensor cores $G_k[i_k]$ from Eq. (3.1) are three-dimensional arrays and can be written as $G_k(\alpha_{k-1}, i_k, \alpha_k)$, which can be treated as $r_{k-1} \times n_k \times r_k$ arrays with $G_k(\alpha_{k-1}, i_k, \alpha_k) = \big(G_k[i_k]\big)_{\alpha_{k-1} \alpha_k}$.^{167}
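The core-product construction of Eq. (3.1) can be checked numerically. In the sketch below, the mode sizes and TT-ranks are arbitrary illustrative choices; it verifies that a tensor element equals the product of its $r_{k-1} \times r_k$ core slices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3-dimensional tensor with mode sizes (4, 5, 6) and TT-ranks
# r = (1, 2, 3, 1); core G_k has shape (r_{k-1}, n_k, r_k).
ranks = [1, 2, 3, 1]
modes = [4, 5, 6]
cores = [rng.standard_normal((ranks[k], modes[k], ranks[k + 1]))
         for k in range(3)]

def tt_element(cores, idx):
    """Evaluate A(i1, ..., id) as a product of r_{k-1} x r_k core slices."""
    m = np.eye(1)
    for G, i in zip(cores, idx):
        m = m @ G[:, i, :]  # slice G_k[i_k] is an r_{k-1} x r_k matrix
    return m[0, 0]          # r_0 = r_d = 1, so the product is a scalar

# Contract all cores to recover the full (4, 5, 6) tensor for comparison.
full = np.einsum('aib,bjc,ckd->ijk', cores[0], cores[1], cores[2])
```

Storing the cores requires only $\sum_k r_{k-1} n_k r_k$ numbers (56 here) rather than $\prod_k n_k$ (120), which is the source of the parameter savings exploited by tensorized PNNs.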

Shallow networks with large fully connected layers achieve almost the same accuracy as an ensemble of CNNs.^{170} Therefore, it is highly desirable to implement high-radix (e.g., 1024 × 1024) photonic synaptic interconnections. However, an *N* × *N* MZI mesh representing a unitary weight matrix requires a minimum of *N*(*N* − 1) reconfigurable elements and *N* cascaded stages,^{103} limiting mesh scalability for high-radix interconnections. Tensor-train decomposition offers increased scalability to PNNs by reducing the number of elements in the network (i.e., fewer MZI units). High-radix meshes are formed by cascading smaller-radix meshes called photonic tensor-train cores.^{138} Other benefits include the resulting reduction in optical insertion loss and decreased chip size for a given network.
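For concreteness, the quadratic growth in reconfigurable elements can be tabulated. The cascade factorization below is purely illustrative (a hypothetical decomposition into equal-size cores, not the specific tensor-train construction of Ref. 138), but it shows the scale of the savings.

```python
def mzi_mesh_elements(n):
    """Reconfigurable phase shifters in an N x N unitary MZI mesh:
    N(N - 1)/2 MZIs with two phase shifters each -> N(N - 1) elements."""
    return n * (n - 1)

def cascaded_core_elements(core_sizes):
    """Element count if a high-radix interconnect is replaced by a cascade
    of smaller-radix core meshes (hypothetical factorization for scale)."""
    return sum(m * (m - 1) for m in core_sizes)

full = mzi_mesh_elements(1024)             # 1 047 552 elements
cascade = cascaded_core_elements([32] * 10)  # 9 920 elements for ten 32 x 32 cores
```

Even this toy factorization cuts the element count by roughly two orders of magnitude, with a corresponding reduction in cascaded insertion loss.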

Tensorized PNNs were proposed in Ref. 138 for deep feed-forward neural networks with a rectangular MZI mesh. In principle, however, the architecture can be applied to spiking neural networks and recurrent neural networks. The proposed network utilizes a discrete-time representation of input signals in which each input sample is amplitude modulated onto a continuous-wave optical carrier at specific time intervals. A diagram of the architecture for a conventional PNN and a tensorized PNN is shown in Fig. 9, detailing the difference in the architecture that arises from utilizing tensor-train decomposition in the training process. By adding parallelism in both the wavelength domain using WDM technology and the space domain using 3D photonics, the proposed tensorized PNNs maintain all the benefits of conventional PNNs while reducing the insertion loss by 171.8 dB and the number of MZIs by a factor of 582×.

A simulation demonstration has been completed that accounts for hardware implementation challenges, such as phase-shifter variations and beam splitter power imbalances.^{171} The simulation uses cross-entropy loss to benchmark the tensorized PNNs against conventional PNNs and Fourier-transform-preprocessed PNNs. In particular, the accuracy of the different network models was studied for handwritten digit recognition with the MNIST dataset. The simulation results demonstrate that the TNNs are robust against phase-shifter variations and beam splitter power imbalances in terms of the overall accuracy of the trained model on MNIST. Furthermore, the implementation of the photonic TNN can achieve >90% classification accuracy while using 33.6× fewer MZIs than a conventional ANN, which can only achieve 71.6% accuracy under the practical hardware imprecisions studied. Further implications of the architecture for scalability, accuracy, learning, and hardware implementation are under active study.^{138,139,171}
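The kind of hardware imprecision studied can be mimicked at the single-device level. The sketch below perturbs the phases of an idealized 2 × 2 MZI (a common beam splitter-phase-beam splitter model; the 0.05 rad error spread is an assumed value for illustration, not one from Ref. 171) and measures the unitary overlap with the ideal setting.

```python
import numpy as np

BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)  # ideal 50:50 coupler

def mzi(theta, phi):
    """2x2 MZI transfer matrix: coupler, internal phase, coupler, output phase."""
    inner = np.diag([np.exp(1j * theta), 1.0])
    outer = np.diag([np.exp(1j * phi), 1.0])
    return outer @ BS @ inner @ BS

def fidelity(u, v):
    """Normalized overlap |tr(U^H V)| / 2 between two 2x2 unitaries."""
    return abs(np.trace(u.conj().T @ v)) / 2

rng = np.random.default_rng(1)
sigma = 0.05  # assumed std. dev. of phase-shifter error (radians)
ideal = mzi(0.7, 0.3)
perturbed = mzi(0.7 + sigma * rng.standard_normal(),
                0.3 + sigma * rng.standard_normal())
```

Sweeping `sigma` and propagating such perturbed matrices through a full mesh is, in essence, the robustness study performed in simulation.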

### B. 3D electronic and photonic integration

3D integration is essential for practical photonic neuromorphic computing since typical photonic devices are many wavelengths in size. Together with control electronics, 3D electronic-photonic integrated circuits (3D EPICs) must be considered. At the heart of the 3D EPIC is a through-silicon optical via (TSOV) with silicon photonic vertical reflectors. Recently, a UC Davis team experimentally demonstrated a 90° vertical coupler, illustrated in Fig. 10, which consists of a silicon photonic vertical via and a 45° reflector attached to a waveguide end.^{168,169} The interlayer connection loss is 1.3 dB (or 0.65 dB per coupling)^{169,172} and is limited by the mode matching of the lateral and vertical waveguides for the 220 nm thick silicon rib waveguides used.^{169,172} For thicker silicon rib waveguides (e.g., 500 nm thick), the loss can be reduced to 0.5 dB per interlayer connection. This vertical coupler can also be used for interlayer coupling in a multi-layer silicon photonic 3D integrated circuit by placing a matching vertical coupler face-to-face. For coupling between the silicon photonic waveguide layer and a silicon nitride layer, inverse-taper couplers can be utilized; UC Davis and other groups have already demonstrated interlayer coupling loss of ∼0.01 dB.^{173,174}
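These per-coupling figures translate directly into a link budget. The helpers below are hypothetical illustrations using the ∼0.65 dB/coupling reported for 220 nm SOI; they show how quickly vertical hops accumulate loss and what fraction of optical power survives.

```python
def link_loss_db(n_interlayer_hops, loss_per_hop_db=0.65, extra_db=0.0):
    """Cumulative insertion loss for a vertical link with n interlayer
    couplings at the reported ~0.65 dB per coupling, plus any extra loss."""
    return n_interlayer_hops * loss_per_hop_db + extra_db

def power_fraction(loss_db):
    """Fraction of optical power surviving a given loss in dB."""
    return 10 ** (-loss_db / 10)

two_hop = link_loss_db(2)          # 1.3 dB, the demonstrated interlayer link
surviving = power_fraction(two_hop)
```

At 0.5 dB per coupling for thicker waveguides, the same two-hop link would cost only 1.0 dB, which is why coupler loss is a first-order design parameter for deep 3D stacks.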

## IV. BENCHMARKING METRICS

Successfully comparing any two things requires a system that can meaningfully establish their value. At the lowest level in the field of computing, value is placed on computational ability as measured by such things as latency, throughput, accuracy, and energy efficiency. A complication arises when any of these metrics may change in the context of a specific application. This challenge is already present in the case of conventional general-purpose computing architectures, resulting in the availability of competing standards for benchmarking CPUs.^{175,176} It is further exacerbated by the more open-ended and varied goals of neuromorphic computing architectures and by the ambiguity of what a biological brain “does” and, consequently, what a brain-like or brain-inspired architecture should also “do.” In traditional computing, different architectures may emphasize the optimization of the aforementioned values for different computational units based on the needs of the design—integer operations vs floating-point operations, for example. In the neuromorphic case, these operations may take the form of individual neural state updates, processing of spike traffic, or synaptic weight updates, each of which may be broken down into further suboperations. In summary, a good choice of benchmark addresses the following two questions: (1) How to establish the value? (2) How to establish fairness? The remainder of this section describes commonly referenced metrics of comparison for photonic neuromorphic devices before describing a more general approach for benchmarking that can be applied to neuromorphic computers of various electronic and photonic architectures.

Multiple photonic devices of interest^{58,60,121,177} have reported their value in terms of the energy efficiency and throughput of multiply–accumulate (MAC) operations. Since MAC operations require many parallel memory accesses for large networks, they tend to form the bottleneck for network-based computation on von Neumann architectures. While the MAC operation is undoubtedly a significant component of network computation, a singular focus on this operation would be far too myopic, as the behavior of biological networks and real-time interactions may drastically affect performance. For example, the performance of the 11 tera-operations per second (TOPS)^{58} photonic convolutional accelerator is only available to workloads that actually need that many operations in a given amount of time. In contrast, the real-time applications for which biological brains are best suited may not produce enough data for these processing speeds to be relevant.
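The MAC-based accounting typically works as follows: an N × N analog matrix-vector product performed once per symbol period counts as N² MACs, so throughput scales as N² times the symbol rate, and energy per MAC is total power divided by throughput. The sketch below uses an illustrative 64 × 64 mesh at 10 GS/s and 1 W; these are assumed example numbers, not figures from the cited works.

```python
def mac_throughput(n, symbol_rate_hz):
    """MACs/s for an N x N analog matrix-vector multiply performed once
    per symbol period (the accounting commonly used in the literature)."""
    return n * n * symbol_rate_hz

def energy_per_mac(power_w, macs_per_s):
    """Joules per MAC from total power draw and MAC throughput."""
    return power_w / macs_per_s

tput = mac_throughput(64, 10e9)   # ~4.1e13 MAC/s for a 64 x 64 mesh at 10 GS/s
e_mac = energy_per_mac(1.0, tput)  # ~24 fJ/MAC at 1 W total power
```

The quadratic dependence on N is exactly why headline MAC/s figures favor large meshes, and why such figures say little about workloads that cannot keep the mesh full.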

Some have attempted to account for the device footprint in relation to the improvements in MAC throughput and energy.^{177} While this approach more fairly compares the performance of matrix multipliers across photonic and electronic platforms, it does not address the other factors involved in neural network processing, which many contemporary photonic devices offload to post-processing on traditional computers. Additionally, the SNN architecture—considered favorable by many in the context of neuromorphics—relies on an accumulate-and-fire operation in which the membrane potential is a continuous state variable undergoing continuous update in the ideal (analog) case, at which point the definition of a single MAC operation may not be clear.

Furthermore, Cole^{178} suggested that when programmability and data transfer are considered, the energy consumption of computational elements is negligible to both electronic and photonic approaches. Cole suggested that when considering the energy consumption of optical receivers, there is no advantage to photonic computation over a fully optimized electronic computer. Instead, Cole claimed that energy reduction efforts should be focused on the adoption of optical data transfer and not optical computation. It is important to note that the computation considered is binary and that representation in neuromorphic computation will not necessarily take this form. Nonetheless, this result demonstrates the importance of comparing neuromorphic processors in their entirety rather than considering the consumption of particular computational elements.

More mature neural network processors—for example, the electronic neuromorphic devices TrueNorth,^{13} Loihi,^{14} and Neurogrid^{11}—have instead reported their achievements in terms of energy consumption per spike or energy per bit. Such metrics can neglect the question of fairness, as changes in workload or architecture can drastically change these metrics. When reporting the energy per spiking event, for example, it is unclear whether the operations contributing to the membrane potential updates should be considered. For a different workload, the number of events before the neuron reaches threshold and fires may vary and result in an inconsistent metric. If subthreshold operations are included, a digital architecture with discrete timesteps might require more energy per spike for workloads with longer gaps between spikes. If subthreshold operations are not included, then an architecture that chooses the smallest possible spiking energy may appear more efficient than another architecture that compensates with nearly passive energy costs for incoming spike accumulation. In such an architecture, spike energy may be less important; after all, sparsity in time is a major advantage of spiking networks. Furthermore, metrics involving units of bits are specific to a given architecture in that architectural choices determine what role these bits play and whether the bit width is flexible, making it more difficult to establish fair comparisons between widely different architectures. It has even been argued that bit precision is not significant in neuromorphic computing, given that one of the goals of the field is to perform computation with low-precision elements.^{11}

Proper benchmarking of neuromorphic computers should take inspiration from solutions generated in traditional computing, where various standards of benchmarking have been proposed, such as SPEC^{175} and MLPerf,^{176} which attempt to fairly discriminate the advantages and disadvantages of various architectures in different contexts. Mike Davies, Director of Intel’s Neuromorphic Computing Lab, has suggested that the field of neuromorphics has not yet matured enough to establish the specific operations that a fully qualified neuromorphic computer should support, yet he proposes a benchmarking suite known as SpikeMark.^{50} In this suite, various workloads, such as spoken keyword classification or hand gesture recognition, would be used to determine an architecture’s feature set and flexibility in various contexts while providing a standard for comparing energy efficiency and performance. As the name implies, SpikeMark focuses on spiking network workloads, though it is important to note that spiking behavior may not be necessary for useful neuromorphic devices. In the book *How to Build a Brain*,^{179} Eliasmith describes a set of “Core Cognitive Criteria” that attempts to answer the question of what a brain “does” and can also act as a framework for designing neuromorphic benchmarks. The criteria are broken down into three categories—representational structure, performance concerns, and scientific merit—which are agnostic to the choice of spiking or non-spiking neuron model and have been applied to the design of a large-scale network model known as SPAUN.^{180}

Further work is still needed to resolve the ambiguity of what workloads should be considered fundamental to a neuromorphic processor and establish an official benchmarking standard. Regardless, the device footprint, energy efficiency, and relevant processing speeds should be considered jointly across multiple minimally overlapping tasks representing the desirable computational characteristics that researchers seek to borrow from biological brains.

## V. SUMMARY

Recent advances in photonic circuits have led to theoretical studies and experimental demonstrations of synaptic interconnects with reconfigurable photonic elements capable of arbitrary linear matrix operations—including MAC operation and convolution—at extremely high speed and energy efficiency. Both all-optical and optoelectronic neurons with nonlinear transfer functions have also been investigated. A number of research efforts have reported orders-of-magnitude estimated improvements in computational throughput and energy efficiency. While photonic technologies are relatively immature compared to their electronic counterparts, silicon photonics has emerged as a viable platform for integrated photonic neuromorphic circuits. However, substantial challenges remain in several areas: (a) cross-layer co-design of algorithms, architectures, circuits (photonic and electronic), training/inference, and benchmarking; (b) heterogeneous integration of dissimilar materials if non-volatile synaptic reconfigurability is to be incorporated; (c) realizing high-density 3D integration; and (d) implementing scalable learning and inference in the resulting system. Given the rapid and accelerating progress in this nascent area of research and development, we expect to see key advances in addressing each of the above challenges for viable photonic and optoelectronic neuromorphic computing in the future.

## ACKNOWLEDGMENTS

The authors acknowledge enlightening discussions with Professor David A. B. Miller. This work was supported, in part, by AFOSR under Grant No. FA9550-18-1-0186.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

## REFERENCES
