The integration of artificial intelligence systems into daily applications like speech recognition and autonomous driving rapidly increases the amount of data generated and processed. However, satisfying the hardware requirements with the conventional von Neumann architecture remains challenging due to the von Neumann bottleneck. Therefore, new architectures inspired by the working principles of the human brain are being developed, an approach known as neuromorphic computing. The key principles of neuromorphic computing are in-memory computing, to reduce data shuffling, and parallelization, to decrease computation time. One promising framework for neuromorphic computing is phase-change photonics. By switching to the optical domain, parallelization is inherently possible through wavelength division multiplexing, and high modulation speeds can be deployed. Non-volatile phase-change materials are used to perform multiplications and non-linear operations in an energy-efficient manner. Here, we present two prototypes of neuromorphic photonic computation units based on chalcogenide phase-change materials. The first is a neuromorphic hardware accelerator designed to carry out matrix vector multiplications in convolutional neural networks. Owing to its neuromorphic architecture, this prototype already operates at tera-multiply-accumulate-per-second speeds. The second is an all-optical spiking neuron, which can serve as a building block for large-scale artificial neural networks. Here, the whole computation is carried out in the optical domain, and the device only needs an electrical interface for data input and readout.
INTRODUCTION
In 2016, the computer program “AlphaGo” developed by the British company Google DeepMind beat one of the world’s top players (Lee Sedol) 4:1 at the strategy game Go. Unlike chess, Go cannot be solved deterministically by today's computers due to the complexity of the game and the fact that a suitable heuristic method to evaluate specific situations does not exist. Therefore, AlphaGo relies on artificial neural networks (ANNs) to play Go.1 Its successor, AlphaZero, needs just the ruleset to learn the game without any additional human input,2 indicating that computer programs can find solutions and strategies for non-trivial problems on their own. Naturally, artificial neural networks are not only capable of solving well-defined problems in strategic board games but are also heavily deployed in daily life in a wide range of applications such as image and speech recognition, autonomous driving, and medical diagnostics, among others.3–5
Even though artificial neural networks lie at the heart of many problem-solving algorithms, providing sufficiently powerful hardware to run them remains challenging due to the large amount of data being processed.6 In the conventional von Neumann architecture that most processors are based on today, the processing unit is separated from the memory. Consequently, data need to be shuffled back and forth between the two, which leads to a speed barrier known as the von Neumann bottleneck. Moreover, this architecture is designed for serial computing, with commands carried out consecutively.7 Thus, the von Neumann architecture is not well suited to data-heavy tasks.
Therefore, new hardware and architectures tailored to ANNs need to be developed. One option is to design application-specific integrated circuits (ASICs), such as Google's tensor processing unit, which is optimized for matrix vector multiplication (MVM); MVMs are the computationally expensive core of many AI applications. Another approach is to build integrated circuits inspired by the working principle of the brain, called neuromorphic computing architectures,8–12 as biological brains outperform conventional processors in cognitive tasks such as speech and pattern recognition by many orders of magnitude. For example, the simulation of a mouse-scale cortex with 2.5 × 10⁶ neurons on a personal computer is 9000 times slower and requires 40 000 times more power than its biological counterpart.13 Neuromorphic processors aim to work in a highly parallel way and process data directly in memory. Besides several implementations in CMOS electronics, another promising route to building neuromorphic computing systems is to switch to the optical domain. This not only allows a high degree of parallelization by wavelength division multiplexing but also enables operation speeds of up to 100 GHz.14 This article gives an introduction to photonic approaches to neuromorphic computing.
In the following, we will first provide an overview of artificial neural networks and explain the working principle of convolutional neural networks (CNNs), which are crucial for image classification. Then, we explain how to implement energy-efficient in-memory computing with phase-change devices in photonic integrated circuits. Based on the principles of phase-change photonics, we present a neuromorphic hardware accelerator that is designed to perform the time-demanding task of matrix vector multiplication. Finally, we show an all-optical neuron, which can serve as a building block for large-scale neuromorphic artificial neural networks.
ARTIFICIAL NEURAL NETWORKS
From a mathematical point of view, an ANN is a function $f_{\vec{w}}$, which is defined via a set of free parameters $\vec{w}$. Depending on how $\vec{w}$ is chosen, the neural network can solve a specific problem. In this context, solving means that it assigns the “correct” output activation $\vec{a}_{\mathrm{out}} = f_{\vec{w}}(\vec{a}_{\mathrm{in}})$ to an input activation $\vec{a}_{\mathrm{in}}$.
Figure 1(a) shows how a (fully connected) ANN is constructed. It consists of several layers, where each layer contains at least one neuron with an associated neuron activation. Each neuron of the nth layer is connected to all neurons of the (n + 1)th layer. The input layer is the interface to the real world, and the output layer presents the computational result of the neural network. The elementary building blocks of ANNs are the neurons [see Fig. 1(b)]. First, all the inputs of the jth neuron (i.e., the output signals from the neurons in the previous layer) are individually weighted by the weights $w_{j1}$ to $w_{jN}$ and then added together. Afterward, a non-linear function, for example, a rectified linear unit (ReLU) or a sigmoid, is applied to the weighted sum to obtain the neuron's activation.15,16 It is important that the activation function of the neurons is non-linear; otherwise, all layers could be condensed into a single layer. The weights are the free parameters of the neural network and need to be chosen such that the neural network fulfills the intended function. The process of appropriately choosing the weights is called training. There are several types of training, which can be categorized on a basic level into supervised and unsupervised learning. In supervised learning, a training dataset with several pairs $(\vec{a}_{\mathrm{in}}, \vec{a}_{\mathrm{out}})$ must exist, and the ANN is fitted to this training dataset. This is typically a very time-demanding task, often implemented with a backpropagation algorithm.17 Unsupervised learning is applied when no training set exists and a pattern needs to be extracted from an unknown data stream. This is achieved by implementing a learning rule: for example, inspired by biological neurons, the Hebbian learning rule “what fires together, wires together” can be used.18
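As a minimal illustration of this neuron model, the following Python sketch (all names and values are our own, illustrative choices, not taken from the text) weights four inputs, accumulates them, and applies a ReLU activation:

```python
import numpy as np

def relu(z):
    # Rectified linear unit: element-wise max(0, z).
    return np.maximum(0.0, z)

def neuron_activation(inputs, weights, non_linearity=relu):
    """Activation of the jth neuron: a non-linear function applied to
    the weighted sum of its inputs (one multiply-accumulate per input)."""
    weighted_sum = np.dot(weights, inputs)
    return non_linearity(weighted_sum)

# Example: a neuron with four inputs.
a_in = np.array([0.2, 0.8, 0.1, 0.5])
w_j = np.array([0.5, -1.0, 0.3, 0.9])
print(neuron_activation(a_in, w_j))  # 0.0 here, since the weighted sum is negative
```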
A main issue of fully connected ANNs is that the number of free parameters tends to be huge. For example, an ANN designed for image classification that takes input images with 1 × 10⁶ pixels and has 1000 neurons in the first hidden layer would already have 1 × 10⁹ free parameters. Moreover, many hidden layers are deployed in deep neural networks to implement complex functionalities, which further increases the number of free parameters.19 To overcome this challenge and reduce computational complexity, special classes of ANNs have been developed, such as the aforementioned convolutional neural networks (CNNs). CNNs reduce the number of parameters by introducing a preprocessing step that detects local features between neighboring pixels in the input images. In this step, the image is convolved with several filters, as shown in Fig. 1(c). Those filters are the free parameters of a convolutional layer and are determined during the training process. In the following, we will elucidate how these elementary concepts can be realized with integrated optical or nanophotonic devices in which non-linearity and the capability for learning are implemented with phase-change materials.
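To make the convolution step concrete, here is a short, self-contained Python sketch of the sliding-window operation of a convolutional layer (as is conventional in CNNs, without kernel flipping); the image and filter values are illustrative:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter over a grayscale image and record the
    weighted sum at every position ('valid' mode, stride 1). This is
    the local-feature-detection step of a convolutional layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 3x3 filter has only nine free parameters, shared across the whole
# image, instead of one weight per pixel and neuron.
image = np.random.rand(8, 8)
vertical_edge_filter = np.array([[1, 0, -1],
                                 [1, 0, -1],
                                 [1, 0, -1]])
print(convolve2d(image, vertical_edge_filter).shape)  # (6, 6)
```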
PHASE-CHANGE PHOTONICS
Phase-change photonics is the conjunction of phase-change materials and nanophotonics, which enables integrated photonic circuits (PICs) with novel functionalities. Phase-change materials (PCMs) are materials that can be rapidly switched between a (disordered) amorphous and an (ordered) crystalline state and thereby exhibit a stark contrast in optical properties between the two phases. The transition between the states is reversible and can be induced via optical or electrical heating. Figure 2(a) schematically shows the switching dynamics of a PCM. If the material is heated (and kept) above the glass transition temperature but below the melting point, the atoms have enough energy to arrange themselves in the energetically preferred crystalline order. If the material is instead heated further, above the melting temperature, and subsequently rapidly cooled down below the glass transition temperature without giving it time to crystallize, the disordered amorphous state is obtained. Typically, the PCM needs to be cooled at a rate of 1–100 K/ns to be switched to the amorphous state.20,21 In the amorphous state, no long-range order is present and covalent bonds between the atoms are dominant.22 Therefore, the electrons are strongly localized, leading to low conductivity. In contrast, resonant bonds between several atoms are formed in the crystalline state, leading to highly delocalized electrons and enabling high conductivity.23,24 Similarly, the refractive index also differs greatly between the two states, depending on the stoichiometry. Therefore, PCMs have already been used in rewritable optical data storage for decades.25
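The switching protocol described above can be condensed into a toy decision rule. The Python sketch below is only a cartoon of the thermal dynamics: the two temperatures are rough, illustrative placeholders for GST rather than exact material parameters, and only the 1–100 K/ns quench rate is taken from the text.

```python
def final_phase(peak_temp_K, cooling_rate_K_per_ns,
                t_glass_K=450.0, t_melt_K=900.0):
    """Cartoon of PCM switching: annealing between the glass transition
    and the melting point crystallizes the material; melting followed
    by a quench of roughly 1-100 K/ns freezes in the amorphous state.
    Temperatures are illustrative placeholders, not measured values."""
    if peak_temp_K < t_glass_K:
        return "unchanged"      # not enough energy to rearrange atoms
    if peak_temp_K < t_melt_K:
        return "crystalline"    # atoms order into the preferred state
    if cooling_rate_K_per_ns >= 1.0:
        return "amorphous"      # melt-quench: disorder is frozen in
    return "crystalline"        # slow cooling leaves time to crystallize

print(final_phase(700.0, 0.1))    # crystalline (annealed)
print(final_phase(1000.0, 10.0))  # amorphous (melt-quenched)
```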
Integrated into photonic circuits, phase-change materials enable active control over the phase and amplitude of light propagating through optical waveguides to which they are evanescently coupled. The following devices are based on the well-studied PCM Ge2Sb2Te5 (GST),26 which belongs to the popular group of chalcogenide solids based on germanium (Ge)–antimony (Sb)–tellurium (Te) alloys. Due to its non-volatility, no static energy supply is required to maintain the PCM's state. In combination with switching energies below 20 pJ, GST enables energy-efficient computation.27 However, it should be noted that a wide range of PCMs with different switching energies and stabilities can be found, especially in the ternary Ge:Sb:Te phase diagram. Monatomic phase-change materials have also been developed recently,28,29 potentially enabling very high switching speeds.
Using standard lithography processes, the PCM can be selectively deposited in specific areas of the waveguide, enabling two functionalities. First, the transmission of the waveguide can be locally varied by partially switching the GST patch between the amorphous and the highly absorptive crystalline state. This can be done either electrically, using external heater structures,31 or optically, since the light absorbed in the GST will heat it up.32,33 For optical switching, up to 64 intermediate transmission states in a GST patch have been demonstrated.34 Second, the GST can be used as a non-linear element inside a photonic circuit, due to the threshold behavior of the switching process.
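Since up to 64 intermediate transmission states have been demonstrated, programming a weight amounts to quantizing it onto a discrete transmission ladder. A small sketch of this mapping follows; the transmission window (t_min, t_max) is an assumed, illustrative device range, not a measured parameter:

```python
import numpy as np

def weight_to_transmission(w, levels=64, t_min=0.2, t_max=1.0):
    """Map a normalized weight w in [0, 1] to the nearest of `levels`
    programmable transmission states of a GST cell. The window
    (t_min, t_max) is an illustrative assumption."""
    step = np.round(w * (levels - 1)) / (levels - 1)  # quantize to 64 levels
    return t_min + step * (t_max - t_min)

print(weight_to_transmission(0.37))  # one of the 64 discrete values
```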
Overall, phase-change photonics enable the realization of non-volatile memory cells and non-linear power-dependent elements inside an integrated photonic circuit. The low energy consumption and passive interaction behavior between a light pulse and the PCM make phase-change photonics an ideal building block for high-speed neuromorphic computing.
NEUROMORPHIC HARDWARE ACCELERATOR
A first step toward true neuromorphic computing is to build neuromorphic photonic integrated circuits for mathematical operations that are time demanding in the conventional von Neumann architecture. In a fully connected layer of an ANN, the activations from the previous layer need to be weighted with various weights and accumulated. In a convolutional layer, the activation from the previous layer is convolved with several filters. Both operations can be written in the form of matrix vector multiplications, which are therefore often a bottleneck for computing ANNs, as illustrated in the sketch below.
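As a concrete illustration of this reduction, the following toy Python code (our own, using the common “im2col” trick) rewrites the sliding-window convolution from above as a single matrix-vector multiplication:

```python
import numpy as np

def conv_as_mvm(image, kernel):
    """Express a 2D convolution as a matrix-vector multiplication:
    every row of the matrix is one flattened receptive field of the
    image, and the vector is the flattened filter ('im2col' trick)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    patches = np.array([image[y:y + kh, x:x + kw].ravel()
                        for y in range(out_h) for x in range(out_w)])
    return (patches @ kernel.ravel()).reshape(out_h, out_w)

image = np.random.rand(6, 6)
kernel = np.random.rand(3, 3)
print(conv_as_mvm(image, kernel).shape)  # (4, 4), same as direct convolution
```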
In order to perform MVMs with a PIC, several multiply and accumulate (MAC) operations need to be carried out. Figure 3(a) shows how the multiplication of a (fast modulated) input pulse with power $P_{\mathrm{in}}$ by a (tunable) matrix element is carried out with phase-change material cells. Depending on the phase state, the transmission $T$ through the PCM cell can be set; consequently, the power of the transmitted pulse is $P_{\mathrm{out}} = T \cdot P_{\mathrm{in}}$. Since the PCM is only evanescently coupled to the waveguide, leading to absorption, the multiplication time is just the time the pulse needs to propagate through the PCM cell.
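In software terms, each PCM cell thus realizes one scalar multiplication, and summing the transmitted powers yields a full multiply-accumulate. A minimal sketch, with arbitrary illustrative powers and transmission values:

```python
def pcm_multiply(p_in, transmission):
    """One PCM cell: the transmitted power is P_out = T * P_in, where
    the programmed transmission T in [0, 1] encodes the matrix element.
    The 'computation' is simply propagation through the cell."""
    assert 0.0 <= transmission <= 1.0
    return transmission * p_in

# Accumulation: pulses on different wavelengths are summed on a common
# waveguide, so a full multiply-accumulate is sum_i T_i * P_i.
powers = [1.0, 0.5, 0.8]   # input pulse powers (arbitrary units)
weights = [0.9, 0.3, 0.6]  # programmed transmission states
print(sum(pcm_multiply(p, t) for p, t in zip(powers, weights)))  # 1.53
```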
For the accumulation, pulses from different sources are superimposed on a common waveguide. When two optical fields with angular frequencies $\omega_1$ and $\omega_2$ are added, the detected power contains, besides the individual pulse powers, a beat term. The beat term consists of two parts: a fast oscillating one at the sum frequency $\omega_1 + \omega_2$ and a slowly oscillating one at the difference frequency $\omega_1 - \omega_2$. The fast oscillating part is averaged out by the photodetector. However, the slowly oscillating part can be visible, depending on the detuning and the detector bandwidth. Therefore, in order to avoid oscillations, the accumulated laser pulses need different wavelengths with sufficient detuning. In the following, this approach of adding two lasers is called “incoherent” accumulation. The opposite approach would be to use two lasers at the exact same wavelength and add them together “coherently” by fixing the phase relation between both.
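For reference, a short derivation of the beat term for two superimposed fields (our notation, standard scalar-field treatment, not reproduced from the paper):

```latex
\begin{align*}
  P(t) &\propto \left[E_1 \cos(\omega_1 t) + E_2 \cos(\omega_2 t)\right]^2 \\
       &= E_1^2 \cos^2(\omega_1 t) + E_2^2 \cos^2(\omega_2 t)
        + E_1 E_2 \left[\cos\big((\omega_1+\omega_2)\,t\big)
        + \cos\big((\omega_1-\omega_2)\,t\big)\right].
\end{align*}
```

The terms oscillating at optical frequencies are averaged out by the photodetector; only the difference-frequency term survives if the detuning lies within the detector bandwidth.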
There are several approaches to adding two signals incoherently. One is to use multiplexing techniques, such as wavelength-division multiplexing (WDM) or mode multiplexing. The advantage of this approach is that it is theoretically lossless. However, multiplexing requires very precise fabrication and potentially a way to actively tune the devices afterward, due to the sensitivity of, e.g., the ring resonators and Bragg filters used for WDM.35 Moreover, narrow-band Bragg filters in particular are large, increasing the footprint of the PIC. Another option is to combine two different signals with directional couplers. While this method relaxes the requirements on fabrication, it unavoidably leads to optical losses.
Figure 3(c) shows a PIC that is designed to carry out MVMs as described in Fig. 3(b). The multiplication is carried out by choosing the pulse height and the PCM's transmission state, and the pulses are added together onto a common waveguide with directional couplers. The different inputs $P_1$ to $P_i$ are encoded on different wavelengths $\lambda_1$ to $\lambda_i$, and the circuit provides $j$ outputs. A fixed fraction of the input light in the input rows is transferred to each coupling waveguide that connects the horizontal input waveguides to the vertical output waveguides. After a fraction of the incoming pulse is transferred to the coupling waveguide, it is partially absorbed by the PCM cell to carry out the multiplication. Finally, light is coupled from the coupling waveguide into the vertical output waveguide. In order to ensure that all matrix cells contribute equally to the final power in the output waveguide, the coupling fraction for the different cells has to be chosen properly. Therefore, only $1/(i \cdot j)$ of the input power of the first waveguide will reach the output waveguide. Here, $1/i$ is attributed to losses caused by the directional couplers and $1/j$ to the number of columns. We term this architecture a photonic tensor core (PTC).
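The power budget of the PTC can be captured in a few lines. The following sketch is our own simplified model (it ignores wavelength dependence and excess losses) and shows how the uniform coupling fraction makes every detected output proportional to one entry of the matrix-vector product:

```python
import numpy as np

def photonic_tensor_core(p_in, T):
    """Simplified power model of the photonic tensor core.

    p_in: vector of i input pulse powers, one wavelength per row.
    T:    i x j matrix of programmed PCM transmission states.

    Each cell contributes the same fraction 1/(i*j) of its row's input
    to the corresponding output column (1/i from the directional
    couplers, 1/j from combining the columns), so the detected output
    powers are proportional to the matrix-vector product."""
    i, j = T.shape
    coupling = 1.0 / (i * j)          # equal contribution of every cell
    return coupling * (T.T @ p_in)    # j output powers

p = np.array([1.0, 0.8, 0.6])         # three inputs on three wavelengths
T = np.random.rand(3, 2)              # 3 x 2 weight matrix
print(photonic_tensor_core(p, T))
```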
The advantage of a PTC is that it is a completely passive device (the PCM is non-volatile) and therefore does not require any energy to preserve the matrix state. The calculation is carried out in a transmission measurement. Figure 4 shows the experimental results of four different convolution operations [see Fig. 1(c)] calculated with the neuromorphic hardware accelerator. As designed, it clearly detects the upper/lower and left/right edges of the input picture. This basic hardware accelerator was then used as part of a convolutional neural network and tested with the MNIST database of handwritten digits. For the chosen CNN, the optimal prediction accuracy is 96.1%. When the hardware accelerator carries out the convolutions instead of a conventional PC, the accuracy drops only slightly, to 95.3%.36
By employing a second tier of multiplexing, several matrix vector multiplications can be carried out in parallel without changing the PTC itself. In this case, the first vector uses the wavelengths $\lambda_1$ to $\lambda_i$, the second $\lambda_{i+1}$ to $\lambda_{2i}$, and so on. By demultiplexing the signal at the output of the PTC accordingly, the results of the individual MVMs can be obtained in parallel. The total bandwidth is only limited by the wavelength dependency of the directional couplers. In the experiments, the photonic neuromorphic hardware accelerator was able to operate at rates of up to 2 TMAC/s.
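Conceptually, this second multiplexing tier turns the matrix-vector product into a matrix-matrix product evaluated in a single pass. A sketch of the bookkeeping, with illustrative shapes:

```python
import numpy as np

def parallel_mvm(vectors, T):
    """Emulate the second tier of multiplexing: each input vector rides
    on its own group of wavelengths, so all of them traverse the same
    tensor core simultaneously. After demultiplexing, every row of the
    result is one matrix-vector product."""
    return np.stack(vectors) @ T  # k parallel MVMs in one pass

T = np.random.rand(3, 2)                       # shared weight matrix
v1, v2 = np.random.rand(3), np.random.rand(3)  # two wavelength groups
print(parallel_mvm([v1, v2], T))               # both results at once
```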
ALL-OPTICAL SPIKING NEURONS
A further step toward all-optical neuromorphic computing is to perform the entire data processing in a photonic integrated circuit. In order to evaluate an ANN, several MAC operations must be carried out first to weight and accumulate the input from the previous layer. Afterward, a non-linear activation function determines the neuron's activation.
Figure 5(a) shows a photonic circuit designed to mimic a single neuron with four inputs. The same principles as in the photonic hardware accelerator are used for carrying out the MAC operations: the multiplication is achieved with PCM cells, and the weighted inputs are afterward added incoherently. In this WDM framework, ring resonators are deployed instead of directional couplers to add the weighted inputs. The resulting weighted input power is sent to the activation unit shown in Fig. 5(b). The activation unit consists of a ring resonator with an integrated PCM cell and a probe waveguide (fixed wavelength). In the beginning, the PCM cell is crystalline, and the probe pulse, being on resonance, is mainly absorbed in the ring resonator. If the total weighted input power exceeds the threshold set by the melting temperature of the PCM, the PCM amorphizes. This reduces the losses inside the ring resonator, and therefore the maximal extinction ratio, and shifts the resonance frequency because of the change in both the real and imaginary parts of the refractive index. Now the probe pulse that previously was on resonance with the resonator is mainly transmitted. A switching contrast of up to 10 dB can be achieved in this way.37 Figures 5(c) and 5(d) show the experimental results of measurements performed with this type of artificial photonic neuron. In both cases, the neuron is trained to detect a specific pattern, and in both cases, it can clearly distinguish between the desired pattern and various other patterns. The neuron design shown here can serve as a building block for larger multilayer neural networks; in this case, the output pulse of the neuron in Fig. 5(a) serves as an input to the neurons in the next layer. Moreover, unsupervised learning according to a Hebbian-like learning rule is possible by overlapping the output pulse with the input pulse in the PCM weights. By doing so, the weights change depending on whether the neuron fires together with an input pulse or not.37
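The behavior of this neuron can be summarized in a few lines of Python. The sketch below is a functional abstraction only; the threshold value and the binary output are illustrative simplifications of the analog device:

```python
def optical_spiking_neuron(inputs, weights, threshold=1.0):
    """Functional sketch of the all-optical spiking neuron: the PCM
    weights scale the input powers, the ring resonators sum them, and
    the activation unit transmits the probe pulse only if the total
    power is high enough to amorphize its PCM cell."""
    weighted_power = sum(w * p for w, p in zip(weights, inputs))
    fired = weighted_power >= threshold  # PCM amorphized -> probe transmitted
    return 1.0 if fired else 0.0         # output spike / no spike

# Four inputs, as in the device of Fig. 5(a):
print(optical_spiking_neuron([0.4, 0.9, 0.2, 0.7], [0.8, 0.9, 0.1, 0.5]))  # 1.0
```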
Since the PCM in the activation unit is switched continuously, cycle-to-cycle variations are present, as in the electrical counterparts.38 However, for the operation of the optical neuron, a certain degree of noise can even be beneficial to avoid local minima in the training process, and noise is also present in biological neural networks.39 Additionally, neuromorphic circuits are comparably tolerant to small variations of the weights and inputs, allowing the precision of the calculations to be reduced significantly.40 After training the neuron, the PCM weights are stable over months27 since the transmission of the PCM cell does not depend on conductive filaments in the PCM but results from the evanescent coupling between the waveguide and the PCM. Consequently, spatial variations in the PCM state that are small compared to the optical wavelength are averaged out and have only a small impact on the overall transmission.
CHALLENGES AND OUTLOOK
Recent work on neuromorphic computing demonstrates the prospects of building brain-inspired photonic integrated circuits. Nevertheless, several challenges have to be overcome before they can commercially challenge conventional architectures in the field of artificial intelligence.
Even though a photonic hardware accelerator can theoretically reach unprecedented performance in the PMAC/s range for a single matrix,36 the device footprint is substantially larger than that of electronic hardware. The silicon nitride waveguides deployed in both presented PICs have a width of 1.2 μm, and thus, the total device size is on the order of square millimeters to square centimeters. In comparison, electronic circuits can nowadays be fabricated in a 5 nm process. However, this is the result of decades of commercial optimization, starting from 10 μm MOSFET processes in the 1970s. There are several approaches to reducing the footprint of PICs. First, one can use a platform with a higher refractive index contrast than SiN on SiO2, such as silicon-on-insulator (SOI), leading to smaller waveguides and smaller possible bend radii.41 Moreover, one could build the PIC not only in a plane but use multilayer processes to move toward 3D architectures.42 The larger footprint of photonic circuits compared to electronics can also be compensated by the ability to multiplex signals on different wavelengths (a feature that is not available in electronics). In this way, the same circuit can be used for different computations at the same time, increasing the computational density and parallelism.
Furthermore, for a fully functional system, various optical components need to be integrated on chip, and a suitable interface needs to be provided before it can be used commercially. In the examples outlined above, only the computational unit itself is integrated on the chip, whereas the required laser sources, multiplexers, modulators, and detectors remain off chip. This makes the systems challenging to scale and impractical to use outside laboratories. Switching to a different platform, for example, InP or SOI, is a promising route to integrating all components on the chip. Finally, both designs need an electrical interface in order to be compatible with existing technology. Since the PIC works in the analog domain and conventional electronics is digital, digital-to-analog converters play a crucial role.
Overall, neuromorphic computing implemented in photonic integrated circuits using phase-change devices is a promising way to satisfy the rapidly growing computational demands of artificial intelligence. Due to the high modulation speeds achievable in the optical domain and the inherent capability for parallelization via multiplexing, it is well suited to process the large amounts of data in artificial neural networks. In-memory computing with non-volatile phase-change materials also enables an overall energy-efficient process. The next step will be to move the experimental designs from the laboratories to commercial applications.
ACKNOWLEDGMENTS
This research was supported by EPSRC via Grant Nos. EP/J018694/1, EP/M015173/1, and EP/M015130/1 in the United Kingdom and the Deutsche Forschungsgemeinschaft (DFG) under Grant No. PE 1832/5-1 in Germany. W.P. gratefully acknowledges support by the European Research Council through Grant No. 724707. We further acknowledge funding for this work from the European Union's Horizon 2020 Research and Innovation Program (Fun-COMP project, No. 780848 and Phemtronics project, No. 899598).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.