The demand for machine intelligence, which is doubling every three months, is driving novel hardware solutions beyond the charging of electrical wires, giving a resurrection to application-specific integrated circuit (ASIC)-based accelerators. These innovations include photonic-based ASICs (P-ASICs), owing to the prospect of performing optical linear (and also nonlinear) operations, such as multiply–accumulate for vector-matrix multiplications or convolutions, without iterative architectures. Such photonic linear algebra enables picosecond delay when photonic integrated circuits are utilized via “on-the-fly” mathematics. However, the neuron’s full function includes providing a nonlinear activation function, known as thresholding, to enable decision making on inferred data. Many P-ASIC solutions perform this nonlinearity in the electronic domain, which breaks the optical link and brings challenges in terms of data throughput and delay while increasing system complexity via domain crossings. This work follows the notion of utilizing enhanced light–matter interactions to provide efficient, compact, and engineerable electro-optic neuron nonlinearity. Here, we introduce and demonstrate a novel electro-optic device to engineer the shape of this optical nonlinearity to resemble a leaky rectified linear unit (ReLU), the most commonly used nonlinear activation function in neural networks. We combine the counter-directional transfer functions from heterostructures made of two electro-optic materials to design a diode-like nonlinear response of the device. Integrating this nonlinearity into a photonic neural network, we show how the electrostatics of this thresholder’s gating junction improves the machine learning inference accuracy and the energy efficiency of the neural network.
I. INTRODUCTION
The growing demands of neural network systems create an urgent need for advanced devices that perform complex operations with high throughput (operations/s), low energy dissipation (J/operation), and a compact footprint leading to high operation density (operations s⁻¹ mm⁻²).1,2 Photonic integrated circuit (PIC)-based artificial neurons can pave the way toward meeting this challenge. One of the most significant benefits of photonics over electronics is that distinct signals can be straightforwardly and efficiently combined owing to their wave nature, exploiting attojoule-efficient electro-optic (EO) modulators, phase shifters, and combiners,3–5 which simplifies essential operations such as weighted sums (addition) and vector-matrix multiplications or convolutions.6 In a traditional electrical computing system, transistors can no longer keep up with the demand for computational complexity since lumped-element electronic circuitry limits the processor speed; in addition, the metallic wires add signaling delay and thereby limit transmission throughput. Meanwhile, with device scaling, static power begins to dominate the power consumption in microprocessors: subthreshold leakage (a weak inversion current across the device) and gate leakage (a tunneling current through the gate oxide insulation) lead to more power consumption and heating.7,8 The clock speed of a traditional electronic processor can hardly exceed 5 GHz because of thermal dissipation limits, although parallel computing architectures can increase the information processing speed. Still, the electrical channels (wires) are bound by physical laws that limit the bandwidth they can carry.
In contrast, a photonic system can potentially transmit a few orders of magnitude more information per unit area owing to the sheer parallelism achievable in optics, along with reduced crosstalk and interference compared to electronics. The reason is that photonics operates at a higher baud rate due to a lower capacitance (only the photonic devices contribute capacitance, not the interconnecting circuit). Furthermore, the superposition property of light allows for parallelism strategies in which each physical channel carries multiple wavelengths by exploiting wavelength-division multiplexing (WDM) techniques,9,10 thus transmitting optical signals at different wavelengths at the same time without occupying extra physical space. A neural network usually contains three elements: a set of nonlinear nodes (neurons), a configurable interconnection (network), and an information representation (coding scheme). In a typical nonlinear neuron model, the inputs are the outputs of the other neurons connected to it; the neuron integrates the combined signals and provides a nonlinear response (also known as the activation function or “thresholding”). Different nonlinear activation functions (NLAFs) can deliver advantages in various applications and have been extensively investigated recently.11 The experimentally demonstrated implementations of those nonlinearities (neurons) fall into two categories according to the physical representation of the signals: optical–electrical–optical (O–E–O) or all-optical.12–15 All-optical neurons can represent the signals as semiconductor carriers or optical susceptibility but come at the cost of higher energy. An interesting and energy-efficient solution, proposed and demonstrated in this work, is to combine nonlinear optical devices with optical carrier regeneration through strong electro-optical nonlinearity.
Recently, all-optical nonlinear modules have been experimentally or numerically demonstrated with promising efficiency and throughput using several approaches, such as multi-sectional distributed-feedback lasers, induced transparency in quantum assembly, disk lasers, reverse absorption, saturable absorption, or graphene excitable lasers.12–17 Considering the complexity of these approaches, a more compact and straightforward way to achieve those functionalities is to exploit electro-optically tuned materials by means of an electro-optic modulator (as the NLAF) connected to a photodiode (neuron signal summation).18–20 Compared to other modulator approaches that require interferometric schemes, an electro-absorption modulator (EAM) has the potential to exhibit much lower conversion costs from one processing stage to another and can be readily integrated on silicon photonic platforms. The characteristics to optimize in an EAM are the modulation bandwidth and the strength of the light–matter interaction (LMI), so as to achieve simultaneously a high modulation speed and a low energy-per-compute that surpasses available electronic efficiency. While integrated modulators may enable high-performance neural networks, the active material and the device configuration need to be engineered together carefully.
The transparent conductive oxide (TCO) material class is potentially CMOS compatible and can therefore be integrated readily with silicon photonic platforms alongside on-chip electrical devices, such as digital-to-analog/analog-to-digital converters, active modulators, and metatronic solvers, providing the ability to build large-scale networks-on-chip. Indium tin oxide (ITO), a member of the TCO family, has shown strong index modulation and tunable optical nonlinearity. Recent research and applications based on ITO modulators have focused on energy efficiency with compact size and designs featuring dense integration with silicon photonics.21–24 Furthermore, a gigahertz-fast ITO-based plasmonic Mach–Zehnder interferometric modulator was recently demonstrated by modulating the real part of the index, while the LMI was enhanced by a plasmonic hybrid mode with small RC delays.22,23 Electro-absorption schemes are generally simpler than interferometric electro-optic modulators (EOMs), as no reference phase comparison is required, but come at the cost of not being able to extinguish the optical signal completely. We therefore deliberately adopt an electro-absorption scheme here: a modest modulation depth may suffice in neuromorphic applications, where the shape of the nonlinear transfer function matters as much as a true “zero” state does in digital applications.
Besides TCOs, carbon-based materials such as graphene are candidates for optical neurons. Graphene, a single-layer hexagonal lattice of carbon atoms with a massless Dirac electronic structure, exhibits exceptional electron mobility and a constant absorption of 2.3% over a wide spectral range from the visible to the infrared, which makes it a promising material for many applications.25,26 Graphene has also shown potential for high-speed optical devices such as modulators or detectors.27,28 Compared to traditional silicon-based modulators, graphene devices offer significantly improved footprint, operation voltage, and modulation speed, owing to graphene’s strong EO properties and intrinsic carrier mobility. A waveguide-integrated graphene-based electro-absorption modulator was realized by actively tuning the Fermi level of a monolayer graphene sheet, showing a high modulation efficiency over a broad range of wavelengths.29 A graphene-based Mach–Zehnder interferometer modulator has also been demonstrated with switching between electro-absorption and electro-refraction by applying different voltages, which can improve the signal-to-noise ratio of graphene-based electro-absorption modulators in long-haul communications.30 Furthermore, graphene has enabled a new field of hybrid devices, such as TCOs combined with graphene, which improve electrical performance while maintaining the optical transmittance at a desired level. Such a combined film has been demonstrated as a transparent, flexible hybrid electrode with lower sheet resistance: the surface carrier concentration is improved in such hybrid films, allowing fast carrier injection into and extraction from the TCO thin film to change the free carrier density, which enables rapid and robust optical modulation.31
Here, we present and demonstrate a modulator for photonic neurons with electro-optically induced nonlinear characteristics designed to resemble the popular leaky rectified linear unit (ReLU) activation function. We show how this nonlinearity can be engineered via a compact free-carrier-based absorption modulator built on ITO/graphene heterojunctions integrated in silicon photonic waveguides. This EO nonlinear thresholder is a compact device, tens of micrometers long, with a short response delay and lower optical loss than silicon-based modulators owing to the optical transmittance of ITO and graphene.32–34 Consequently, it can be implemented in a network without additional optical amplifiers, further reducing the required energy consumption. Providing PIC-based nonlinearity in a synergistic MAC operation-threshold scheme is a critical step toward ensuring low inference energy when operating photonic-based application-specific integrated circuits (P-ASICs).
II. DEVICE DESIGN AND FABRICATION
We fabricate the neuron-thresholding electro-absorption modulators on an integrated Si platform with passive waveguides on a silicon-on-insulator (SOI) substrate with 220 nm epi-Si height and 500 nm width of the waveguides to facilitate 1550 nm operation (Fig. 1). Grating couplers optimized for TM-like mode excitation in the waveguides are used for optical I/O to the PIC. Subsequent process steps to fabricate the active device begin with depositing a small capping (passivation) oxide layer of 5 nm to isolate any parasitic resistance loads from affecting the active device and to aid the grating coupler environmental coupling efficiency. An ITO thin film of 10 nm is deposited on top of a portion of the waveguide to act as the bottom electrode of our capacitor, followed by an oxide layer to facilitate gating (Fig. 1). We use a moderately high-k oxide, Al2O3, for both the capping oxide and gate oxide layer of ∼30 nm and, finally, place a single layer graphene sheet on top of the stack to act as the top electrode for the active capacitor and appoint relevant contact pads with Ti/Au to probe the device electrically. The active capacitor stack is fabricated using electron-beam lithography (EBL) for defining relevant patterns, ion beam deposition (IBD) for ITO deposition, electron-beam evaporation for the metals, and lift-off processing. A thin ∼3 nm adhesion layer of Ti is used in the Au deposition of 50 nm. IBD processing yields dense crystalline ITO films that are pinhole-free and highly uniform and allows for a room temperature process, which does not anneal ITO (i.e., no activation of Sn carriers as to facilitate electrostatic EO tuning).35 Incidentally, IBD technologies are advantageous for nanophotonic device fabrication due to their precisely controllable material properties, such as microstructure, non-stoichiometry, morphology, and crystallinity.36,37 We place the top graphene on the stack by wet transfer methods and subsequent EBL patterning and plasma etching. 
The ∼30 nm gate Al2O3 oxide layer is grown using atomic layer deposition (ALD) with a few hydrophobic surface layers to facilitate the graphene wet transfer process. In such a capacitor configuration, the carrier concentration of the ITO thin film can be altered by the potential applied, which changes the optical properties of the material leading to variations in the portion of the electric field absorbed by the thin layer. Increasing the carrier concentration level with active electrostatic gating enhances the modulation effect due to the increased free carrier absorption of the optical mode dynamics.38–40
A decreasing real part, n, and an increasing imaginary part, κ, of the optical index are observed in the wavelength dispersion near our operating wavelength, λ = 1550 nm, in spectroscopic ellipsometry of the deposited ITO thin films (see the supplementary material). This behavior is well known and expected from the Kramers–Kronig relations.38,39 Accumulation or depletion is obtained by applying a potential across a capacitor configuration in which ITO forms one electrode, which changes the carrier concentration of the thin film. Note that inversion in ITO has not been reported in the literature. The optical properties of ITO therefore change dramatically with the carrier concentration, resulting in strong optical modulation.21–23,35–41 In practice, a 1/e decay length of about 5 nm has been measured,42 and modulation effects have been experimentally verified over 1/e² (∼10 nm) thick films from the interface of the oxide and ITO.43 The contact and sheet resistance of the ITO film are ∼490 Ω and 95 Ω/□, respectively (see the supplementary material). The resistivity and mobility of the ITO film are measured to be 8.4 × 10⁻⁴ Ω cm and 23.7 cm²/V s, respectively, whereas the carrier concentration of the as-deposited ITO film, Nc = 3.1 × 10²⁰ cm⁻³ (see the supplementary material), is purposefully kept away from the epsilon-near-zero (ENZ) point (∼7 × 10²⁰ cm⁻³) to keep the imaginary part low for the light-ON state of the thresholding modulator.
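The link between the quoted resistivity, mobility, and carrier concentration, and the approximate position of the ENZ point, can be checked with a minimal Drude-model sketch. Only the carrier concentration, mobility, and wavelength are taken from the text; `EPS_INF`, `M_EFF`, and `GAMMA` are assumed, illustrative ITO parameters that are not reported in this work, so the numbers should be read as order-of-magnitude estimates.

```python
import cmath
import math

# Physical constants (SI units)
Q = 1.602176634e-19       # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
M_E = 9.1093837015e-31    # electron rest mass, kg
C0 = 2.99792458e8         # speed of light, m/s

# Values quoted in the text
N_CM3 = 3.1e20            # as-deposited carrier concentration, cm^-3
MU_CM2 = 23.7             # mobility, cm^2/(V s)
WAVELENGTH = 1550e-9      # operating wavelength, m

# ASSUMED Drude parameters (illustrative, not from this work)
EPS_INF = 3.9             # high-frequency permittivity
M_EFF = 0.35 * M_E        # electron effective mass
GAMMA = 1.8e14            # collision frequency, rad/s


def resistivity_ohm_cm(n_cm3, mu_cm2):
    """Resistivity from rho = 1 / (q * n * mu), in ohm cm."""
    return 1.0 / (Q * n_cm3 * mu_cm2)


def drude_permittivity(n_cm3, wavelength=WAVELENGTH):
    """Complex relative permittivity of a free-carrier (Drude) medium."""
    n_m3 = n_cm3 * 1e6
    omega = 2.0 * math.pi * C0 / wavelength
    omega_p_sq = n_m3 * Q**2 / (EPS0 * M_EFF)
    return EPS_INF - omega_p_sq / (omega**2 + 1j * GAMMA * omega)


def enz_concentration_cm3(wavelength=WAVELENGTH):
    """Carrier concentration at which Re(eps) crosses zero, in cm^-3."""
    omega = 2.0 * math.pi * C0 / wavelength
    n_m3 = EPS_INF * EPS0 * M_EFF * (omega**2 + GAMMA**2) / Q**2
    return n_m3 * 1e-6


eps = drude_permittivity(N_CM3)
index = cmath.sqrt(eps)  # complex index n + i*kappa
print(f"resistivity      : {resistivity_ohm_cm(N_CM3, MU_CM2):.2e} ohm cm")
print(f"permittivity     : {eps.real:.2f} + {eps.imag:.2f}i")
print(f"n, kappa         : {index.real:.2f}, {index.imag:.3f}")
print(f"ENZ concentration: {enz_concentration_cm3():.2e} cm^-3")
```

Under these assumed parameters, the computed resistivity lies close to the measured 8.4 × 10⁻⁴ Ω cm, and the zero crossing of the real permittivity falls at a carrier concentration of the same order as the ENZ point quoted above.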
III. RESULTS AND DISCUSSION
The electric field intensity in the modal cross section through the central part of the waveguide shows the field profile across the different layers of the active structure [Fig. 2(a)]. The significant field strength in the ITO layer, compared to that in the monolayer graphene across the gate oxide, indicates that graphene in this configuration serves only as an electrical contact and does not contribute appreciably to the modulation. This is also reflected by the in-plane electric field of the mode [Fig. 2(b)], which is the field component that interacts with the monolayer graphene sheet.38,39,44 As the mode employed in our experiment, selected by the grating couplers, is TM-like, the in-plane electric field is understandably weaker than the overall field [Fig. 2(b)]. Since the optical response of ITO does not depend on the field orientation and its optical properties change uniformly, the modulation arises mainly from the ITO carrier variation (accumulation/depletion) under the applied voltage, given the selected TM-like modal operation.
The device length was varied during fabrication to allow for length-dependent studies of performance. We opted for rather compact devices, ranging from the sub-wavelength scale, 1.4 µm, to a few micrometers long, specifically 8, 12, and 15 µm. I–V measurements are performed on the active devices, and all the devices show working capacitor behavior in the measured voltage range [Fig. 2(c)]. The capacitors do not exhibit hysteretic latching behavior with typical charge storage traits. In this capacitor configuration, we modulate the carrier density of the ITO thin film through the applied potential, which regulates the portion of the electric field absorbed by the thin layer. The mode profile shows TM-like mode propagation in the active device, with most of the light in the active stack confined in the ITO [Fig. 2(d)]. The dominant contribution of ITO to the LMI, and hence to the resulting modulation, is also apparent from the confinement factor, Γ, of the propagating mode inside the active capacitive stack. A closer investigation reveals a total Γ of 2.70%, of which the confinement in the ITO, ΓITO, is 2.66%. Only a minuscule amount of light interacts with the graphene sheet owing to its thickness, with a correspondingly low light confinement factor, Γgraphene, of only 0.04%. A schematic of the modal structure is provided as a guide to the relative positions of the layers in the modal illumination profile [Fig. 2(e)]. A 1550 nm TM-like mode traveling in the waveguide undergoes a shift in its mode profile due to the presence of the active capacitive stack, which enhances the modal overlap with the active ITO layer.
Experimental results show a modulation depth (i.e., extinction ratio, ER) of ∼2 dB for an absorption modulator length of only 15 µm [Fig. 2(f)], which is notably higher per unit length than that of Si, although both ITO and Si operate via the free carrier modulation mechanism. This improvement in ITO can be attributed to (a) its 2–3 orders of magnitude higher carrier density and (b) its higher bandgap, which consequently leads to a lower refractive index.21–23,35,38–41 If a change in the carrier concentration, δNc (e.g., due to an applied voltage bias), causes a change in the relative permittivity (dielectric constant), δε, the corresponding change in the refractive index can be written as $\delta n = \delta(\varepsilon^{1/2}) \approx \delta\varepsilon/(2\varepsilon^{1/2})$; hence, the refractive index change is greatly enhanced when the permittivity, ε, is small.21–23,35,38–41 The Pauli blocking effect in graphene can be seen in the forward voltage range, where it slightly rectifies the modulator response; the effect is weak owing to the small light confinement in graphene. The extinction ratio and insertion loss both show linear trends with device length, as expected [Fig. 2(g)]. The extinction ratio increases monotonically with a slope of 0.132 dB/μm for this structure within the applied voltage range, with a coefficient of determination for the linear fit of R² = 0.99. The insertion loss also increases monotonically with length, on top of length-independent losses from other sources, such as the grating couplers, coupling to (and from) the Si waveguides, waveguide bending losses, and measurement factors. Experimental results show an insertion loss of about 0.31 dB/μm for the fabricated devices, with a coefficient of determination for the linear fit of R² = 0.86. The length-independent losses in our experiments amount to almost 60 dB, which includes light coupling to (and from) the grating couplers from a lensed fiber. This is quite high compared to our previous results22,23 and to our experience with similarly processed foundry tape-outs.
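The length scaling and the first-order index relation discussed above can be illustrated with a short sketch. The slopes are the fitted values quoted in the text; the zero intercept of the extinction-ratio line and the treatment of the permittivity as a real, positive number are simplifying assumptions made only for this example.

```python
import math

# Fitted slopes quoted in the text
ER_SLOPE_DB_PER_UM = 0.132   # extinction ratio per unit length
IL_SLOPE_DB_PER_UM = 0.31    # length-dependent insertion loss per unit length
LENGTHS_UM = (1.4, 8.0, 12.0, 15.0)  # fabricated device lengths


def extinction_ratio_db(length_um):
    """ER of a device of given length, ASSUMING a zero-intercept linear trend."""
    return ER_SLOPE_DB_PER_UM * length_um


def propagation_loss_db(length_um):
    """Length-dependent part of the insertion loss (excludes coupling losses)."""
    return IL_SLOPE_DB_PER_UM * length_um


def length_for_extinction_um(target_er_db):
    """Device length needed for a target ER under the same linear trend."""
    return target_er_db / ER_SLOPE_DB_PER_UM


def index_change_exact(eps, d_eps):
    """Exact change of n = sqrt(eps) for real, positive eps (lossless assumption)."""
    return math.sqrt(eps + d_eps) - math.sqrt(eps)


def index_change_first_order(eps, d_eps):
    """First-order estimate: delta_n ~ delta_eps / (2 * sqrt(eps))."""
    return d_eps / (2.0 * math.sqrt(eps))


for length in LENGTHS_UM:
    print(f"L = {length:4.1f} um: ER ~ {extinction_ratio_db(length):.2f} dB, "
          f"length-dependent IL ~ {propagation_loss_db(length):.2f} dB")

# The same permittivity change gives a larger index change when eps is small
for eps in (4.0, 1.0, 0.5):
    print(f"eps = {eps:3.1f}: delta_n ~ {index_change_first_order(eps, 0.1):.4f}")
```

The second loop makes the enhancement argument concrete: for a fixed δε, the estimated δn grows as ε decreases, which is why operation at reduced permittivity strengthens the modulation.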
In similar previous experiments, we found the length-independent losses (arising mainly from coupling the light from a lensed fiber to the grating coupler through an air gap) to be around 20–30 dB. The additional 30–40 dB of loss can be attributed to imperfections in our wet etch process during fabrication of this structure, which compromised performance. We needed to open windows in the gate oxide film on top of the ITO contact pads to facilitate electrostatic gating. The wet etch timing was misjudged, and many of the opening patterns were undercut by the wet etchant, which loosened the graphene adhesion to the top hydrophobic layer of the oxide and contaminated the entire chip. This unintended effect drastically increased the loss, together with wrinkling of the wet-transferred graphene in some places due to the uneven patterned surface and multilayer formation in patches that could not be plasma etched with recipes aimed at monolayers. However, the low propagation loss per unit length suggests that longer devices with higher modulation depths are feasible, offering high-speed photonic devices without resorting to plasmonic routes. To estimate the coupling losses to and from the underlying Si waveguide, we performed finite-difference time-domain (FDTD) simulations and found that the simulated length-dependent insertion loss, 0.27 dB/μm, agrees closely with our experimental results [Fig. 2(g)]. The FDTD results point to a coupling loss of only 0.76 dB per coupling facet, which indicates the feasibility of such device configurations: they stay in the photonic domain and avoid selective doping of the Si waveguide, yet retain a pathway to high-speed operation by utilizing high-mobility single-layer graphene. This approach is a promising alternative to plasmonics that keeps the insertion losses minimal.
The demonstrated EAMs exhibit a modest modulation range in rather compact (linear) footprints (∼0.13 dB/μm) and considerably low insertion losses (<2 dB) enabled by a dielectric (i.e., non-plasmonic) optical mode and compact design. Therefore, this ITO–graphene hybrid approach exemplifies a practical alternative to other EO modulators (e.g., Si and LiNbO3),21–23,40 without necessitating any interferometric or cavity schemes and, therefore, characterized by a broadband response, while offering orders of magnitude smaller chip-real estate. Next, we implement an optical module of a neuromorphic activation function based on our experimental EAM results for the weighting scheme relying on WDM as EAMs are spectrally broadband by definition (no resonance used).
IV. BROADCAST AND WEIGHT PHOTONIC NEURAL NETWORK
A broadcast and weight photonic neural network45 assigns each neuron a dedicated wavelength and multiplexes all of the outputs onto a single bus between each layer (Fig. 3). Individual nodes [Fig. 3(a)] connect to the input bus, each receiving the output of all the nodes of the previous layer. Each color is separated from the bus and weighted by detuning a ring modulator. The output of the weight is summed onto one or more photodiodes to generate an electrical output of the accumulated signal. The electrical output is then amplified and remodulates an optical signal using an absorption modulator. The combined electrical and photonic response of the photodiode, amplifier, and modulator creates a nonlinear activation function for the neuron MAC signal. The degree of nonlinearity in neural network nodes has been shown to be important in the higher layers of a deep neural network46 where the model must accentuate information from helpful dimensions while eliminating unhelpful dimensions to avoid the problem of dimensionality when making a decision.
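The signal chain of a single node described above (weighting, photodetection, amplification, and remodulation) can be sketched numerically as follows. This is a simplified illustration, not the authors' model: the piecewise-linear `modulator_transmission` and its parameters are hypothetical, signed weights assume balanced photodetection, and the 40 dB gain is interpreted here as a voltage gain applied to the photocurrent across a 50 Ω load.

```python
import numpy as np


def modulator_transmission(drive_v, t0=0.3, slope_neg=0.005, slope_pos=0.05):
    """HYPOTHETICAL leaky-ReLU-like transmission versus drive voltage.

    A shallow slope below 0 V and a steeper slope above it, clipped to the
    physically meaningful transmission range [0, 1].
    """
    v = np.asarray(drive_v, dtype=float)
    t = np.where(v < 0.0, t0 + slope_neg * v, t0 + slope_pos * v)
    return np.clip(t, 0.0, 1.0)


def neuron_output_power(input_powers_w, weights, carrier_power_w=1e-3,
                        responsivity_a_per_w=0.7, load_ohm=50.0, gain_db=40.0):
    """Output optical power (W) of one broadcast-and-weight node.

    input_powers_w: optical power on each WDM channel of the input bus.
    weights: per-channel weights in [-1, 1] (ring detuning; sign ASSUMES
             balanced photodetection).
    """
    powers = np.asarray(input_powers_w, dtype=float)
    w = np.asarray(weights, dtype=float)
    if powers.shape != w.shape:
        raise ValueError("one weight per wavelength channel is required")

    # Weighted sum: photodiodes accumulate all channels into one photocurrent
    photocurrent_a = responsivity_a_per_w * np.sum(w * powers)
    # Amplification (ASSUMED voltage gain across the load resistance)
    drive_v = photocurrent_a * load_ohm * 10.0 ** (gain_db / 20.0)
    # Remodulation of a fresh optical carrier by the absorption modulator
    return float(carrier_power_w * modulator_transmission(drive_v))


# Example: three wavelength channels feeding one node
example_out = neuron_output_power([1e-3, 0.5e-3, 2e-3], [0.5, -0.25, 0.1])
print(f"node output power: {example_out * 1e3:.4f} mW")
```

Keeping the weighted sum in the electrical domain and the nonlinearity in the modulator mirrors the photodiode, amplifier, and modulator chain described above, so a measured transfer function could be substituted for the hypothetical one.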
We selected the 15 µm modulator for simulation as it demonstrated the largest modulation depth. The transmission transfer function of the 15 µm modulator was fitted with four curves: a nonlinear ten-dimensional polynomial, a linear least-squares fit, a line passing through the minimum and maximum points, and a hypothetical line with twice the slope of the line through the minimum and maximum points [Fig. 3(c)]. As an example, these activation functions were inserted into a neural network model for the Modified National Institute of Standards and Technology (MNIST) database47 consisting of three layers: two 100-node, fully connected optical layers whose activation functions are the fitted transfer functions, followed by a simulated electronic dense 10-node softmax output layer. The models were trained in Python with Keras using the Adagrad method with a learning rate of 0.005 for 1000 epochs and a batch size of 1024.
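The four candidate activation curves described above can be constructed from a sampled transfer function as in the following sketch. The measured data are not reproduced here, so a synthetic leaky-ReLU-like curve (`t_demo`) stands in for them. The polynomial is taken to be of degree 10, which is one reading of "ten-dimensional", and the doubled-slope line is pivoted about the midpoint of the min-max line; both choices are assumptions of this example.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_transfer_functions(voltage, transmission, degree=10):
    """Return the four candidate activation curves as callables."""
    v = np.asarray(voltage, dtype=float)
    t = np.asarray(transmission, dtype=float)

    # 1) Nonlinear polynomial fit (domain is rescaled internally for stability)
    poly = Polynomial.fit(v, t, degree)

    # 2) Linear least-squares fit
    ls_slope, ls_intercept = np.polyfit(v, t, 1)

    # 3) Line through the minimum and maximum transmission points
    i_min, i_max = int(np.argmin(t)), int(np.argmax(t))
    mm_slope = (t[i_max] - t[i_min]) / (v[i_max] - v[i_min])

    # 4) Hypothetical line with twice that slope (ASSUMED midpoint pivot)
    v_mid = 0.5 * (v[i_min] + v[i_max])
    t_mid = 0.5 * (t[i_min] + t[i_max])

    return {
        "poly": poly,
        "least_squares": lambda x: ls_slope * np.asarray(x, float) + ls_intercept,
        "min_max": lambda x: t[i_min] + mm_slope * (np.asarray(x, float) - v[i_min]),
        "double_slope": lambda x: t_mid + 2.0 * mm_slope * (np.asarray(x, float) - v_mid),
    }


# Synthetic stand-in for the measured curve (NOT the experimental data)
v_demo = np.linspace(-5.0, 5.0, 101)
t_demo = 0.2 + 0.05 * np.log1p(np.exp(v_demo))

fits = fit_transfer_functions(v_demo, t_demo)
for name, curve in fits.items():
    err = np.max(np.abs(curve(v_demo) - t_demo))
    print(f"{name:14s} max |fit - data| = {err:.4f}")
```

Each returned callable maps a drive voltage to a transmission value, so it can be wrapped as a custom activation in whichever training framework is used.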
The optical link was modeled assuming an operating frequency of 1 GHz, a temperature of 300 K, a photodiode with 50 Ω impedance, 0.7 A/W responsivity, 5 fF capacitance, and 50 pA dark current, and a 40 dB transimpedance amplifier (TIA) coupled to the output of the photodiode. This gain was required to drive the modulator across its full range; a sweep of the gain shows diminishing accuracy for values <40 dB. The input optical power was swept between 1 and 14 mW.
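To give a feel for how the listed link parameters set the receiver noise floor, the sketch below evaluates a textbook shot-noise plus thermal (Johnson) noise estimate. The text does not state which noise model was used, so this formulation, and the use of the 1 GHz operating frequency as the noise bandwidth, are assumptions of the example.

```python
import math

Q = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23    # Boltzmann constant, J/K


def receiver_snr_db(optical_power_w, responsivity_a_per_w=0.7,
                    dark_current_a=50e-12, load_ohm=50.0,
                    temperature_k=300.0, bandwidth_hz=1e9):
    """Electrical SNR (dB) at the photodiode for shot + thermal noise.

    ASSUMES the noise bandwidth equals the 1 GHz operating frequency.
    """
    signal_a = responsivity_a_per_w * optical_power_w
    shot_var = 2.0 * Q * (signal_a + dark_current_a) * bandwidth_hz
    thermal_var = 4.0 * K_B * temperature_k * bandwidth_hz / load_ohm
    return 10.0 * math.log10(signal_a**2 / (shot_var + thermal_var))


def rc_bandwidth_hz(load_ohm=50.0, capacitance_f=5e-15):
    """RC-limited 3-dB bandwidth of the photodiode, 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * load_ohm * capacitance_f)


for power_mw in (1, 4, 14):
    snr = receiver_snr_db(power_mw * 1e-3)
    print(f"P_in = {power_mw:2d} mW -> SNR ~ {snr:.1f} dB")
print(f"RC-limited bandwidth ~ {rc_bandwidth_hz() / 1e9:.0f} GHz")
```

With a 5 fF photodiode on a 50 Ω load, the RC-limited bandwidth sits far above the 1 GHz operating frequency, so in this simple picture the link is bounded by the noise terms and the drive gain rather than by the photodiode capacitance.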
The simulation results show close to 10% higher accuracy for the nonlinear ten-dimensional fit than for both the least-squares linear fit and the linear fit through the minimum and maximum points. To determine whether this improvement was due to the nonlinear shape of the transfer function or simply due to the greater slope in some regions of the nonlinear transfer function, a hypothetical transfer function was modeled based on the line through the minimum and maximum points but with twice the slope. This hypothetical linear transfer function outperformed the other models at laser powers above 4 mW. While this test demonstrates that modulators with linear transfer functions may outperform nonlinear transfer functions in some cases, such a response is physically more challenging to realize, as it requires a greater modulation depth and gain above ∼5 V.
The achieved modulation depth relative to the drive voltage limits the ability of the demonstrated modulator to create a nonlinearity without the gain provided by a transimpedance amplifier at the output of the photodiode. Notwithstanding, the gate oxide can be thinned considerably by utilizing higher-k materials (e.g., 5 nm HfO2 as opposed to the current 30 nm Al2O3 design), which can facilitate a stronger LMI and thus higher modulation while keeping the voltage bias in a tolerable range. Since the ITO film is closer to the Si waveguide and most absorption occurs in that layer [evident from the confinement factors in Fig. 2(d)], the absorption loss can be expected to remain monotonic. However, as the gap region (ITO/oxide/graphene) shrinks, the tighter confinement of light inside this gap is expected to increase the coupling losses (to and from the Si waveguide). Nevertheless, insertion losses can be minimized with modal impedance matching schemes employed at the coupling regions from (to) the underlying Si waveguides. This interplay and the associated engineering aspects are an area of active research, and improvements to the current design are included as a future option in the comparison table for different modulator-based neuron thresholders (Table I).
|Active material|Modulation, ER/L (dB/μm)|Switching energy, Usw (pJ)|Drive voltage, Vd (V)|3-dB bandwidth, f3dB (GHz)|Insertion loss, IL (dB)|
|---|---|---|---|---|---|
|ITO (Ref. 50)|0.15|2|3|1.17 × 10⁻⁴|⋯|
V. CONCLUSION
In this work, we demonstrated the first ITO–graphene heterojunction-based electro-absorption modulator and used it to provide electro-optic nonlinearity (thresholding) as part of photonic neurons inside PIC-based neural networks. These electro-optic thresholders, integrated into a silicon photonic platform, enable monolithic nonlinearity without the need for dual-chip approaches, which are costly from an energy-driver link budget perspective (50 pJ/bit for off-chip vs ∼1 pJ/bit for on-chip signal routing). These ITO–graphene photonic heterojunctions enable leaky ReLU-like transfer functions for thresholding in photonic ASICs. Our results show a path toward alternatives to plasmonic solutions that keep the insertion losses low, offering a photonic paradigm for high-speed operation that utilizes ITO processing synergies. We further showed the feasibility of neuromorphic applications by using the demonstrated absorption modulators as nonlinear activation functions in a feed-forward broadcast-and-weight photonic neural network benchmarked with an MNIST classifier. We also compared relevant absorption-based modulators from the recent literature that serve as nonlinear activation functions in photonic neuromorphic applications, summarizing their different performance metrics. This work, in which the presented devices are an initial proof of concept and a first realization of an ITO–graphene heterojunction for electro-optic modulation atop an SOI waveguide platform, paves the way for future electro-optic devices aimed at state-of-the-art photonic neuromorphic hardware.
See the supplementary material for additional information.
V.J.S. was supported by the Air Force Office of Scientific Research (Grant No. FA9550-20-1-0193) under the PECASE Award.
Conflict of Interest
The authors have no conflicts of interest to disclose.
R.A., Z.M., and V.J.S. initiated the project and conceived the experiments. R.A. designed and fabricated the devices. R.A. and R.M. conducted experimental measurements. R.A. and Z.M. performed supporting experiments. R.A. and H.W. conducted simulations and data analysis. J.K.G. conducted the neural network simulations. H.D. and J.K. provided suggestions throughout the project. V.J.S. supervised the project. All authors discussed the results and commented on the manuscript.
The data that support the findings of this study are available within the article and its supplementary material.