A surface-normal photodetector as nonlinear activation function in diffractive optical neural networks

Optical neural networks (ONNs) enable high-speed, parallel, and energy-efficient processing compared with their conventional digital electronic counterparts. However, realizing large-scale systems remains an open problem. Among the various integrated and non-integrated ONNs, free-space diffractive ONNs benefit from the large number of pixels of spatial light modulators to realize millions of neurons. However, a significant fraction of the computation time and energy is consumed by the nonlinear activation function, which is typically implemented using a camera sensor. Here, we propose a novel surface-normal photodetector (SNPD) with a nonlinear response to replace the camera sensor, enabling a response that is about three orders of magnitude faster (5.7 μs response time) and more energy efficient (less than 10 nW/pixel). Direct, efficient vertical optical coupling, polarization insensitivity, inherent nonlinearity with no control electronics, low optical power requirements, and the possibility of implementing large-scale arrays make the SNPD a promising nonlinear activation function for diffractive ONNs. To show its applicability, we demonstrate simulated classification of the MNIST and Fashion MNIST datasets using the measured SNPD response, with accuracy comparable to that of an ideal ReLU function.


Introduction
As artificial neural networks are more widely utilized in applications ranging from pattern recognition [1,2] to medical diagnosis [3,4], there is an increasing need for faster and more energy-efficient hardware platforms. Optical neural networks (ONNs) benefit from massive parallelism and from multiplexing schemes in wavelength, mode, time, and polarization to enable highly energy-efficient processing at the speed of light [5]. Hence, various ONN implementations have been demonstrated both in bench-top setups [6][7][8] and on integrated platforms that offer smaller size and higher energy efficiency [9][10][11].
Despite this significant progress, scaling to thousands or millions of neurons and to multiple layers for more complex tasks is one of the main challenges that integrated ONNs face [6]. Complex, area-consuming photonic routing in commercially available platforms, larger on-chip propagation loss, and intricate electronic control circuitry to compensate for fabrication-induced errors result in lower energy efficiency, packaging complexity, and impractically large integrated systems.
Free-space diffractive ONNs, on the other hand, enable orders of magnitude more neurons than integrated ONNs, as well as more flexibility to implement different network configurations [6,7]. Such systems are especially useful for image and video processing and classification, as they directly process input pictures or video frames with a large number of pixels. Figure 1(a) shows the conceptual schematic of a feed-forward neural network with multiple layers of neurons, where each neuron performs linear (weight and sum) and nonlinear (activation function) computations on its inputs. Correspondingly, a diffractive ONN architecture that performs the linear and nonlinear computations is shown in Fig. 1(b). A laser source illuminates a digitally controlled micro-mirror device (DMD) that modulates the intensity of the incoming light with the input data to the network. A spatial light modulator (SLM) is used to implement the linear weights. The large number of pixels in commercially available SLMs enables ONNs with millions of neurons per layer. The diffracted signals from the SLM are then directed towards the camera, which applies the nonlinear activation function to the weighted sum of the inputs. So far, the nonlinear activation function has been implemented either digitally after forming the image on a camera [7] or by using the inherent nonlinear photoelectric response of the CMOS sensor [6].
In either case, the total computation time is mainly limited by the sensor exposure time, which for commercial cameras is several milliseconds. For instance, in Ref. [6], despite an impressive performance of more than 200 tera operations per second (TOPS) and more than 1 TOPS/W, about 64% of the total processing time and 15% of the total power consumption (about 6 μW per pixel) are consumed by the sensor. Therefore, a faster and more energy-efficient implementation of the nonlinear activation function can significantly improve the computation speed and energy efficiency of such systems. Note that most diffractive ONNs implement only one neural layer with this setup, and the full neural network is realized by reusing the same architecture with different parameters. The output of the layer is always in the electrical domain and drives the DMD after some processing. Therefore, an optical-in, electrical-out (O-E) nonlinearity best fits such systems.
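A simple Amdahl's-law estimate shows how the quoted 64% time share translates into an end-to-end speedup. This is our own illustration, not an analysis from Ref. [6]: the 1000× factor is a hypothetical stand-in for "about three orders of magnitude", and the model assumes that the remaining 36% of the processing time is unaffected.

```python
def overall_speedup(sensor_fraction: float, sensor_speedup: float) -> float:
    """Amdahl's-law estimate of the end-to-end speedup when only the
    sensor/activation stage is accelerated by `sensor_speedup`.

    sensor_fraction: share of the original processing time spent in the
    sensor stage (0..1). The remaining share is assumed to be unchanged.
    """
    if not 0.0 <= sensor_fraction <= 1.0:
        raise ValueError("sensor_fraction must lie in [0, 1]")
    if sensor_speedup <= 0:
        raise ValueError("sensor_speedup must be positive")
    remaining = 1.0 - sensor_fraction
    return 1.0 / (remaining + sensor_fraction / sensor_speedup)


# 64% of the time in the sensor (Ref. [6]); hypothetical 1000x faster activation stage.
print(f"{overall_speedup(0.64, 1000.0):.2f}x")
```

Under this model the end-to-end gain saturates just below 1/0.36 ≈ 2.8× however fast the activation becomes, so the other stages set the next bottleneck.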
Here we propose a novel implementation of the nonlinear activation function using a surface-normal nonlinear photodetector (SNPD) to significantly improve the speed and energy efficiency of a diffractive ONN. The SNPD is formed by a vertical p-i-n structure contained in a Fabry-Perot cavity. Such devices have previously been used as high-speed electro-optic modulators based on the quantum confined Stark effect [12][13][14][15][16]. However, light coupled into these devices also generates a photocurrent [14], so they can be used as photodetectors as well. Moreover, under high light intensity, thermally induced nonlinearities arise. In this work, we use the nonlinear behavior of the SNPD photocurrent as a function of the incident optical power to realize a nonlinear activation function as an improved alternative to the camera sensor. The SNPD is polarization independent, and light can be coupled into it vertically with high efficiency and without any additional coupling devices, which eases its deployment within a free-space ONN setup.
In this work, we show that a reverse-biased SNPD (i.e., each pixel) has a response time of about 5.7 μs (3-dB bandwidth of 61 kHz) while consuming less than 10 nW of static power, which makes it about three orders of magnitude faster and more energy efficient than commercially available camera sensors. As a result, the activation function is no longer a performance bottleneck of the system. As a proof of concept, the measured characteristics of the SNPD are used in a neural network simulation platform to classify the MNIST and Fashion MNIST datasets. In these tests, accuracies of about 97% and 89% are achieved, respectively, comparable to those obtained with a standard rectified linear unit (ReLU) activation function. Note that the SNPD is primarily proposed for use in a diffractive ONN setup as an O-E nonlinearity. Other solutions, such as all-optical [17][18][19] and O-E-O [20] nonlinearities, require additional coupling devices (e.g., grating couplers), polarization control, additional photodetectors to generate an electrical output, a larger size per pixel, and control electronics to realize the nonlinearity (especially in the O-E-O case), which result in more complexity and lower energy efficiency and make scaling more challenging. Although these solutions offer a faster response than the SNPD, the millisecond-scale response time of the SLM means that the performance of the overall system does not improve, and the extra speed only results in more energy consumption. Therefore, the SNPD best fits a diffractive ONN setup.

SNPD structure and characterization
Figure 2(a) shows a sketch of the cross-section of the SNPD used in this work. It is composed of a multi-quantum-well (MQW) stack placed in the intrinsic region of a vertical p-i-n structure. The MQW is formed by 36 periods of 9-nm-thick In0.53Ga0.47As wells and 4-nm-thick In0.52Al0.48As barriers. The total thickness of the MQW is 468 nm, which is equivalent to one wavelength at about 1540 nm. The p-i-n stack is then inserted in an asymmetric Fabry-Perot resonant cavity with a high-reflectivity (HR) mirror at the bottom of the structure and a partially reflective top mirror formed by the semiconductor/air interface. MQWs with other compositions (such as Si/SiGe) [14] and thicknesses [15] may also be used in such a structure. The SNPD used in this work has an active area diameter of 20 μm. A top-view microphotograph of the device is shown in Fig. 2(b). The chip is bonded to a submount with single-ended ground-signal-ground metal pads that allow an electric field to be applied orthogonally to the layers of the MQW region. Note that devices with a smaller active area can be designed to reduce the form factor when placed within an array [12]. The details of the fabrication process are described in Ref. [15]. Typically, when used as a modulator, such a device operates according to the quantum confined Stark effect: upon application of a reverse bias voltage, the MQW absorption edge shifts in wavelength and produces amplitude modulation of the optical output signal. In this work, while we still apply a reverse bias voltage, we use the device as a photodetector and operate at wavelengths much longer than those typically used for modulation.
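The quoted stack thickness can be checked with a few lines of arithmetic. Reading "one wavelength" as an optical path length n_eff · d equal to the vacuum wavelength is our assumption, so the implied effective index printed below is only indicative and inherits the uncertainty of the "about 1540 nm" figure.

```python
# Layer stack quoted in the text: 36 periods of 9 nm wells + 4 nm barriers.
PERIODS = 36
WELL_NM = 9.0
BARRIER_NM = 4.0
DESIGN_WAVELENGTH_NM = 1540.0  # "about 1540 nm"


def mqw_thickness_nm(periods: int, well_nm: float, barrier_nm: float) -> float:
    """Physical thickness of a periodic well/barrier stack."""
    return periods * (well_nm + barrier_nm)


thickness = mqw_thickness_nm(PERIODS, WELL_NM, BARRIER_NM)
# Assumption: "one wavelength" means n_eff * thickness equals the vacuum wavelength.
implied_n_eff = DESIGN_WAVELENGTH_NM / thickness
print(f"thickness = {thickness:.0f} nm, implied n_eff = {implied_n_eff:.2f}")
```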
Figure 3(a) shows the experimental setup used to characterize the SNPD in the linear and nonlinear regions. The output light of a tunable continuous-wave (CW) laser is coupled normally to the surface of the SNPD chip using a standard single-mode fiber and a GRIN lens. The GRIN lens reimages the optical mode of the standard fiber onto the SNPD top surface with about 80% coupling efficiency, allowing the fiber to be placed farther from the chip without changing the mode size. A fiber-optic circulator separates the light at the input and output of the SNPD. Note that the reflected optical signal (P  in Fig. 3(a)) is used when the device operates in the modulator mode. As mentioned before, no optical polarization control is needed, as the device is fully polarization independent [12]. Moreover, the SNPD is placed on a thermo-electric cooler (TEC) to stabilize the operating temperature of the device. Owing to the broad operating wavelength range of the SNPD [12], no complex and power-hungry closed-loop wavelength locking mechanism is necessary. To characterize the nonlinear response time of the SNPD later, an acousto-optic modulator (AOM) is driven with a 27 MHz CW signal by an arbitrary signal generator. In this mode of operation, the AOM only frequency-shifts the laser, with an insertion loss of about 3.5 dB.
In the first experiment, the responsivity of the SNPD in the linear region (i.e., at low optical power) is measured as a function of wavelength for different reverse bias conditions. In this case, the AOM is bypassed and no amplitude modulation is performed. Figure 3(b) shows the responsivity of the SNPD as a function of optical wavelength for three different reverse bias voltages and a fixed on-chip optical power of -4.9 dBm (estimated after de-embedding the loss of the other components). As the reverse bias increases, the absorption edge red-shifts, resulting in a higher peak responsivity. To achieve a high photocurrent, a reverse bias voltage of 5 V is used in all of the following experiments.
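Responsivity is the photocurrent divided by the incident optical power, and de-embedding component losses amounts to a subtraction in dB. The sketch below, with helper names of our own choosing, converts the quoted -4.9 dBm on-chip power to milliwatts and expresses the roughly 80% GRIN-lens coupling as a loss in dB. The 0.1 mA photocurrent is a hypothetical value used only to exercise the formula; it is not a measurement from Fig. 3(b).

```python
import math


def dbm_to_mw(p_dbm: float) -> float:
    """Convert optical power from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)


def loss_db_from_efficiency(eta: float) -> float:
    """Insertion loss (positive dB) for a power transmission eta in (0, 1]."""
    return -10.0 * math.log10(eta)


def on_chip_power_dbm(source_dbm: float, losses_db: list[float]) -> float:
    """De-embed the on-chip power by subtracting component losses (all in dB)."""
    return source_dbm - sum(losses_db)


def responsivity_a_per_w(photocurrent_a: float, p_chip_mw: float) -> float:
    """R = I_ph / P_opt, with P_opt converted from mW to W."""
    return photocurrent_a / (p_chip_mw * 1e-3)


grin_loss = loss_db_from_efficiency(0.80)   # ~0.97 dB for 80% coupling
p_chip_mw = dbm_to_mw(-4.9)                 # on-chip power quoted for Fig. 3(b)
i_ph = 0.1e-3                               # hypothetical photocurrent, for illustration only
print(f"GRIN loss {grin_loss:.2f} dB, P_chip {p_chip_mw:.3f} mW, "
      f"R {responsivity_a_per_w(i_ph, p_chip_mw):.2f} A/W")
```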
In the second experiment, to study the nonlinear behavior of the SNPD as the input optical power changes, the responsivity of the device is measured for a reverse bias voltage of 5 V and different input optical powers. As shown in Fig. 3(c), for optical wavelengths shorter than 1580 nm, the responsivity curves for different input optical powers are similar and no significant nonlinearity is observed. However, at longer wavelengths, the difference between the responsivity curves grows as the input optical power increases, showing the nonlinear behavior of the SNPD. This behavior is dominated by thermal effects [13]: once the optical power exceeds a certain threshold for a given wavelength, the generated photocurrent increases at a higher rate, resulting in a larger responsivity.
Fig. 3. (a) Experimental setup used to characterize the device. (b) SNPD responsivity as a function of optical wavelength for different reverse bias voltages and an on-chip optical power of -4.9 dBm. In this case, the AOM is bypassed. (c) SNPD responsivity for different input optical powers, showing a nonlinear behavior at wavelengths longer than 1580 nm. Here too, the AOM is bypassed. (d) SNPD photocurrent as a function of the input optical power (P  ) measured at a wavelength of 1598 nm. The modulation signal is turned off in this measurement. (e) Frequency response of the SNPD measured at a wavelength of 1598 nm while the AOM modulates the input optical signal.

The third experiment characterizes the nonlinear response of the SNPD that is to be used in an ONN. The laser wavelength is fixed at 1598 nm while the optical power is swept. As shown in Fig. 3(d), the photocurrent is a nonlinear function of the input optical power that resembles a ReLU function at 1598 nm. For optical powers larger than 1.25 mW, the photocurrent increases much more steeply with power. The measured characteristic is later used in a neural network to confirm its applicability as a nonlinear activation function. Note that the threshold power is a function of the cavity design and biasing conditions and can be adjusted. Moreover, the optical wavelength of 1598 nm is chosen because it yields a close approximation of the ReLU nonlinear response. However, other wavelengths can be selected depending on the application and the desired type of nonlinear function. Since a diffractive ONN typically implements one neural layer at a time, the laser power can be set to maintain a sufficient optical power level at the SNPD to trigger the nonlinearity.
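One way to turn a sampled photocurrent-versus-power curve such as Fig. 3(d) into a differentiable activation is piecewise-linear interpolation written as a sum of shifted ReLUs. This is an illustrative sketch, not necessarily how the simulations below were set up: the sample points are hypothetical (only the knee near 1.25 mW comes from the text), and mapping network pre-activations onto optical power would require a scaling step that is not shown.

```python
import numpy as np


def make_pwl_activation(p_samples, i_samples):
    """Build a piecewise-linear activation from sampled (power, photocurrent) pairs.

    The curve is written as a sum of shifted ReLUs, so it reproduces np.interp
    inside the sampled range, extrapolates linearly outside it, and uses only
    operations (max, multiply, add) that autodiff frameworks handle natively.
    """
    p = np.asarray(p_samples, dtype=float)
    i = np.asarray(i_samples, dtype=float)
    if p.ndim != 1 or p.shape != i.shape or p.size < 2:
        raise ValueError("need matching 1D arrays with at least two samples")
    if np.any(np.diff(p) <= 0):
        raise ValueError("power samples must be strictly increasing")
    slopes = np.diff(i) / np.diff(p)   # slope of each linear segment
    kinks = np.diff(slopes)            # slope change at each interior knot

    def activation(x):
        x = np.asarray(x, dtype=float)
        y = i[0] + slopes[0] * (x - p[0])
        for knot, dk in zip(p[1:-1], kinks):
            y = y + dk * np.maximum(x - knot, 0.0)
        return y

    return activation


# Hypothetical ReLU-like samples (mW, mA); NOT the measured Fig. 3(d) data.
p_mw = [0.0, 0.5, 1.0, 1.25, 1.5, 2.0]
i_ma = [0.0, 0.02, 0.04, 0.05, 0.15, 0.35]
snpd_like = make_pwl_activation(p_mw, i_ma)
x = np.linspace(0.0, 2.0, 9)
print(np.allclose(snpd_like(x), np.interp(x, p_mw, i_ma)))
```

The sum-of-ReLUs form is used instead of a lookup call so that the same expression can be written with standard tensor operations in a training framework.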
To measure the bandwidth of the SNPD in the proposed mode of operation, a train of square-wave pulses is applied to the AOM to amplitude-modulate the CW laser with an extinction ratio greater than 35 dB. Note that the amplitude of the modulation signal is large enough to switch the input optical power between less than 1 μW and a value above the threshold power of about 1.25 mW. In this way, we emulate a large change in the weighted-sum signal to find the worst-case response time. In this experiment, the modulation frequency is varied and the amplitude of the AC voltage across a 50 Ω load is measured on an oscilloscope. Figure 3(e) shows the normalized AC response of the SNPD (yellow squares); fitting a single-pole transfer function suggests a 3-dB bandwidth of 61 kHz, which is equivalent to a rise time (response time) of about 5.7 μs. This is about three orders of magnitude faster than the typical millisecond response time of camera sensors.
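The conversion from the fitted 3-dB bandwidth to the quoted response time follows from the step response of a single-pole system. A minimal sketch, assuming "rise time" means the usual 10-90% interval (the function names are our own):

```python
import math


def single_pole_magnitude(f_hz: float, f3db_hz: float) -> float:
    """|H(f)| of a first-order low-pass: 1 / sqrt(1 + (f / f3dB)^2)."""
    return 1.0 / math.sqrt(1.0 + (f_hz / f3db_hz) ** 2)


def rise_time_10_90(f3db_hz: float) -> float:
    """10-90% step-response rise time of a single-pole system.

    With tau = 1 / (2*pi*f3dB), the step response is 1 - exp(-t/tau), so the
    10-90% interval is tau * ln(9), which is roughly 0.35 / f3dB.
    """
    tau = 1.0 / (2.0 * math.pi * f3db_hz)
    return tau * math.log(9.0)


f3db = 61e3  # fitted 3-dB bandwidth quoted in the text
print(f"|H(f3dB)| = {20 * math.log10(single_pole_magnitude(f3db, f3db)):.2f} dB")
print(f"rise time = {rise_time_10_90(f3db) * 1e6:.2f} us")
```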

Neural network simulation results
To demonstrate the applicability of the nonlinear response of the SNPD in a neural network, the measured transfer function of the device in Fig. 3(d) is used as the activation function in a neural network simulation platform to classify the MNIST and Fashion MNIST datasets. Figure 4(a) shows the architecture of the simple neural network used in this work. The 28 × 28-pixel images are input to a convolution layer with 32 parallel 3 × 3 kernels, a stride (step size) of one, and the SNPD response as its activation function in place of the standard ReLU function. A max-pooling layer down-samples the output of the convolution layer and is followed by a fully connected layer with 100 neurons that also uses the SNPD response as its nonlinearity. Finally, 10 neurons with softmax activation generate the classification results of the network. The neural network is implemented using the TensorFlow library. Stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.9 is used as the optimizer, with categorical cross-entropy as the loss function. The input images are fed to the network with a batch size of 32. Moreover, random normal kernel initialization is used throughout the network.
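Tracking tensor shapes and parameter counts makes the architecture concrete. The sketch below is plain Python rather than TensorFlow code, and it assumes "valid" (unpadded) convolution, a 2 × 2 max-pooling window with stride 2, and a flatten step before the dense layer; the pooling size and padding are not stated in the text.

```python
def valid_out(size: int, window: int, stride: int = 1) -> int:
    """Output width of an unpadded ('valid') convolution or pooling window."""
    return (size - window) // stride + 1


def network_summary(img=28, channels=1, filters=32, kernel=3, pool=2,
                    hidden=100, classes=10):
    """Return (layer, output shape, trainable parameters) for each layer."""
    conv = valid_out(img, kernel)                 # 28 -> 26
    pooled = valid_out(conv, pool, stride=pool)   # 26 -> 13 (assumed 2x2 pooling)
    flat = pooled * pooled * filters              # inputs to the dense layer
    return [
        (f"conv {kernel}x{kernel}, {filters} filters", (conv, conv, filters),
         kernel * kernel * channels * filters + filters),
        ("max-pool", (pooled, pooled, filters), 0),
        (f"dense, {hidden} units", (hidden,), flat * hidden + hidden),
        (f"dense, {classes} units (softmax)", (classes,),
         hidden * classes + classes),
    ]


rows = network_summary()
for name, shape, params in rows:
    print(f"{name:28s} {str(shape):14s} {params:>8,d}")
print("total trainable parameters:", sum(p for _, _, p in rows))
```

Under these assumptions the 100-neuron fully connected layer holds almost all of the roughly 542k trainable parameters.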
Figures 4(b) and 4(c) show the training and test cross-entropy loss and classification accuracy, respectively, both as a function of the number of epochs. Using the measured SNPD nonlinear response, the network achieves a test classification accuracy of about 97%. As a reference, the same network with the standard ReLU function achieves the same accuracy.
In the second test, the same network is used to classify the Fashion MNIST dataset, which consists of 28 × 28-pixel images of 10 different types of clothing (Fig. 4(a)). While the network architecture is the same, the Adam optimizer is used instead of stochastic gradient descent for faster convergence, and He uniform initialization is used for the kernels. Figures 4(d) and 4(e) show the training and test loss and accuracy, respectively, as a function of the number of epochs. An accuracy of about 88.5% is achieved, while the same network with the ReLU function achieves about 89%. Note that the lower accuracy compared with the MNIST case is due to the more complex features in the Fashion MNIST dataset and can be improved by using a network better optimized for this application.
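He uniform initialization draws each weight from a uniform distribution whose half-width is sqrt(6/fan_in), where fan_in is the number of inputs feeding one unit; this choice sets the weight variance to 2/fan_in. The NumPy sketch below restates that formula and is not the TensorFlow initializer itself. The dense-layer fan-in of 13 × 13 × 32 = 5408 assumes 2 × 2 max-pooling, which the text does not specify.

```python
import numpy as np


def he_uniform(shape, fan_in, rng=None):
    """Draw weights from U(-limit, limit) with limit = sqrt(6 / fan_in)."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=shape), limit


rng = np.random.default_rng(0)
# Convolution kernel: 3x3 window on 1 input channel -> fan_in = 9 per filter.
w_conv, lim_conv = he_uniform((3, 3, 1, 32), fan_in=3 * 3 * 1, rng=rng)
# Dense layer: 13*13*32 = 5408 inputs, assuming 2x2 max-pooling after the convolution.
w_fc, lim_fc = he_uniform((13 * 13 * 32, 100), fan_in=13 * 13 * 32, rng=rng)
print(f"conv limit {lim_conv:.3f}, dense limit {lim_fc:.4f}")
```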

Discussion and summary
It should be noted that the proposed SNPD nonlinear activation function can be scaled to one-dimensional (1D) and two-dimensional (2D) arrays with a large number of devices (pixels), similar to camera sensors. For instance, in Ref. [16], a 288 × 132 array of similar devices is demonstrated. Therefore, a high-resolution 2D array of nonlinear activation functions can be used in a diffractive ONN.
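Scaling the per-pixel figure to an array of the size reported in Ref. [16] gives a rough static-power budget. As a simplification, this assumes that every pixel is biased like the tested device and stays at the 10 nW upper bound; readout electronics are not included.

```python
def array_static_power_w(rows: int, cols: int, per_pixel_w: float) -> float:
    """Total static power of a rows x cols array, assuming identical pixels."""
    return rows * cols * per_pixel_w


# 288 x 132 array size from Ref. [16]; 10 nW/pixel is the upper bound measured here.
p = array_static_power_w(288, 132, 10e-9)
print(f"{288 * 132} pixels, {p * 1e6:.0f} uW static")
```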
In summary, we demonstrated the applicability of a surface-normal nonlinear photodetector in free-space diffractive ONNs to realize the O-E neural nonlinearity as an alternative to the commonly used camera sensors. The significantly faster response time of 5.7 μs removes the nonlinear activation function as a computation-time bottleneck. The reverse-biased SNPD consumes less than 10 nW of static power per pixel, which in turn improves the overall energy efficiency of an ONN. Moreover, the polarization-independent operation of the SNPD, together with direct optical coupling and the possibility of implementing large-scale 1D and 2D arrays, makes it a promising candidate for use in a free-space ONN setup.

Backmatter
Disclosures. The authors declare no conflicts of interest.

Fig. 1. (a) Typical feed-forward neural network architecture with multiple layers of interconnected neurons. The neural output is generated by passing the weighted sum of the inputs through a nonlinear activation function. (b) Diffractive ONN architecture using a DMD to generate the input signals and an SLM to apply the corresponding weights to the inputs [6]. Conventionally, a CMOS sensor acts as the detector and/or nonlinear activation function.

Fig. 2. (a) Sketch of the cross-section of an SNPD. (b) Top-view photograph of an SNPD with a 20 μm active area diameter.

Fig. 4. MNIST and Fashion MNIST data classification. (a) Architecture of the neural network used in this work. (b) Cross-entropy loss and (c) classification accuracy as a function of the number of epochs, showing training and test results for the MNIST dataset. (d) Cross-entropy loss and (e) classification accuracy as a function of the number of epochs, showing training and test results for the Fashion MNIST dataset.