Optical neural networks (ONNs) enable high-speed, parallel, and energy efficient processing compared to their conventional digital electronic counterparts. However, realizing large scale ONN systems is an open problem. Among various integrated and non-integrated ONNs, free-space diffractive ONNs benefit from the large number of pixels of spatial light modulators to realize millions of neurons. However, a significant fraction of computation time and energy is consumed by the nonlinear activation function, which is typically implemented using a camera sensor. Here, we propose a novel surface-normal photodetector (SNPD) with an optical-in–electrical-out (O–E) nonlinear response to replace the camera sensor. The SNPD response is about three orders of magnitude faster (5.7 µs response time) and more energy efficient (less than 10 nW/pixel) than that of the camera sensor. Direct efficient vertical optical coupling, polarization insensitivity, inherent nonlinearity with no control electronics, low optical power requirements, and the possibility of implementing large scale arrays make the SNPD a promising O–E nonlinear activation function for diffractive ONNs. To show the applicability of the proposed neural nonlinearity, we demonstrate successful classification simulations of the MNIST and Fashion MNIST datasets using the measured response of the SNPD, with accuracy comparable to that of an ideal ReLU function.
I. INTRODUCTION
As artificial neural networks are more widely utilized in a variety of applications, from pattern recognition1,2 to medical diagnosis,3,4 there is an increasing need for faster and more energy efficient hardware platforms. Optical neural networks (ONNs) benefit from massive parallelism and different multiplexing schemes, such as wavelength, mode, time, and polarization, to enable processing with high energy efficiency at the speed of light.5 Hence, various ONN implementations have been demonstrated using both bench-top setups6–8 and integrated platforms that enable a smaller size and higher energy efficiency.9–11
Despite the significant progress, scaling ONNs to thousands or millions of neurons and multiple layers to perform more complex tasks is one of the main issues that integrated ONNs face.6 Complex and area-consuming photonic routing in commercially available platforms, larger on-chip propagation losses, and intricate electronic control circuitry to compensate for fabrication-induced errors result in a lower energy efficiency, packaging complexities, and impractically large integrated systems.
Free-space diffractive ONNs, on the other hand, enable orders of magnitude larger numbers of neurons compared to integrated ONNs, as well as more flexibility to implement different network configurations.6,7 Such systems are especially useful for image and video processing and classification, as they directly process the input pictures or video frames with a large number of pixels. Figure 1(a) shows the conceptual schematic of a feed-forward neural network with multiple layers of neurons, where each neuron performs linear (weight and sum) and nonlinear (activation function) computations on its inputs. Correspondingly, a diffractive ONN architecture that performs linear and nonlinear computations is shown in Fig. 1(b). A laser source illuminates a digitally controlled micro-mirror device (DMD) that modulates the intensity of the incoming light with the input data to the network. A spatial light modulator (SLM) is used to implement the linear weights. A large number of pixels in commercially available SLMs enable ONNs with millions of neurons per layer. The diffracted signals from the SLM are then directed toward the camera to apply the nonlinear activation function to the weighted-sum of the inputs. So far, the nonlinear activation function has been implemented either digitally after forming the image on a camera7 or using the inherent nonlinear photoelectric response of the complementary metal-oxide semiconductor (CMOS) sensor.6 In either case, the total computation time is mainly limited by the sensor exposure time, which for commercial cameras is several milliseconds. For instance, in Ref. 6, despite achieving an impressive performance of more than 200 tera operations per second (TOPS) and more than 1 TOPS/W, about 64% of the total processing time (about 4 ms) and 15% of the total power consumption (about 6 μW per pixel) are consumed by the sensor. 
Therefore, a faster and more energy efficient implementation of the nonlinear activation function can significantly improve the computation speed and energy efficiency of such systems. This paper introduces a new device that significantly improves the performance of the nonlinear activation function, such that it is no longer a performance bottleneck for the system.
Note that in most diffractive ONN demonstrations described above, only one neural layer is implemented, and the full neural network is realized by reusing the same architecture but with different parameters. The output of the layer (i.e., the camera output) is always in the electrical domain that drives the DMD after some processing as the input to the next layer. Therefore, an optical-in–electrical-out (O–E) nonlinearity is required in such systems without the need for converting the nonlinearity output to the optical domain.
Here, we propose a novel implementation of the O–E nonlinear activation function using a surface-normal nonlinear photodetector (SNPD) to significantly improve the speed and energy efficiency of a diffractive ONN with O–E nonlinearity. The SNPD is formed by a vertical p-doped-intrinsic-n-doped (p–i–n) structure contained in a Fabry–Pérot cavity. These devices have previously been used as high-speed electro-optic modulators operating according to the quantum confined Stark effect.12–17 However, light coupled to these devices generates a photocurrent14 and, hence, they can be used as photodetectors as well. In addition, under certain light intensities, nonlinearity induced by thermal effects arises. In this work, we use the nonlinear behavior of the SNPD photocurrent as a function of the incident optical power to realize a nonlinear activation function as an improved alternative to the camera sensor. The SNPD is a polarization-independent device to which light can be vertically coupled with high efficiency and without any additional coupling devices, which eases its deployment within a free space ONN setup.
In this work, we show that a reverse-biased SNPD (i.e., each pixel) has a response time of about 5.7 µs (3-dB bandwidth of 61 kHz) while consuming less than 10 nW of static power. These results make the SNPD about three orders of magnitude faster and more energy efficient than commercially available camera sensors. As a result, the activation function will not limit the performance of the system. As a proof of concept and to show the applicability of the SNPD nonlinear response in an ONN, the measured characteristics of the device are used in a neural network simulation platform to classify MNIST and Fashion MNIST datasets. In these tests, accuracies of 97% and 89% are achieved, respectively, showing a performance comparable with that of a standard rectified linear unit (ReLU) activation function.
Note that the SNPD is primarily proposed to be utilized in a diffractive ONN setup where an O–E nonlinearity is needed. Other solutions, such as all-optical8,18,19 and O–E–O,20 not only generate an optical output, which is not suitable in the case of a diffractive ONN, but they also require additional coupling devices (e.g., grating couplers), polarization control, additional photodetectors to generate an electrical output, a larger size per pixel, and control electronic circuitry to realize nonlinearity (especially in the case of O–E–O), which result in more complexity and less energy efficiency and make scaling more challenging. Although they enable faster response times than the SNPD, due to the millisecond-scale response time of the SLM, the performance of the overall system will not improve, and this only results in more energy consumption. Therefore, the SNPD best fits a diffractive ONN setup.
II. SNPD STRUCTURE AND CHARACTERIZATION
Figure 2(a) shows a sketch of the cross section of the SNPD used in this work. It is composed of a multi-quantum-well (MQW) stack placed in the intrinsic region of a vertical p-i-n structure. The MQW is formed by 36 periods of In0.53Ga0.47As wells with 9 nm thickness and In0.52Al0.48As barriers with 4 nm thickness. The total thickness of the MQW is 468 nm, which is equivalent to one wavelength at about 1540 nm. The p-i-n stack is then inserted in an asymmetric Fabry–Pérot resonant cavity with a high-reflectivity (HR) mirror on the bottom of the structure and a partial reflectivity top mirror formed by the semiconductor/air interface. Other MQWs with different compositions (such as Si/SiGe)14 and thicknesses15 may be used in such a structure. The SNPD used in this work has an active area diameter of 20 μm. The top-view microphotograph of the device is shown in Fig. 2(b). The chip is bonded to a submount with single-ended ground-signal-ground metal pads that allow the application of an electric field orthogonal to the layers of the MQW region. Note that devices with a smaller active area can be designed in order to reduce the form factor when placed within an array.12 The details of the fabrication process are described in Ref. 15.
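The layer arithmetic quoted above can be checked in a few lines. The sketch below is only a consistency check on the stated dimensions. The "implied index" is not a value reported in the text; it is the effective refractive index that would make the stack one wavelength thick at 1540 nm, under the assumption that "one wavelength" refers to optical thickness.

```python
# Consistency check of the MQW stack dimensions quoted in the text.
PERIODS = 36
WELL_NM = 9.0        # In0.53Ga0.47As well thickness
BARRIER_NM = 4.0     # In0.52Al0.48As barrier thickness
DESIGN_WAVELENGTH_NM = 1540.0

total_thickness_nm = PERIODS * (WELL_NM + BARRIER_NM)

# Assumption (not stated in the text): "one wavelength" means the optical
# thickness n_eff * d equals the wavelength, so n_eff = wavelength / d.
implied_index = DESIGN_WAVELENGTH_NM / total_thickness_nm

print(f"MQW thickness: {total_thickness_nm:.0f} nm")
print(f"Implied effective index: {implied_index:.2f}")
```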
Typically, when used as a modulator, such a device operates according to the quantum confined Stark effect: upon application of a reverse bias voltage, the MQW absorption edge, and hence the resonance, shifts in wavelength and produces amplitude modulation of the optical output signal. In this work, while we still apply a reverse bias voltage, we use the device as a photodetector and it works at wavelengths much longer than those typically used for modulation. It is worth mentioning that the shelf-life and material stability of our fabricated devices are similar to those of any conventional III–V compound devices, such as VCSELs, electro-optic modulators, or photodetectors.
The mechanism behind the nonlinear behavior of the device has been discussed in detail in prior work.13,17 As will be shown later, increasing the reverse bias voltage generally red-shifts the peak absorption wavelength. To observe the nonlinear behavior at a given reverse bias voltage, the MQW is excited with a laser source at a wavelength longer than the initial peak absorption wavelength of the device. As the input optical power increases, more photo-carriers are generated, which increases the photocurrent of the device. This increase in current heats up the device, and as the temperature increases, so does the absorption.17 This leads to a further increase in photocurrent and, under the right conditions,17 turns into a regenerative process. This thermally induced regenerative process is the main mechanism behind the nonlinear behavior and the abrupt change in photocurrent as a function of input power. We have leveraged this behavior to approximate the widely used ReLU activation function. Note that the regenerative process does not necessarily require the MQW to be placed inside a cavity. However, the Fabry–Pérot cavity formed by the two mirrors enhances the nonlinear regenerative behavior of the MQW and allows the nonlinearity to be observed at lower input optical power levels. It is worth mentioning that the observed nonlinearity in the proposed SNPD is much stronger than that of a conventional InGaAs p-i-n photodiode.21
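The feedback loop described above (more photocurrent, more heating, more absorption, more photocurrent) can be illustrated with a deliberately simple toy model. This is not the device model of Refs. 13 and 17. It assumes that the responsivity grows linearly with self-heating and that self-heating is proportional to the photocurrent. All parameter values are hypothetical, and the lumped gain is picked only so that the runaway appears near the milliwatt scale.

```python
def steady_state_responsivity(p_opt_w, r0=0.05, k_per_w=800.0,
                              r_max=0.5, n_iter=2000):
    """Iterate R <- min(R_max, R0 + k * P * R) to its steady state.

    All parameters are hypothetical: r0 is the cold responsivity (A/W),
    k_per_w lumps the bias voltage, thermal resistance, and temperature
    coefficient of absorption into one feedback gain (1/W), and r_max
    caps the responsivity once absorption saturates.
    """
    r = r0
    for _ in range(n_iter):
        # Heating is taken as proportional to the photocurrent R * P, and
        # the responsivity is assumed to rise linearly with that heating.
        r = min(r_max, r0 + k_per_w * p_opt_w * r)
    return r


for p_mw in (0.25, 0.625, 1.0, 1.5):
    r = steady_state_responsivity(p_mw * 1e-3)
    print(f"P = {p_mw:5.3f} mW -> R = {r:.3f} A/W, "
          f"I = {r * p_mw * 1e3:.1f} uA")
```

In this toy loop, the responsivity settles at R0 / (1 - kP) while the product kP stays well below one, and it runs away to the cap as kP approaches one. The photocurrent therefore grows faster than linearly with input power, which is the qualitative signature of a regenerative process.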
Figure 3(a) shows the experimental setup to characterize the SNPD in the linear and nonlinear regions. The output light of a tunable continuous wave (CW) laser is coupled orthogonally to the surface of the SNPD chip using a standard single mode fiber and a GRIN lens. The GRIN lens reimages the optical mode of the standard fiber on the SNPD top surface with about 80% coupling efficiency. It allows the fiber to be moved farther away from the chip without changing the mode size. A fiber-optic circulator allows us to separate light at the input and output of the SNPD. Note that the reflected optical signal [Pout in Fig. 3(a)] is used when the device operates in modulator mode. As mentioned before, there is no need for any optical polarization control as the device is fully polarization independent.12 Moreover, the SNPD is placed on a thermoelectric cooler (TEC) to stabilize the working temperature of the device. Due to the broad wavelength range of operation of the SNPD,12 no complex and power hungry closed-loop wavelength locking mechanism is necessary, resulting in reliable and stable performance during the measurements. To later characterize the nonlinear response time of the SNPD, an acousto-optic modulator (AOM) is driven with a 27 MHz CW signal by an arbitrary signal generator. In this mode of operation, the AOM only frequency shifts the laser with an insertion loss of about 3.5 dB.
In the first experiment, the responsivity of the SNPD in the linear region (i.e., low optical power) as a function of wavelength and for different reverse bias conditions is measured. In this case, the AOM is bypassed and no amplitude modulation is performed. Figure 3(b) shows the responsivity of the SNPD as a function of optical wavelength for three different reverse bias voltages and a fixed on-chip optical power of −4.9 dBm (estimated after de-embedding the loss of other components). As the reverse bias value increases, the absorption edge red-shifts, resulting in a higher peak responsivity. To achieve a high photocurrent, a reverse bias voltage of 5 V is used in all of the following experiments.
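Because the text mixes dBm and milliwatt units, a small conversion helper makes the power levels easier to compare. The sketch below uses only the standard definition of dBm (decibels relative to 1 mW). The function names are illustrative.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def mw_to_dbm(p_mw):
    """Convert optical power from milliwatts to dBm."""
    return 10.0 * math.log10(p_mw)


linear_regime_mw = dbm_to_mw(-4.9)   # on-chip power used for Fig. 3(b)
print(f"-4.9 dBm = {linear_regime_mw:.3f} mW")
```

The linear-regime characterization power of −4.9 dBm therefore corresponds to roughly 0.32 mW on chip.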
In the second experiment, to study the nonlinear behavior of the SNPD as the input optical power changes, the responsivity of the device is measured for a reverse bias voltage of 5 V and different input optical power values. As shown in Fig. 3(c), for optical wavelengths shorter than 1580 nm, the responsivity graphs for different input optical powers are similar, and no significant nonlinearity is observed. However, for longer wavelengths, as the input optical power increases, the difference between the responsivity graphs becomes more significant, showing the nonlinear behavior of the SNPD. As previously explained, this behavior is dominated by thermal effects, and once the optical power exceeds a certain threshold for a given wavelength, the generated photocurrent increases at a higher rate, resulting in a larger responsivity. Note that although the nonlinearity is mostly thermally induced, the performance of the device was stable during all of our measurements using a simple TEC.
The exact form of the nonlinear behavior is a function of several factors, including the wavelength offset from the peak absorption wavelength, the reverse bias voltage, and the MQW design. In other words, to optimize the shape of the nonlinearity, one should properly set those parameters. The longer the wavelength (i.e., the further from the peak absorption), the larger the input optical power required to start the regenerative process and enter the nonlinear region. In addition, for longer wavelengths, the sudden increase in photocurrent is more significant, which is not suitable for realizing a ReLU function. On the other hand, for shorter wavelengths (i.e., less than 1580 nm), the photocurrent becomes almost a linear function of the optical power, and no significant nonlinear response can be observed. This is visible in Fig. 3(c), where the change in responsivity as a function of input optical power is negligible for wavelengths shorter than 1580 nm (linear regime), consistent with prior work.17 Hence, the optical wavelength of 1598 nm is chosen as it results in a close approximation of the ReLU nonlinear response. However, other wavelengths can be considered when designing the device, depending on the application and the desired form of the nonlinear function.
The third experiment is performed to characterize the nonlinear response of the SNPD. The laser wavelength is fixed at 1598 nm, while the optical power is swept. As shown in Fig. 3(d), the photocurrent at 1598 nm is a nonlinear function of the input optical power that resembles a ReLU function. For optical power greater than 1.25 mW, the photocurrent increases at a significantly higher rate. The measured characteristic is later used in a neural network to confirm its applicability as a nonlinear activation function. In addition, since only one neural layer is typically implemented in a diffractive ONN, the laser power can be set to maintain a sufficient optical power level at the SNPD to trigger the nonlinearity.
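One simple way to turn a measured power-to-photocurrent curve into a reusable activation function is piecewise-linear interpolation over the tabulated points. The sketch below illustrates that idea only. The sample values are hypothetical placeholders shaped like a ReLU with a knee near 1.25 mW, not the measured data of Fig. 3(d), and the text does not specify how the curve was interpolated or scaled.

```python
from bisect import bisect_right

# Hypothetical (power in mW, photocurrent in uA) samples standing in for
# the measured curve of Fig. 3(d); the real data are not reproduced here.
POWER_MW = [0.0, 0.5, 1.0, 1.25, 1.5, 2.0]
CURRENT_UA = [0.0, 2.0, 5.0, 10.0, 60.0, 160.0]


def snpd_activation(p_mw):
    """Piecewise-linear interpolation of the tabulated response.

    Inputs outside the table are clamped to its end points.
    """
    if p_mw <= POWER_MW[0]:
        return CURRENT_UA[0]
    if p_mw >= POWER_MW[-1]:
        return CURRENT_UA[-1]
    hi = bisect_right(POWER_MW, p_mw)   # first sample strictly above p_mw
    lo = hi - 1
    frac = (p_mw - POWER_MW[lo]) / (POWER_MW[hi] - POWER_MW[lo])
    return CURRENT_UA[lo] + frac * (CURRENT_UA[hi] - CURRENT_UA[lo])


print([snpd_activation(p) for p in (0.25, 1.25, 1.375, 3.0)])
```

In a simulation, the pre-activation values of a layer would first have to be mapped onto the optical power axis, and the photocurrent mapped back to a dimensionless output. That scaling is a design choice outside the scope of this sketch.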
To measure the bandwidth of the SNPD in the proposed mode of operation, a train of square wave pulses is applied to the AOM to amplitude modulate the CW laser with an extinction ratio greater than 35 dB. Note that the amplitude of the modulation signal is large enough to switch the input optical power between less than 1 μW and a value larger than the threshold power, which is about 1.25 mW. This way, we emulate a large change in the weighted-sum signal to find the upper limit for the response time. In this experiment, the modulation frequency is varied, and the amplitude of the AC voltage is measured across a 50 Ω load on an oscilloscope. Figure 3(e) shows the normalized AC response of the SNPD (yellow squares), where fitting a single pole transfer function suggests a 3-dB bandwidth of 61 kHz, which is equivalent to a rise time (response time) of about 5.7 µs. This is about three orders of magnitude faster than the millisecond-scale response time of the camera sensors used in typical diffractive ONN setups.
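The link between the fitted bandwidth and the quoted response time follows from the standard single-pole relation. The sketch below assumes that "rise time" means the 10% to 90% rise time, which the text does not state explicitly but which reproduces the quoted value.

```python
import math

F_3DB_HZ = 61e3  # fitted 3-dB bandwidth quoted in the text


def single_pole_magnitude(f_hz, f_3db_hz=F_3DB_HZ):
    """Magnitude response |H(f)| of a first-order low-pass system."""
    return 1.0 / math.sqrt(1.0 + (f_hz / f_3db_hz) ** 2)


# 10%-90% rise time of a single-pole system:
# t_r = ln(9) / (2 * pi * f_3dB), commonly approximated as 0.35 / f_3dB.
rise_time_s = math.log(9.0) / (2.0 * math.pi * F_3DB_HZ)

print(f"rise time: {rise_time_s * 1e6:.1f} us")
print(f"|H| at the 3-dB frequency: {single_pole_magnitude(F_3DB_HZ):.3f}")
```

Compared with the roughly 4 ms sensor time quoted for Ref. 6 in the Introduction, a 5.7 µs response is shorter by a factor of about 700.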
III. NEURAL NETWORK SIMULATION RESULTS
As mentioned before, the main focus of this paper is the experimental demonstration of a new implementation of the nonlinear activation function using the SNPD for diffractive ONN systems. Nevertheless, to show that the measured SNPD nonlinearity is applicable in a neural network, the transfer function in Fig. 3(d) is used in a neural network simulation platform to classify the MNIST and Fashion MNIST datasets. Figure 4(a) shows the architecture of the simple neural network used in this work. The 28 × 28-pixel images are input to a convolution layer with 32 parallel 3 × 3 kernels, a stride (step size) of one, and the SNPD response as its custom activation function in place of the ideal ReLU function. A max-pooling layer down-samples the output of the convolution layer and is followed by a fully connected layer with 100 neurons and the SNPD response as the nonlinearity. The network is implemented using the TensorFlow libraries. When defining a layer in TensorFlow, the type of activation function, which is normally an ideal function such as ReLU or sigmoid, should be specified. In this work, instead of an ideal function, we use a custom-defined activation function based on the experimental results shown in Fig. 3(d), both in the 2D convolution layer and in the fully connected layer after it. Finally, ten neurons with softmax activation generate the classification results of the network. Stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.9 is used as the optimizer, with categorical cross-entropy as the loss function. The input images are fed to the network with a batch size of 32. Moreover, random normal kernel initialization is used throughout the network.
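To make the size of this network concrete, the layer shapes and trainable parameter counts can be tallied by hand. The sketch below is plain Python rather than the TensorFlow code used in the work. It assumes an unpadded convolution, a 2 × 2 max-pooling window with stride 2, a single input channel, and a bias term per kernel and per neuron, none of which is stated in the text.

```python
def output_size(size, window, stride=1):
    """Spatial size after an unpadded ('valid') convolution or pooling."""
    return (size - window) // stride + 1


IMAGE = 28                    # 28 x 28 input images, one channel assumed
KERNELS, KERNEL_SIZE = 32, 3  # 32 parallel 3 x 3 kernels, stride 1
POOL = 2                      # assumption: 2 x 2 max pooling, stride 2
HIDDEN, CLASSES = 100, 10

conv_size = output_size(IMAGE, KERNEL_SIZE, stride=1)
pool_size = output_size(conv_size, POOL, stride=POOL)
flat = pool_size * pool_size * KERNELS

# Weights plus one bias per kernel or neuron (bias terms are an assumption).
params = {
    "conv": KERNELS * (KERNEL_SIZE * KERNEL_SIZE * 1 + 1),
    "dense_hidden": HIDDEN * (flat + 1),
    "dense_out": CLASSES * (HIDDEN + 1),
}

print(f"conv output: {conv_size} x {conv_size} x {KERNELS}")
print(f"pooled output: {pool_size} x {pool_size} x {KERNELS} -> {flat} features")
print(params, "total:", sum(params.values()))
```

Under these assumptions, almost all trainable parameters sit in the 100-neuron fully connected layer, which is also one of the two places where the SNPD response is applied.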
Figures 4(b) and 4(c) show the training and test cross-entropy loss and classification accuracy, respectively, both as a function of the number of epochs. Using the measured SNPD nonlinear response, the network achieves a test classification accuracy of about 97%. As a reference, the same network with the standard ReLU function achieves the same accuracy.
In the second test, the same network is used to classify the Fashion MNIST dataset, consisting of 28 × 28-pixel images of ten different types of clothing [Fig. 4(a)]. While the network architecture is the same, the Adam optimizer is used instead of stochastic gradient descent for faster convergence. Moreover, He Uniform is used for kernel initialization. Figures 4(d) and 4(e) show the training and test loss and accuracy as a function of the number of epochs, respectively. An accuracy of about 88.5% is achieved, while the same network with the ReLU function achieves about 89%. Note that the lower accuracy compared to the MNIST classification case is due to the more complex features in the Fashion MNIST dataset and can be improved by using a network that is better optimized for this application.
IV. DISCUSSION AND SUMMARY
It should be noted that the proposed SNPD as a nonlinear activation function can be scaled to one-dimensional (1D) and two-dimensional (2D) arrays with a large number of devices (pixels), similar to camera sensors. For instance, in Ref. 16, a 288 × 132 array of similar devices is demonstrated. Therefore, a high-resolution 2D array of nonlinear activation functions can be used in a diffractive ONN. Since the SNPD is illuminated from the top and the electrical connections are at the bottom of the chip [Fig. 2(a)], individual devices within a large scale 2D array can be accessed by designing an interposer chip that is bonded to the device array chip. The photocurrent associated with each SNPD can be directly digitized using an ultra-low power analog-to-digital converter with a microsecond-scale conversion time.22 This approach enables reading all pixels of a 2D array of SNPDs in a single shot, making the proposed neural activation function significantly faster and more efficient than a CMOS camera.
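As a rough illustration of what array scaling would mean for static power, the per-pixel figures quoted in this paper can be multiplied out for the array size of Ref. 16. This is back-of-the-envelope arithmetic only. It treats the 10 nW value as an upper bound per pixel and ignores the power of the analog-to-digital converters and any other readout electronics.

```python
ROWS, COLS = 288, 132       # array size reported in Ref. 16
SNPD_STATIC_NW = 10.0       # upper bound on static power per SNPD pixel
CAMERA_PIXEL_UW = 6.0       # per-pixel sensor power quoted for Ref. 6

pixels = ROWS * COLS
array_static_mw = pixels * SNPD_STATIC_NW * 1e-6      # nW -> mW
per_pixel_ratio = CAMERA_PIXEL_UW * 1e3 / SNPD_STATIC_NW

print(f"{pixels} pixels, static power below {array_static_mw:.2f} mW")
print(f"camera pixel / SNPD pixel power ratio: at least {per_pixel_ratio:.0f}x")
```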
In summary, we demonstrated the applicability of a surface-normal nonlinear photodetector in free-space diffractive ONNs to realize the O–E neural nonlinearity as an alternative to the commonly used camera sensors. A significantly faster response time of 5.7 µs removes the nonlinear activation function as a computation time bottleneck. Hence, the total computation time will be mainly limited by the SLM update rate. The reverse biased SNPD consumes less than 10 nW of static power per pixel, which in turn improves the overall energy efficiency of an ONN. Moreover, the polarization-independent operation of the SNPD, together with direct optical coupling and the possibility of implementing large-scale 1D and 2D arrays of the device, makes it a promising candidate for use in a free space ONN setup. In that case, most of the diffractive ONN setup (including the laser source, DMD, SLM, and optical lenses and alignment devices) can remain the same while the CMOS sensor is replaced with an array of SNPDs, which does not affect the overall complexity of the system.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
F.A. and M.H.I. contributed equally to this work.
F. Ashtiani: Conceptualization (equal); Data curation (equal); Software (lead); Validation (equal); Visualization (equal); Writing – original draft (equal). M. H. Idjadi: Conceptualization (equal); Data curation (equal); Software (supporting); Validation (equal); Visualization (lead); Writing – original draft (equal). T. C. Hu: Resources (supporting). S. Grillanda: Conceptualization (equal); Resources (equal); Validation (equal); Writing – original draft (supporting). D. Neilson: Resources (supporting). M. Earnshaw: Resources (supporting). M. Cappuzzo: Resources (supporting). R. Kopf: Resources (supporting). A. Tate: Resources (supporting). A. Blanco-Redondo: Conceptualization (supporting); Writing – original draft (supporting).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.