Reservoir computing (RC)-based neuromorphic applications exhibit extremely low power consumption, thus challenging the use of deep neural networks in terms of both power requirements and integration density. From this perspective, this work focuses on the basic principles of RC systems. The ability of self-selective conductive-bridging random access memory devices to operate in two modes, namely, volatile and non-volatile, by regulating the applied voltage is first presented. We then investigate the relaxation time of these devices as a function of the applied pulse amplitude and duration, a critical step in determining the desired non-linearity of the reservoir. Moreover, we present an in-depth study of the impact of the selected pulse-stream on the total power consumption and recognition accuracy in a handwritten digit recognition application based on the Modified National Institute of Standards and Technology (MNIST) dataset. Finally, by minimizing two cost criteria, we arrive at an optimal pulse-stream of 3 bit, for which the total power remains at 287 µW while an 82.58% recognition accuracy is achieved on the test set.
I. INTRODUCTION
Artificial intelligence (AI) has gained increasing interest due to its ability to manage data efficiently and make decisions on problems that even humans find difficult to solve. This has led to the rapid development of smart devices and sensors, which are able to communicate with each other, connect, and participate in the internet of things (IoT).1 However, these trends have generated an insatiable need for large data centers and fast processing speeds, which the traditional von Neumann architecture is unable to offer.
In this direction, memristive devices have proved to be potential candidates for novel memory devices, as flash memories, the dominant representatives of non-volatile memories, are reaching their scaling limits. More specifically, memristive devices can reduce their footprint even further when they are arranged in a crossbar architecture (CBA), while they also offer the potential for 3D integration and low power consumption. Furthermore, the CBA configuration exhibits significant advantages for the implementation of artificial neural devices capable of replicating the computational capacity of biological synapses.2,3 This interest stems from the inadequacy of conventional systems to solve complex problems with relatively low power and high accuracy. The potential of memristive systems ranges from Fourier transforms and the solution of the Schrödinger equation to the development of complex artificial neural networks (ANNs) and dendrites.4–8
Among the various ANNs, recurrent neural networks (RNNs) dominate in terms of temporal processing of the input signals and can therefore extract features that carry the corresponding sequential information. Reservoir computing (RC) belongs to the RNN family and is of particular interest. As depicted in Fig. 1, RC can transform input data into a higher spatiotemporal dimensional space via the non-linear reservoir block.9 The reservoir block utilizes the short-term memory (STM) properties of the memristor without requiring training. This can be achieved either by applying voltage pulses or by exploiting recently reported optoelectronic synapses under optical stimulation.10,11 Finally, the reservoir’s responses are fed to a CBA for the final readout. Only the synapses of the readout layer require training, which directly results in low power requirements. For this reason, RC systems have been extensively investigated for the implementation of brain-inspired neuromorphic applications, such as handwritten and spoken digit classification and emotion recognition, as well as for chaotic system forecasting, solving second order non-linear dynamic tasks, and performing logic operations.12–21 The majority of these systems employ memristive elements with the dielectric material usually consisting of TaOx, TiOx, or their bilayer combination; WOx and even 2D materials have also been explored, while other self-selective devices have successfully been employed.14,16,17,22–26 It is also worth mentioning that recent research works have reported the successful development of vertical 3D Resistive Random Access Memory (RRAM) systems for the implementation of the reservoir, thus further reducing the power consumption, latency, and training cost and, at the same time, increasing the scaling factor.27,28 Nevertheless, most of these devices require relatively high operating voltages (>1 V) or are incompatible with advanced complementary metal oxide semiconductor (CMOS) technology.29,30
Schematic illustration of an RC architecture. The reservoir can process the temporal input data in a non-linear way transforming them into a higher spatiotemporal dimensional space before passing them into the final readout layer.
The scope of this work is to thoroughly study the parameters of the pulse-stream applied to the STM memories used in the reservoir. More specifically, the dependence of the relaxation time on pulse width and pulse amplitude was examined. The developed conductive-bridging random access memory (CBRAM) memristive devices demonstrate self-rectifying abilities alongside intrinsic self-selectivity.31,32 These characteristics are sufficient to suppress sneak path currents (SPCs), which circulate uncontrollably through the entire structure and compromise the reading process.33 So far, these leakage currents have been dealt with by using transistors in series with the memristive elements and additional read schemes.34,35 Self-selectivity serves a double purpose: first, to replace the transistors, which undermine the power saving and area scaling, and second, to enable dual-mode operation of the devices, volatile (or threshold switching) for the implementation of STM reservoir blocks and non-volatile (or bipolar switching) for the readout layer. An in-depth analysis is then presented of the impact of the selected pulse-stream on the recognition of handwritten digits, obtained from the Modified National Institute of Standards and Technology (MNIST) dataset, in terms of accuracy and power consumption. Finally, the trade-off between accuracy and rising power consumption in the overall system is systematically examined, as well as the process of selecting the optimal pulse-stream.
II. EXPERIMENT
The aforementioned CBRAM devices were fabricated by RF magnetron sputtering at room temperature using high-purity targets. First, the TiN (∼40 nm) bottom electrode (BE) was deposited on a SiO2 (300 nm) substrate, followed by the deposition of SiO2 (∼20 nm) and Ag (∼40 nm). The final patterning of the 100 µm electrodes was carried out by the lift-off technique.31 The devices were characterized with a Keithley 4200-SCS (Semiconductor Characterization System) parameter analyzer by keeping the BE grounded and applying the regulating pulses to the Ag top electrode (TE). The endurance and the device-to-device and cycle-to-cycle variability of the self-selective devices have previously been reported using 200 consecutive hysteresis loops, where it was also shown that they exhibit linearity factors of 4.2 and 3.6 for the potentiation and depression processes, respectively.31,32
III. RESULTS AND DISCUSSION
Figure 2 shows the device’s performance under a direct current (DC) voltage sweep. When the device operates in the threshold mode [Fig. 2(a)], the reset transition from the low resistance state (LRS) to the high resistance state (HRS) occurs before the applied voltage changes its polarity to negative. In contrast, when the applied voltage is increased from 0.26 to 0.36 V, the reset takes place after the polarity has changed, and the device transitions from threshold to bipolar behavior [Fig. 2(b)].
DC sweep performance of self-selective devices in (a) volatile threshold switching behavior and (b) non-volatile bipolar switching. The devices are able to transition from (a) to (b) by increasing the applied voltage. The black dotted lines are the results of the self-selective memristive model. The numbers and arrows highlight the switching direction.
Here, the parameter ES is the activation energy of the thermo-diffusion physical mechanism, Rth is the thermal resistance, and φ and A are the top radius and area of the conductive filament (CF). ρon and ρoff correspond to the resistivity of the CF and SiO2, respectively (see the supplementary material, Table S2).
It should be noted that this simulation model is capable of capturing the transition from the threshold to the bipolar mode when the applied voltage is increased, using the same set of parameters as for the experimental devices. The state variables h and r are responsible for the faithful simulation of the self-selective devices (see the supplementary material, Fig. S2). Figure 2 shows that the simulation model accurately follows the experimental curves, especially during the SET transition from the HRS to the LRS, and exhibits a self-compliance current.
As mentioned above, the reservoir block is responsible for the non-linear processing of the input data and their subsequent mapping into a higher-dimensional feature space. This non-linearity is a result of the relaxation time of the volatile memristors that make up the reservoir. For this reason, Fig. 3 presents the relaxation time of the STM devices as a function of the amplitude and duration of an applied voltage pulse.
Relaxation time dependence on the applied pulse with the voltage amplitude ranging from 0.15 to 0.35 V with 50 mV step and pulse width varying from 100 to 900 ns with 300 ns step and 100 ns delay.
It can be deduced that the relaxation time is highly dependent on the applied pulse parameters. Specifically, as the pulse amplitude and duration increase, the device takes longer to relax back to the high resistance state. If these parameters increase further, the device will acquire long-term memory (LTM) properties, making it unusable for the reservoir block. It is also interesting to note that the quite short relaxation times are compatible with the manifestation of thermal-related effects in the self-dissolution of the percolating conductive filament (CF).31
In order to evaluate the RC system performance, the following neuromorphic application was designed and implemented by employing the self-selective model, including both operation modes (Fig. 4). More specifically, the threshold (volatile) mode of this model was used to simulate the reservoir block, and the bipolar (non-volatile) mode was used to simulate the elements of the CBA. The application accepts as input the pixel intensities of binary images of 21 × 20 pixels, containing handwritten digits from the MNIST dataset [Fig. 4(a)]. Each image is reshaped, and every row of the new image is transformed into a pulse-stream according to the state of each pixel (black or white) in the row. The reshaping process depends on the chosen pulse-stream; thus, the size of the reshaped image is (420/pulse-stream) by (pulse-stream). This justifies the initial image dimensions, which were chosen so that the pulse-stream can vary from 2 to 6 bit. For example, in the case of a 5-bit pulse-stream, the reshaped image has 84 × 5 pixels [Fig. 4(b)]. The pulse-streams are applied to the volatile memories of the reservoir block in one-to-one correspondence for the non-linear processing of the input data [Fig. 4(c)]. The pulse-streams applied to the reservoir block and the corresponding output applied to the word lines of the CBA are shown in Figs. 4(e) and 4(f) for the case where the pixel intensity is 00111. The responses of the reservoir are finally transferred to a non-volatile CBA with dimensions (2 × 420/pulse-stream) × 10 [Fig. 4(d)]. The purpose of the ×2 factor is to accommodate negative voltages, so that negative weights can be obtained and the backpropagation procedure can be performed.
(a) Demonstration of the RC system for the supervised learning of handwritten digits 0–9 for the case of the 5-bit pulse-stream. (b) The input image is 21 × 20 pixels, and it is transformed to 84 × 5 pixels. (c) The reservoir block consists of 84 threshold resistive memories, (d) while the readout layer is a 168 × 10 CBA architecture with bipolar resistive elements. (e) Depiction of the applied pulse-streams to the reservoir block for the case where the pixel intensity is 00111. (f) The corresponding input in the word line of the CBA.
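The reshaping step and the resulting layer sizes can be sketched in a few lines. This is a minimal illustration, assuming a row-major flattening of the 21 × 20 image (the text does not state the flattening order); the names `to_pulse_streams` and `layer_sizes` are hypothetical, and the random image is only a placeholder, not MNIST data.

```python
import numpy as np

ROWS, COLS = 21, 20      # binary image size stated in the text
N_CLASSES = 10           # digits 0-9

def to_pulse_streams(image, n_bits):
    """Reshape a binary image into (420 / n_bits) pulse-streams of n_bits each.

    Row-major flattening is an assumption made for this sketch.
    """
    flat = np.asarray(image, dtype=np.uint8).reshape(-1)
    if flat.size % n_bits != 0:
        raise ValueError("pulse-stream length must divide the pixel count")
    return flat.reshape(-1, n_bits)

def layer_sizes(n_bits, n_pixels=ROWS * COLS):
    """Return (reservoir memristors, crossbar synapses) for a pulse-stream length."""
    reservoir = n_pixels // n_bits            # one volatile device per pulse-stream
    synapses = 2 * reservoir * N_CLASSES      # x2 factor for signed weights
    return reservoir, synapses

rng = np.random.default_rng(0)
image = rng.integers(0, 2, size=(ROWS, COLS))  # placeholder image, not MNIST data
streams = to_pulse_streams(image, 5)
print(streams.shape)                           # 84 pulse-streams of 5 bits each

for n_bits in range(2, 7):
    print(n_bits, layer_sizes(n_bits))
```

Because 420 is divisible by every integer from 2 to 6, each pulse-stream length in that range tiles the image exactly, which is the reason given in the text for the 21 × 20 image size.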
Based on this neuromorphic application, a detailed study was conducted on the impact of the pulse-stream on the recognition accuracy and on the power consumption of the RC system, since such a study is absent from the literature and the usual choices are 3 or 4 bit pulse-streams.22,26,27,40–42 The cases of interest in this work are those in which the pixel intensity is translated into 2 to 6 pairs of pulses. The results of the corresponding AC analysis are shown in Fig. 5 for the 3 and 4 bit pulse-streams (see the supplementary material, Fig. S3), where a square pulse series with 0.1 V pulse amplitude, 100 ns pulse duration, and 100 ns delay was applied (conditions for which the relaxation of the STM is fastest) for all 2^(pulse-stream) possible input combinations, with bit 1 corresponding to the presence of a pulse and bit 0 to its absence. The choice of these parameters is supported by the results of Fig. 3, where the effect of pulse width and pulse amplitude on the relaxation time is shown. First, the relaxation time needs to be as short as possible in order to maintain the STM properties, avoiding the transition to the non-volatile mode while retaining the desired non-linear dynamics, which are significant for the final recognition accuracy. Second, the power consumption must be minimal, which leads to the choice of a short pulse duration and a low pulse amplitude.
AC analysis of threshold switching devices in the reservoir for pulse-streams with 3 and 4 bit. The applied square pulse series has 0.1 V pulse amplitude, 100 ns pulse duration, and 100 ns delay. The experimental values are taken 100 ns after the pulse duration.
The results of the AC analysis indicate that the devices reach distinct conductance levels. The purpose of the reservoir block is to utilize the final conductance values of the AC analysis. The non-linear transformation into a high-dimensional feature space depends on the intermediate states (past input data), but only the final conductance states of the memristors need to be distinct, which justifies the use of only volatile devices. This is very important because the final conductance value is exploited by the readout layer, and each final conductance level should ideally correspond to a different coding of the input image intensity. As the number of pulses increases, and because the memories are volatile, the early pulses of the pulse-stream have a diminishing effect on the final conductance state. Thus, the final conductance levels become comparable, and consequently, information is lost.
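The loss of separation between final conductance levels for longer pulse-streams can be illustrated with a toy model. The sketch below is not the self-selective memristor model used in this work: it assumes a generic leaky state `g` that is pushed toward 1 by a pulse and then partially relaxes during the delay, with hypothetical parameters `gain` and `retain`. It only shows the qualitative effect described above.

```python
from itertools import product

def final_state(bits, gain=0.5, retain=0.5):
    """Toy volatile cell (assumed model): a pulse raises g, then g relaxes."""
    g = 0.0
    for b in bits:
        if b:
            g += gain * (1.0 - g)   # potentiation while a pulse is present
        g *= retain                 # partial relaxation during the delay
    return g

def level_stats(n_bits):
    """Return (number of distinct final levels, smallest gap between levels)."""
    levels = sorted({final_state(p) for p in product((0, 1), repeat=n_bits)})
    gaps = [hi - lo for lo, hi in zip(levels, levels[1:])]
    return len(levels), min(gaps)

stats = {n: level_stats(n) for n in range(2, 7)}
for n, (count, gap) in stats.items():
    print(f"{n}-bit: {count} levels, smallest gap {gap:.6f}")
```

In this toy model every input pattern still maps to its own level, but the smallest gap between neighboring levels shrinks as the pulse-stream gets longer, so a readout with finite resolution would start to confuse them.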
Table I displays the results of the recognition application for different pulse-streams. As mentioned, increasing the pulse-stream length causes the different conductance levels to converge, and the direct impact of this is a reduction of the recognition accuracy. On the other hand, increasing the number of pulse pairs requires fewer neurons, and consequently less power is consumed in the RC system. For comparison, it should be noted that a shallow deep neural network would require 8400 neurons, in the case that the pulse-stream is equal to 1, which would significantly delay the learning process compared to the RC system. Therefore, a trade-off between accuracy and power consumption arises that should be managed based on the priorities of each application. For the final readout layer, which is the only layer that requires training, Bayesian probabilities were calculated using a softmax function applied to every bit line of the CBA for every sample n. Then, in order to perform the backpropagation process, error functions were computed by comparing the probabilities with the sample label, and the final weighted sum was computed to produce the desired weight update at the end of each batch (see the supplementary material, S4).6,36
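A readout of this kind can be sketched as a softmax layer trained with mini-batch gradient descent. The following is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a cross-entropy error, uses a single signed weight matrix in place of the paired crossbar rows, takes a hypothetical learning rate `lr`, and runs on synthetic features rather than reservoir responses. The batch size of 5 and the ten epochs follow the training setup reported in the text (50 000 images in 10 000 batches per epoch).

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_readout(x, labels, n_classes=10, epochs=10, batch_size=5, lr=0.05, seed=0):
    """Train only the readout weights; one update at the end of each batch."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, size=(x.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            probs = softmax(x[idx] @ w)              # class probabilities
            error = probs - onehot[idx]              # assumed cross-entropy error
            w -= lr * x[idx].T @ error / len(idx)    # batch-averaged update
    return w

def accuracy(w, x, labels):
    return float((softmax(x @ w).argmax(axis=1) == labels).mean())

# Synthetic stand-in for reservoir outputs: 140 features, as for a 3-bit pulse-stream.
rng = np.random.default_rng(1)
prototypes = rng.random((10, 140))
y_train = rng.integers(0, 10, size=500)
y_test = rng.integers(0, 10, size=200)
x_train = prototypes[y_train] + 0.05 * rng.normal(size=(500, 140))
x_test = prototypes[y_test] + 0.05 * rng.normal(size=(200, 140))

w = train_readout(x_train, y_train)
acc = accuracy(w, x_test, y_test)
print(acc)
```

In hardware, each signed weight would be realized by a pair of non-volatile conductances, which is what the ×2 factor in the crossbar dimensions accounts for; the single matrix `w` here is only a software shorthand for that difference.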
TABLE I. Accuracy and power dependence on pulse-streams.
Pulse-stream (bit) | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|
Reservoir threshold memristors | 210 | 140 | 105 | 84 | 70 |
Crossbar synapses | 4200 | 2800 | 2100 | 1680 | 1400 |
Test accuracy (%) | 83.62 | 82.58 | 77.83 | 76.43 | 74.94 |
Total power consumption (μW) | 429 | 287 | 215 | 172 | 144 |
Additionally, the optimal pulse-stream was sought by jointly minimizing two criteria: the total power consumption and the test accuracy loss, the latter being equivalent to maximizing the recognition accuracy on the test set. The relative importance of the two criteria depends on the respective application. In this particular application, we consider the two criteria to be equally important, and therefore, we have normalized them to the same range of values, as shown in Fig. 6. The two criteria were then fitted with the curve fitting application of the MATLAB software: a linear fit for the normalized test accuracy loss f_tal and a third-order polynomial fit for the normalized total power consumption f_tpc. By adding the two fitted curves, we obtain the equally weighted minimization criterion, whose local minimum lies at a pulse-stream of 2.78, i.e., at 3 bit for an integer pulse-stream.
Curve fitting of the two normalized minimization criteria of test accuracy loss and total power consumption for the determination of the optimal pulse-stream.
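The selection procedure can be approximated with NumPy using the values of Table I. This is a sketch under assumptions: min-max normalization to [0, 1] is assumed (the text only states that both criteria are normalized to the same range), and `np.polyfit` stands in for the MATLAB curve fitting application. The position of the continuous minimum depends on these choices, so it is not expected to reproduce the reported value of 2.78 exactly.

```python
import numpy as np

bits = np.array([2, 3, 4, 5, 6], dtype=float)
accuracy = np.array([83.62, 82.58, 77.83, 76.43, 74.94])   # test accuracy (%), Table I
power = np.array([429, 287, 215, 172, 144], dtype=float)   # total power (uW), Table I

def min_max(v):
    """Assumed normalization: rescale a criterion to the range [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())

loss_n = min_max(100.0 - accuracy)     # normalized test accuracy loss
power_n = min_max(power)               # normalized total power consumption

f_tal = np.polyfit(bits, loss_n, 1)    # linear fit, as described in the text
f_tpc = np.polyfit(bits, power_n, 3)   # third-order polynomial fit
total = np.polyadd(f_tal, f_tpc)       # equally weighted sum of both criteria

# Local minima of the summed curve inside the studied range of 2 to 6 bit.
crit = np.roots(np.polyder(total))
crit = crit[np.isreal(crit)].real
curvature = np.polyval(np.polyder(total, 2), crit)
minima = crit[(curvature > 0) & (crit >= bits.min()) & (crit <= bits.max())]

best_integer = int(bits[np.argmin(np.polyval(total, bits))])
print(minima, best_integer)
```

Under these assumptions, the summed curve takes its lowest value among the integer pulse-streams at 3 bit, in line with the conclusion of the text; a different weighting of the two criteria would shift the optimum toward lower power or higher accuracy.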
Taking the above into account, the optimal pulse-stream, which balances maximum accuracy against the smallest possible power in the total RC system, is 3 bit. Although the maximum accuracy is achieved with a 2-bit pulse-stream, moving to 3 bit costs only a 1.24% relative reduction in accuracy (from 83.62% to 82.58%), while the power consumption decreases by about one third (from 429 to 287 µW). For larger pulse-streams, the accuracy reduction is more significant. The confusion matrix for the optimal pulse-stream is reported in Fig. 7.
The RC system was able to successfully classify the handwritten digits from ten different classes 0–9 and has an 82.58% recognition accuracy over the test set. Training required a total of ten epochs, while 50 000 training images were organized into 10 000 training batches per epoch.
IV. CONCLUSION
In summary, self-selective, low-power devices have been developed, which behave as either threshold (volatile) or bipolar (non-volatile) resistive memories according to the applied voltage. With the presented system architecture, the required power is significantly reduced because transistor switches are not needed to suppress the SPCs, and the complexity of the RC system is reduced as well. Furthermore, based on the self-selective simulation model, which can reproduce both volatile and non-volatile switching behaviors, a detailed investigation of the pulse-stream scheme was performed for a typical handwritten digit recognition application. Along these lines, this work provides clear insight into the dependence of the recognition accuracy and of the total power consumption on the selected pulse-stream. Finally, the optimal pulse-stream was sought in order to minimize the power consumption and, at the same time, maximize the recognition accuracy. The optimal pulse-stream was found to be the 3-bit one, which presents an 82.58% recognition accuracy over the test set with a total power consumption of 287 µW.
SUPPLEMENTARY MATERIAL
See the supplementary material for the following: additional information on the memristor’s simulation model pseudocode in Table S1 and its parameters in Table S2; furthermore, the minimization of the cost function between the simulation and the experimental data (Fig. S1) and the evolution of the vertical and lateral state variables for the threshold (volatile) and the bipolar (non-volatile) mode, respectively (S2 and S3); the AC analysis of the memory devices when operated in the threshold switching mode for pulse-streams with 5 and 6 bit, respectively (Figs. S4 and S5); and the stochastic gradient descent in the back propagation process (Sec. S4).
ACKNOWLEDGMENTS
The work of P. Bousoulas and D. Tsoukalas was supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project No. 3830).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
C. Tsioustas: Conceptualization (equal); Data curation (equal); Investigation (equal); Methodology (equal); Project administration (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). P. Bousoulas: Conceptualization (equal); Methodology (equal); Supervision (equal); Visualization (equal); Writing – original draft (equal). G. Kleitsiotis: Data curation (equal); Visualization (equal); Writing – original draft (equal). D. Tsoukalas: Conceptualization (equal); Project administration (equal); Supervision (equal); Writing – original draft (equal).
DATA AVAILABILITY
The data that support the findings of this study are openly available in Reservoir-Computing at https://github.com/CTsioustas/Reservoir-Computing.43