Memristors are widely regarded as promising elements for the efficient implementation of synaptic weights in artificial neural networks (ANNs), since they are resistors that retain a memory of their previous conductive state. Whereas simple neural networks based on memristors (e.g., a single-layer perceptron) have already been demonstrated, the implementation of more complex networks is more challenging and has yet to be reported. In this study, we demonstrate linearly nonseparable combinational logic classification (the XOR logic task) using a network implemented with CMOS-based neurons and organic memristive devices, which constitutes the first step toward the realization of a double-layer perceptron. We also show numerically that such a network can solve an inherently analogue task that cannot be realized by digital devices. The results demonstrate the feasibility of a multilayer ANN based on memristive devices and pave the way for designing more complex networks such as the double-layer perceptron.

## I. INTRODUCTION

The development and hardware realization of artificial neural networks capable of learning information-processing tasks (pattern recognition and classification, approximation, prediction, etc.) remains one of the most challenging problems in artificial intelligence. One of the main issues in this pursuit is the lack of suitable hardware for implementing the key elements of a typical ANN: neurons and synapses. While CMOS-based neurons are now commercially available,^{1} the appropriate candidate for the synapse is still under discussion. There are two main ways to realize a synapse: a digital one (e.g., Static Random Access Memory^{2} or a floating-gate transistor^{3}) and an analogue one (a memristive device).^{4} The main advantage of the digital approach is its full integration with standard CMOS technology. However, this approach suffers from (i) a digital rather than analogue representation of synaptic weights, which lowers the performance of the ANN's massively parallel computations; (ii) mediocre energy efficiency compared with memristive systems and their biological counterparts; and (iii) a lower potential chip density than is achievable with memristors. In this context, memristive devices are very promising candidates.^{5} Basically, a memristive device is a two-terminal device whose conductivity can be changed almost continuously by applying a relatively large voltage bias, but remains constant when a smaller bias or no bias is applied.^{6} Memristive properties have been found in inorganic (such as TiO_{x}, HfO_{x}, SiO_{x}, etc.),^{7–10} organic (polyaniline, polyimide)^{11,12} and hybrid organic/inorganic^{13} materials. Organic or polymeric materials have unique advantages over traditional inorganic memristive devices, including high flexibility for biocompatible neuromorphic circuits and implants, low cost, solution processability, and large-area implementation.
Another important advantage is the possibility of realizing polymeric stochastic memristive systems in which communication between the computing elements (neurons) can be arranged in 3D.^{14} For neural networks, where effective learning requires precise knowledge of the conductivity state of all elements and of the kinetics of its variation, the polyaniline-based system has a further important advantage: the conductivity of polyaniline, and therefore of the memristive elements, is directly related to its color.^{15} This makes it possible to measure the conductivity of each element with a contactless spectrophotometric method, which would simplify the circuit.

Only a few hardware realizations of simple ANNs based on memristive devices have been proposed in the literature.^{16–19} The single-layer (or elementary) perceptron is the simplest kind of neural network that can implement basic learning and parallel processing. However, to the best of our knowledge, no successful hardware realization of a multilayer perceptron based on memristive devices has been reported. Yet in the field of artificial intelligence, more complex neural networks are required to solve demanding tasks.^{20} A multilayer perceptron can perform linearly nonseparable tasks (i.e., tasks whose classes cannot be separated by a hyper-plane in the feature space, which is also the input space of the perceptron^{21}) that cannot be solved by a single-layer perceptron.

Thus, the main goal of the present work is the hardware realization of a simple double-layer ANN that is based on organic (polyaniline) memristive devices and is able to solve linearly nonseparable tasks. In this manuscript we present the first steps towards the realization of the double-layer perceptron, including the design and hardware realization of the ANN. The implementation of the backpropagation algorithm and its use to train the device will be the subject of a subsequent work. Here, we designed an ANN and obtained first results on its capability to perform the XOR logic task. Moreover, we show numerically that our setup is capable of solving an analogue task that is particularly demanding for standard von Neumann architectures. The results, although still preliminary, are highly encouraging and suggest a new route for the implementation of multilayer ANNs based on memristive devices.

## II. METHODS

Polyaniline (PANI) based memristive devices were fabricated following the technique reported in Ref. 22. A solution of PANI (Mw≈100 000, Sigma Aldrich) was prepared at a concentration of 0.1 mg·mL^{−1} in 1-methyl-2-pyrrolidinone (Sigma Aldrich ACS reagent ≥99.0%) with the addition of 10% of toluene (AnalaR NORMAPUR® ACS). This solution was filtered twice and then deposited onto a glass substrate (1.5×0.5 cm^{2}) with two Cr electrodes by the Langmuir–Schaefer technique. The PANI conductive channel was formed by depositing 60 layers of the polymer in its emeraldine base form and then transforming it into the conducting emeraldine salt form by doping, which consisted of immersion in HCl (1 M). Subsequently, a stripe of solid electrolyte about 1 mm wide was deposited across the center of the PANI channel in a crossed configuration, and a silver wire (0.05 mm) inserted into the polyelectrolyte served as a reference electrode. The electrolyte was prepared from a water solution (20 mg·mL^{−1}) of polyethylene oxide (PEO) with a molecular weight of 8·10^{6} Da, to which a solution of LiClO_{4} (Sigma) and water were added to reach a concentration of 0.1 M. The final structure was additionally doped in HCl vapor. Voltage cycling and current measurements were performed by means of NI PXIe-4130, PXIe-4138 and PXIe-4139 source measure units, an NI DAQ board and two bias voltage supplies (0.4 and 15 V). All source and measurement elements were controlled by a dedicated LabVIEW program.

## III. RESULTS AND DISCUSSION

### A. ANN construction

The principal scheme of the network, shown in Fig. 1a, consisted of two inputs (X_{1}, X_{2}), two neurons in the hidden layer (several in the general case) and one output neuron (or several neurons). Inputs and neurons were connected by links with specific synaptic weights (*w*_{ij}, *w*_{jk}). The circuit diagram of the network based on memristive devices is presented in Fig. 1b (the colored parts coincide with those in Fig. 1a). Each weight was represented by two memristive devices (see below). A vital requirement for training the network is the ability to change the conductance (which sets the synaptic weight) of every memristive device independently of the others. To address this, we developed an access system based on CMOS transistors used as voltage-controlled switches. This system made it possible to apply a writing voltage to a specified memristive device during the training procedure, or to read the voltage during information processing. Such a switch connects each memristor either to one of the inputs, when biased by a non-negative voltage, or to the reference voltage source (+0.2 V; the motivation is given below). A commutator composed of one 1-in-8 analogue switch (the “master”) and two more (the “slaves”) connected in series allowed us to control all 12 switches in the circuit with five logic inputs (Fig. 1c).

The artificial neuron body (soma) was implemented in the circuit by an op-amp-based differential adder and a voltage divider with a MOSFET controlled by the output of the summator. This element executed the basic information-processing functions of a neuron: summation and thresholding. The differential summator, which performs the function $y=\sum_i w_i x_i$, is required to separate different classes of input combinations; here *y* is the output voltage of the summator, and *x*_{i} and *w*_{i} are the *i*-th input voltage and the corresponding weight, respectively. Moreover, such a scheme allows the realization of negative synaptic weights by doubling the number of memristors, which is crucial for the convergence of the learning algorithm in almost all possible tasks. In this scheme, each synapse was represented by two memristive devices, “excitatory” and “inhibitory”, connected to the non-inverting and inverting inputs of the op-amp, respectively. The resulting weight of the *i*-th synapse was $w_i=R_{fb}(G_i^{+}-G_i^{-})$, where *R*_{fb} is the feedback resistance and $G_i^{+}$ and $G_i^{-}$ are the conductances of the *i*-th excitatory and inhibitory memristive devices, respectively. The output voltage *y* was applied to the gate of the MOSFET in the voltage divider, which connected the neuron output to logic “1” when open and to logic “0” otherwise. The threshold voltage of the voltage divider was about 1.8 V, depending on the characteristics of the MOSFET used. A typical transfer function (called an activation function in ANN terminology) is shown in Fig. 2a.
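The differential weighting scheme can be illustrated with a minimal Python sketch. The conductance values, the feedback resistance `R_FB` and all function names are assumptions introduced here for illustration; only the 0.4 V logic level and the threshold of about 1.8 V are taken from the text. The sketch deliberately stops at the threshold comparison and does not model how the MOSFET divider maps that comparison to an output level.

```python
# Sketch of the differential summator: w_i = R_fb * (G_i^+ - G_i^-), y = sum_i w_i * x_i.
# R_FB and the conductances below are assumed values, not measured parameters.

R_FB = 1.0e6        # feedback resistance in ohms (assumed)
V_LOGIC1 = 0.4      # volts, logic "1" level used in the text
V_THRESHOLD = 1.8   # volts, approximate threshold quoted in the text


def synaptic_weight(g_plus, g_minus, r_fb=R_FB):
    """Weight of one synapse from its excitatory and inhibitory conductances (S)."""
    return r_fb * (g_plus - g_minus)


def summator_output(inputs, weights):
    """Output voltage y of the differential summator for input voltages in volts."""
    return sum(w * x for w, x in zip(weights, inputs))


def crosses_threshold(inputs, weights, threshold=V_THRESHOLD):
    """True if the summator output exceeds the divider threshold."""
    return summator_output(inputs, weights) > threshold


# Two synapses, each built from an excitatory/inhibitory pair (assumed values).
weights = [
    synaptic_weight(8e-6, 2e-6),   # net positive weight
    synaptic_weight(1e-6, 4e-6),   # net negative weight
]

if __name__ == "__main__":
    for x in ([0.0, 0.0], [V_LOGIC1, 0.0], [0.0, V_LOGIC1], [V_LOGIC1, V_LOGIC1]):
        print(x, summator_output(x, weights), crosses_threshold(x, weights))
```

The second synapse shows how a pair of strictly positive conductances yields a negative net weight, which a single memristive device could not provide.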

### B. Memristive device behavior

The initial characterization of the memristive devices was performed by measuring cyclic I-V curves. The measurement scheme is described in detail elsewhere.^{23} Typical I-V characteristics for the electronic and ionic currents are shown in Fig. 2b. There are two peaks in the ionic I-V curve, at about 0.1 V and 0.5 V (inset in Fig. 2b), corresponding to the potentials of the redox reactions that PANI undergoes. The ionic current passing through the PANI/PEO interface is due to the variation of the redox state, which changes the conductivity of PANI. Thus, by adjusting the potential it is possible to control the rate at which the PANI conductivity changes. The electronic current shows a nonlinear rectifying behavior (Fig. 2b): it increases only slightly below 0.5 V, whereas at about 0.7 V it increases markedly because of the oxidation process.^{24} During the backward voltage sweep, the reduction process results in a decrease in conductivity. Accordingly, a voltage of 0.4 V was used for reading the output values and the conductance of the memristive devices, so as to avoid changing them noticeably. This value was also defined as logic “1”, and a voltage of 0 V was taken as logic “0”. When no input vector was applied to the network, each memristive device was biased to +0.2 V, as this approximately corresponds to the redox equilibrium potential of PANI. For the learning procedure, the amplitude of the potentiation voltage pulse was chosen to be +0.6 V, and that of the depression pulse −0.2 V.

The durations of the training pulses were chosen on the basis of the resistive switching kinetics of the PANI memristive devices. Typical plots are shown in Fig. 2c. The absolute values of the conductance change under a potentiating voltage pulse (+0.6 V) and a depressing one (−0.2 V) are presented in Fig. 2d as functions of the initial conductance for various pulse durations. Each value was obtained by applying the voltage for 10, 20 and 40 s for depression and for 100, 200 and 400 s for potentiation, and then measuring the current through the device within 1 s. The figure shows that the conductance of the memristive device could be changed almost continuously from 10^{-7} to 10^{-5} S. Additional analysis demonstrated that the conductance under the potentiating voltage could be well approximated by the function $A_0+A_1\exp(-t/\tau_1)+A_2\exp(-t/\tau_2)$, and that under depression by the function $A_3+A_4\exp(-t/\tau_3)$. The characteristic times $\tau_1$, $\tau_2$ and $\tau_3$ varied from sample to sample, but their averages were 400 s, 40 s and 50 s, respectively. It should also be noted that the endurance of each memristive device strongly depends on the state of its solid electrolyte: when it dries out, the device loses its memristive properties and becomes a simple resistor. To extend the working time of the device, we covered it with polyimide (Kapton) tape. The retention time of the memristive devices at +0.2 V (the PANI redox potential) was not very long (about a day), but it was sufficient for the demonstration purposes of our work. It could be increased by, for example, inserting metal nanoparticles into the PANI layer, which could potentially store charge and thereby preserve the current electrochemical state, and thus the conductance, of the memristive device.^{25}
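The fitted kinetics can be sketched numerically. The minimal Python example below evaluates the two fitting functions with the average time constants quoted in the text. The amplitudes $A_0$ to $A_4$ are hypothetical values, chosen only so that the curves span the reported conductance range of 10^{-7} to 10^{-5} S; they are not fitted parameters from this work.

```python
import math

# Average characteristic times from the text (seconds).
TAU_1, TAU_2, TAU_3 = 400.0, 40.0, 50.0

# Hypothetical amplitudes (siemens), chosen so the curves span 1e-7 .. 1e-5 S.
POT_AMPLITUDES = (1.0e-5, -6.0e-6, -3.9e-6)   # A0, A1, A2 (assumed)
DEP_AMPLITUDES = (1.0e-7, 9.9e-6)             # A3, A4 (assumed)


def conductance_potentiation(t, amplitudes=POT_AMPLITUDES):
    """G(t) = A0 + A1*exp(-t/tau1) + A2*exp(-t/tau2) under the +0.6 V pulse."""
    a0, a1, a2 = amplitudes
    return a0 + a1 * math.exp(-t / TAU_1) + a2 * math.exp(-t / TAU_2)


def conductance_depression(t, amplitudes=DEP_AMPLITUDES):
    """G(t) = A3 + A4*exp(-t/tau3) under the -0.2 V pulse."""
    a3, a4 = amplitudes
    return a3 + a4 * math.exp(-t / TAU_3)


if __name__ == "__main__":
    # Pulse durations used in the text for each polarity.
    for t in (0, 100, 200, 400):
        print(f"potentiation t={t:>3} s: G={conductance_potentiation(t):.2e} S")
    for t in (0, 10, 20, 40):
        print(f"depression   t={t:>3} s: G={conductance_depression(t):.2e} S")
```

With these assumed amplitudes, potentiation is governed by a fast (40 s) and a slow (400 s) component, whereas depression follows a single 50 s exponential, which matches the much shorter depression pulses used in the text.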

### C. Nonseparable task solving

Since a double-layer perceptron is able to solve linearly nonseparable tasks, we chose the “XOR” function to be performed by our network. This is the logic task in which the (0;0) and (1;1) input signals belong to class “0” and the other two, (1;0) and (0;1), to class “1” (according to the logic outputs), so that no single straight line in the feature plane separates these classes. This task cannot be solved by an elementary (single-layer) perceptron, where each output neuron implements one hyper-plane separating the classes. In a double-layer ANN, however, the second-layer neurons perform the separation in the feature space of the first layer, enabling the union, intersection and difference of the “subclasses” singled out by the hidden layer of the network.
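As a minimal illustration of this argument, the Python sketch below builds XOR as the difference of two linearly separable subclasses. The hand-picked thresholds, the idealized step activation and the use of logic units (0 and 1) instead of voltages are assumptions of this sketch; they are not the weights obtained in the experiment.

```python
def step(value, threshold):
    """Idealized hard-threshold activation (assumption of this sketch)."""
    return 1 if value > threshold else 0


def xor_network(x1, x2):
    """Two hidden neurons (OR and AND lines) followed by one output neuron."""
    h_or = step(x1 + x2, 0.5)    # line x1 + x2 = 0.5 separates (0;0) from the rest
    h_and = step(x1 + x2, 1.5)   # line x1 + x2 = 1.5 separates (1;1) from the rest
    # The output neuron takes the difference of the two subclasses: OR and not AND.
    return step(h_or - h_and, 0.5)


truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}

if __name__ == "__main__":
    for inputs, output in truth_table.items():
        print(inputs, "->", output)
```

Each hidden neuron draws one straight line in the input plane, and the output neuron needs a negative weight on the AND neuron, which is why the differential (excitatory/inhibitory) weight scheme matters for this task.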

In machine learning, the back propagation learning algorithm with batch correction^{26} is widely used for solving nonseparable tasks. Briefly, the algorithm calculates the gradient of a squared error function with respect to all the weights in the network. The gradient is fed to an optimization method, which in turn uses it to update the weights in an attempt to minimize the squared error function. This means that the weight values have to be tuned very precisely. This was an issue for the hardware perceptron, because the resistive switching kinetics of the memristive devices were not similar enough to be described by a unified mathematical model. We could therefore follow only the direction (sign) of the weight correction, but not its magnitude, and used an empirically established duration for the learning pulses. This modification of the back propagation learning algorithm makes the number of steps needed for convergence strongly dependent on the initial weight distribution: the closer it was to the final distribution, the fewer steps were needed. It should be noted that not every initial state of the network led to convergence. Possible solutions to this issue include the implementation of different algorithms based on spike-timing-dependent plasticity (STDP) rules^{27} or the realization of a circuit in which the conductivity of each element is measured with a contactless spectrophotometric method.^{15}
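The sign-only correction can be illustrated with a toy example. The Python sketch below minimizes a scalar quadratic error using only the sign of the gradient and two fixed step sizes, which stand in for the fixed potentiating and depressing pulses. The step sizes, the scalar error function and the function names are assumptions made for illustration; the sketch does not model the device kinetics or reproduce the procedure used in the experiment.

```python
def sign_only_update(weight, gradient, step_up=0.3, step_down=0.2):
    """Move the weight against the gradient sign by a fixed, preset amount.

    step_up and step_down play the role of fixed potentiating and depressing
    pulses (hypothetical values, in arbitrary weight units).
    """
    if gradient < 0:
        return weight + step_up
    if gradient > 0:
        return weight - step_down
    return weight


def train_scalar(weight, target, n_steps):
    """Minimize E = (weight - target)**2 using only the sign of dE/dweight."""
    history = [weight]
    for _ in range(n_steps):
        gradient = 2.0 * (weight - target)
        weight = sign_only_update(weight, gradient)
        history.append(weight)
    return history


history = train_scalar(weight=0.0, target=1.0, n_steps=20)

if __name__ == "__main__":
    print([round(w, 2) for w in history])
```

Because the step size is fixed, the weight in this toy example ends within one step of the target rather than exactly at it, and the number of steps needed grows with the distance between the initial and final weights.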

Each step of our learning procedure consisted, in sequence, of the application of the whole training set of vectors *x*^{(k)} (*k* = 1, 2, 3, 4), measurement of the actual weights (by applying “reading” pulses) and weight correction (by applying “writing” pulses). The correction pulse durations were chosen so as to minimize the duration of the learning steps, and were kept constant (but different for depressing and potentiating pulses) throughout the learning procedure. The procedure was repeated until convergence. Fig. 3a shows the results of the learning procedure for the XOR logic function at the first and last iterations (in this example the whole procedure consisted of two steps). Fig. 3b depicts the change in the weight values after learning. As described above, each weight was set by two memristive devices (their conductances are not shown separately) and is given in arbitrary units. As shown in Fig. 3c, the weights were adjusted so that the two output classes were separated by two planes in the feature space.

### D. Analogue task solving

Since the double-layer perceptron separates the feature space into different classes by hyper-planes and their combinations, one class corresponds to a multidimensional polygon-like region of the feature space. This allows the perceptron to classify not only “black” (logic “0”) and “white” (“1”) classes, but also “gray” ones (a range of signal amplitudes between logic “0” and “1”), i.e. analogue input signals. Here, we demonstrate the basic possibility of solving an analogue task with our circuit, using the simplest polygon, the triangle, as an example. As each straight line was implemented by one neuron in the hidden layer, we used a circuit consisting of two inputs, three neurons in the first layer and one output neuron in the second layer. The circuit was simulated using real characteristics (memristive device kinetics, neuron activation functions, resistors and the other elements shown in Fig. 1b). The activation function of the *i*-th neuron was obtained by fitting the experimental data shown in Fig. 2a with the sigmoidal function $y_i=\frac{1}{1+e^{(\Sigma_i-4.5)/0.5}}$, where $\Sigma_i$ is the weighted sum of the inputs of the *i*-th neuron, taking +0.4 V as logic “1”. Learning was performed following the back propagation learning algorithm described in Ref. 24, simplified by replacing the derivative of the activation function with a constant 0.5 to speed up convergence. The optimal learning rate constant $\eta$ was found to be 2 for initial weights uniformly distributed on the interval [2, 8]. The possible positions of the separating lines (in bold red) implemented by the hidden neurons and the calculated output signal (heat map) are shown in Fig. 4a. The vector points of the training set (white squares in Fig. 4a) were chosen so as to train the perceptron to classify the analogue signal approximately in the geometry of a triangle, with sufficient margins between the points to avoid a possible uncertainty of classification associated with the width of the activation function.
The points inside the triangle were assigned to class “1”, and the others to class “0”. The progress of learning is illustrated by the dependence of the squared error function *E* on the epoch number for different initial conditions (Fig. 4b). The convergence of the error to zero means that the double-layer perceptron could be trained to solve an analogue classification task for different sets of initial weights.
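The geometric idea behind this task can be sketched in a few lines of Python. In the example below, three hidden neurons each implement one straight line, and the output neuron fires only when all three agree. The triangle vertices, the hand-set weights and the idealized step activation are hypothetical choices made for illustration; the sketch replaces neither the simulation with measured device characteristics nor the training procedure described above.

```python
def step(value):
    """Idealized hard-threshold activation (assumption of this sketch)."""
    return 1 if value > 0 else 0


# Each hidden neuron implements one straight line w1*x1 + w2*x2 + b = 0.
# The triangle has hypothetical vertices (0.2, 0.2), (0.8, 0.2) and (0.5, 0.8),
# with inputs expressed in logic units between 0 and 1.
HIDDEN_NEURONS = [
    (0.0, 1.0, -0.2),    # bottom edge: x2 > 0.2
    (2.0, -1.0, -0.2),   # left edge:   x2 < 2*x1 - 0.2
    (-2.0, -1.0, 1.8),   # right edge:  x2 < -2*x1 + 1.8
]


def classify(x1, x2):
    """Return 1 for analogue inputs inside the triangle and 0 otherwise."""
    hidden = [step(w1 * x1 + w2 * x2 + b) for w1, w2, b in HIDDEN_NEURONS]
    # Output neuron: intersection of the three half-planes (all three must fire).
    return step(sum(hidden) - 2.5)


if __name__ == "__main__":
    for point in [(0.5, 0.4), (0.5, 0.1), (0.1, 0.3), (0.9, 0.3), (0.5, 0.9)]:
        print(point, "->", classify(*point))
```

With a sigmoidal rather than a step activation, the boundary of the triangle becomes a band of finite width, which is why margins between the training points are needed.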

## IV. CONCLUSIONS

In conclusion, we have shown that memristive devices can, in principle, be used for the hardware realization of multilayer ANNs. For the first time, we built a double-layer ANN that paves the way for the realization of a multilayer perceptron, demonstrating the possibility of performing nonseparable combinational logic classification (the XOR logic task). We also showed numerically that such a perceptron can, in principle, solve analogue tasks that cannot be realized by digital devices. This approach could be extended, although not straightforwardly, to larger ANNs and other machine learning algorithms for more complex and data-intensive tasks.

## ACKNOWLEDGMENTS

The work was partially supported by the Russian Science Foundation (16-13-00052) and was partially done on the equipment of the Resource center of electrophysical methods (Complex of NBICS-technologies of Kurchatov Institute). This research was performed in the framework of the “Grandi Progetti 2012” funded by Autonomous Province of Trento, Italy (PAT): “Developing and studying novel intelligent nano materials and devices towards adaptive electronics and neuroscience applications — MaDEleNA Project”.