Ways of creating a practically effective learning method for spiking neural networks, suitable for implementation in neuromorphic hardware and at the same time based on biologically plausible plasticity rules, namely on STDP, are discussed. The influence of the amount of correlation between input and output spike trains on learnability under different STDP rules is evaluated. The usability of alternative combined learning schemes involving artificial and spiking neuron models is demonstrated on the iris benchmark task and on the practical task of gender recognition.

## I. INTRODUCTION

Applying spiking neural networks to classification tasks is currently relevant from two points of view. Firstly, a practical supervised learning algorithm for spiking neural networks opens the way to implementation in autonomous neuromorphic hardware with ultra-low power consumption. Secondly, the creation of such an algorithm based on biologically plausible plasticity rules may also help explain the role of long-term plasticity in the brain.

A number of works address the question of creating practically effective supervised learning methods for spiking neuron networks (Gütig and Sompolinsky, 2006; Mitra *et al.*, 2009; Franosch *et al.*, 2013), but no such method has yet been created based only on the current knowledge of the operating rules of biological neural systems, namely on spike-timing-dependent plasticity (STDP), a biologically inspired long-term plasticity model. In Section II we discuss under what conditions the weights with which a neuron performs the desired input-output transformation can in principle be stably reached as a result of STDP. In Section II B 2 we demonstrate that the steady value of a weight is determined by the amount of correlation between the output spike train and the corresponding input spike train. Based on this fact, in Section II C we propose a supervised learning algorithm and show its capability to solve a linear classification task.

There is also a straightforward approach: mapping a trained formal network onto a spiking one. In (Eliasmith, 2013) each formal neuron is replaced with several spiking ones that, along with the encoding and decoding machinery, reproduce its activation function. Alternatively, one can simply transfer synaptic weights obtained by training a formal network to a spiking network of the same topology (Diehl *et al.*, 2015). In Section III B we show on the Fisher's Iris benchmark that the spiking network can give an increase in classification accuracy compared to the formal one. In Section III C we apply this approach to the task of recognizing the gender of a text's author. This task is of great importance in security systems and social network analysis, as a component of authorship profiling, i.e. the extraction of information about the unknown author of a text (demographics, psychological traits, etc.) based on the analysis of linguistic parameters.

## II. STDP-BASED APPROACH

### A. Materials and methods

In the Spike-Timing-Dependent Plasticity model (Morrison *et al.*, 2008), the strength of a synapse is characterized by a weight $0 \le w \le w_{max}$, whose change depends on the difference between the presynaptic *t*_{pre} and postsynaptic *t*_{post} spike moments:

$$\Delta w = \begin{cases} W_+ \left(1 - \dfrac{w}{w_{max}}\right)^{\mu_+} e^{-(t_{post} - t_{pre})/\tau_+}, & t_{post} > t_{pre},\\ -W_- \left(\dfrac{w}{w_{max}}\right)^{\mu_-} e^{-(t_{pre} - t_{post})/\tau_-}, & t_{post} < t_{pre}, \end{cases} \quad (1)$$

where $W_+ = 0.03$, $W_- = 1.035 \cdot W_+$, $\tau_+ = \tau_- = \tau = 20$ ms. The rule with $\mu_+ = \mu_- = 0$ is called additive STDP, with $\mu_+ = \mu_- = 1$ multiplicative; intermediate values $0 \le \mu \le 1$ are also possible.
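For concreteness, the update rule can be sketched in code. This is a minimal illustration of the weight-dependent STDP rule with the parameters given above; the function names are ours, not from the paper, and $w_{max}$ is set to 1 for illustration.

```python
import math

W_PLUS = 0.03             # potentiation amplitude W+
W_MINUS = 1.035 * W_PLUS  # depression amplitude W-
TAU = 20.0                # STDP time constant tau, ms
W_MAX = 1.0               # maximum weight w_max (illustrative)

def stdp_dw(w, t_pre, t_post, mu_plus=0.0, mu_minus=0.0):
    """Weight change for one pre/post spike pair.

    mu_plus = mu_minus = 0 gives additive STDP,
    mu_plus = mu_minus = 1 gives multiplicative STDP.
    """
    dt = t_post - t_pre
    if dt > 0:    # pre before post: potentiation
        return W_PLUS * (1.0 - w / W_MAX) ** mu_plus * math.exp(-dt / TAU)
    if dt < 0:    # post before pre: depression
        return -W_MINUS * (w / W_MAX) ** mu_minus * math.exp(dt / TAU)
    return 0.0

def clip(w):
    """Auxiliary conditions for additive STDP: keep w within [0, w_max]."""
    return min(max(w, 0.0), W_MAX)
```

With $\mu_+ = \mu_- = 0$ the weight dependence disappears and only the hard bounds of `clip` keep the weight in range, which is exactly the additive case described above.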

In the case of additive STDP, auxiliary conditions are added to prevent the weight from falling below zero or exceeding the maximum value $w_{max}$: if $w + \Delta w > w_{max}$, then $w \to w_{max}$; if $w + \Delta w < 0$, then $w \to 0$.

An important part of an STDP rule is the scheme of pairing pre- and postsynaptic spikes when evaluating the weight change according to rule (1). Besides the all-to-all scheme, there exist several nearest-neighbour ones (Morrison *et al.*, 2008). We used the restricted symmetric scheme (fig. 1), in which a presynaptic spike is paired with the last preceding postsynaptic spike if the latter has not yet been accounted for in a pre-after-post pair, and vice versa: a postsynaptic spike is paired with the nearest preceding presynaptic spike if the latter has not yet participated in any other post-after-pre pair.
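The pairing scheme just described can be sketched as follows. This is one possible reading of the restricted symmetric scheme (simultaneous spikes are ignored for simplicity); treat it as an illustration rather than the exact implementation used in the simulations.

```python
def restricted_symmetric_pairs(pre, post):
    """Pair spikes under the restricted symmetric scheme.

    pre, post: sorted lists of spike times. Returns (t_pre, t_post)
    pairs; each partner spike serves in at most one pair of each kind.
    """
    used_post = set()  # post spikes already used in a pre-after-post pair
    used_pre = set()   # pre spikes already used in a post-after-pre pair
    pairs = []
    events = sorted([(t, 'pre') for t in pre] + [(t, 'post') for t in post])
    for t, kind in events:
        if kind == 'pre':
            # pair with the last preceding post spike not yet consumed
            cand = [tp for tp in post if tp < t and tp not in used_post]
            if cand:
                tp = max(cand)
                used_post.add(tp)
                pairs.append((t, tp))  # depression pair: post precedes pre
        else:
            # pair with the nearest preceding pre spike not yet consumed
            cand = [tp for tp in pre if tp < t and tp not in used_pre]
            if cand:
                tp = max(cand)
                used_pre.add(tp)
                pairs.append((tp, t))  # potentiation pair: pre precedes post
    return pairs
```

For example, with presynaptic spikes at 1 and 5 ms and a postsynaptic spike at 3 ms, the scheme yields one potentiation pair (1, 3) and one depression pair (5, 3).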

As the neuron model we used the Leaky Integrate-and-Fire one, in which the membrane potential *V* obeys

$$C_m \frac{dV}{dt} = -\frac{C_m}{\tau_m} V + I_{syn}(t);$$

when $V \ge V_{th}$, $V \to V_{reset}$, and during the refractory period $\tau_{ref} = 3$ ms the neuron is insensitive to synaptic input. The membrane capacitance is $C_m = 300$ pF, and the membrane time constant is $\tau_m = 10$ ms. The postsynaptic current is of exponential form: a presynaptic spike arriving at synapse *i* at time $t_{sp}$ adds $w_i(t_{sp}) \frac{q_{syn}}{\tau_{syn}} e^{-\frac{t - t_{sp}}{\tau_{syn}}} \Theta(t - t_{sp})$ to $I_{syn}$, where $q_{syn} = 0.03$ pC, $\tau_{syn} = 5$ ms, $w_i$ is the synaptic weight and $\Theta(t)$ is the Heaviside step function.
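A forward-Euler sketch of this neuron model is given below, assuming units of pF, pC, ms and mV. The threshold value `v_th` is an assumption of ours (the paper does not state it), and we interpret "insensitive to synaptic input" as discarding spikes that arrive during the refractory period.

```python
import math

def simulate_lif(input_spikes, weights, t_sim=100.0, dt=0.1,
                 c_m=300.0, tau_m=10.0, tau_ref=3.0,
                 q_syn=0.03, tau_syn=5.0, v_th=15.0, v_reset=0.0):
    """Euler simulation of the LIF neuron with exponential PSCs.

    input_spikes: per-synapse lists of spike times (ms); weights: w_i.
    Returns the list of output spike times.
    """
    v, i_syn = 0.0, 0.0
    refractory_until = -1.0
    out = []
    # flatten input spikes into time-ordered (time, weight) events
    events = sorted((t, weights[i]) for i, times in enumerate(input_spikes)
                    for t in times)
    k = 0
    for step in range(int(t_sim / dt)):
        t = step * dt
        i_syn *= math.exp(-dt / tau_syn)        # PSC decay
        while k < len(events) and events[k][0] <= t:
            if t >= refractory_until:           # drop input while refractory
                i_syn += events[k][1] * q_syn / tau_syn
            k += 1
        if t >= refractory_until:
            # C_m dV/dt = -(C_m / tau_m) V + I_syn
            v += dt * (-v / tau_m + i_syn / c_m)
            if v >= v_th:
                out.append(t)
                v = v_reset
                refractory_until = t + tau_ref
    return out
```

Note that with $q_{syn} = 0.03$ pC a single input spike contributes very little, so firing requires either many coincident inputs or large weights, consistent with the 100-synapse setup used later.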

### B. The principal possibility of weight convergence to the target based on input-output correlation

Provided that the desired synaptic weights are known in advance, the question is whether such weights can emerge as the result of applying STDP to the input spike trains and some output spike train. Following (Legenstein *et al.*, 2005), we investigated the ability of the weights to converge to the target under the following protocol:

1. The output train of the neuron with the target weights and without STDP is recorded. It is then considered the desired output.

2. STDP is turned on, and the neuron, receiving the same input trains, is forced to fire spikes at the desired moments. This is expected to make the weights converge to the target, starting from arbitrary (but low) initial ones.

From our preliminary work the following is known. The weights do not converge to the target under all spike pairing schemes; the restricted symmetric scheme showed the best convergence, so it is the one we use.

The presence of short-term plasticity in the synapse model does not affect the weight convergence. The neuron model also does not matter: we checked the Leaky Integrate-and-Fire, Hodgkin and Huxley (1952) and Izhikevich (2003) models, and a static adder, in which an incoming spike simply adds its synapse's weight to the membrane potential; when the accumulated value reaches the threshold, it is dropped to zero and the neuron fires a spike.

The convergence persists if the mean frequencies of the input spike trains are changed during the simulation, but slightly declines when different inputs receive trains of different mean frequencies (Sboev *et al.*, 2016a).

Finally, the considered protocol of forcing the output leads to weight convergence to the target when a set of binary vectors is used as the input and weights with which the neuron divides the set into two classes are used as the target (Sboev *et al.*, 2016b). Though this result does not provide a learning mechanism for obtaining the weights needed for a classification task, it shows that such weights can be stably reached as the result of STDP.

#### 1. Weight convergence in the case of non-additive STDP

In the case of additive STDP, only 0 and $w_{max}$ are stable points of a weight. Using non-additive STDP allows a wider range of weight distributions to be reached with the protocol under consideration. Fig. 2 shows two examples of target weight distributions that can be reached with the parameters we found, $\mu_+ = 0.06$ and $\mu_- = 0.01$.

#### 2. Correlative nature of STDP

We now demonstrate that the steady value of a weight is determined by the amount of correlation between input and output spike trains.

##### a. The correlation measure

The normed cross-correlation function is defined as

$$C(s) = \frac{\langle S_{pre}(t)\, S_{post}(t+s) \rangle_t}{\langle S_{pre}(t) \rangle_t \, \langle S_{post}(t) \rangle_t},$$

where $S_{pre/post}(t) = 1$ indicates a pre/postsynaptic spike respectively at time *t* (and 0 otherwise), and $t_{bin}$ is the simulation step. Its average over lags within the STDP time window,

$$\bar{C} = \frac{t_{bin}}{\tau} \sum_{s = 0}^{\tau} C(s),$$

can be used as a rough correlation indicator, where $\tau$ is the STDP time window constant. A similar estimate is often used in analytical studies such as (van Rossum *et al.*, 2000).
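The indicator can be computed directly from binned spike trains. The sketch below assumes the definition above (normed cross-correlation averaged over non-negative lags up to $\tau$); the function name is ours.

```python
def correlation_indicator(pre, post, t_bin=1.0, tau=20.0):
    """Rough input-output correlation indicator.

    pre, post: 0/1 lists, one entry per time bin of width t_bin (ms).
    Returns the normed cross-correlation averaged over lags 0..tau.
    """
    n = min(len(pre), len(post))
    mean_pre = sum(pre[:n]) / n
    mean_post = sum(post[:n]) / n
    if mean_pre == 0 or mean_post == 0:
        return 0.0  # no spikes on one side: no correlation to measure
    max_lag = int(tau / t_bin)
    acc = 0.0
    for lag in range(max_lag + 1):
        m = n - lag
        # <S_pre(t) * S_post(t + lag)> normalized by the mean rates
        c = sum(pre[t] * post[t + lag] for t in range(m)) / m
        acc += c / (mean_pre * mean_post)
    return acc / (max_lag + 1)
```

Uncorrelated Poisson trains give an indicator near 1 under this normalization, while a train compared with itself gives a value well above 1 at zero lag.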

##### b. Results

Here we artificially generated, based on the technique from (Gütig *et al.*, 2003), input and output trains with different values of correlation. Applying STDP to these trains leads to weight convergence, and the resulting weight value depends monotonically on the input-output correlation ("artificial output" points in Fig. 3). The established weights, in their turn, reproduce a signal with the same level of correlation as the initial artificial one ("neuron output" points in Fig. 3). STDP here was non-additive, with $\mu_+ = 0.06$ and $\mu_- = 0.01$.

Thus, any desired weight value can be reached by making the neuron generate output with the proper amount of correlation with the corresponding input. Based on this fact, we suggest the following supervised learning protocol.

### C. A learning algorithm based on controlling input-output correlation

Our model now consists of a single neuron with 100 incoming synapses, all excitatory. As the input data we use 10-dimensional binary vectors with half of the components equal to 0 and the other half equal to 1. Each vector component equal to 1 is encoded by 10 synapses of the neuron receiving independent Poisson trains with a mean frequency of 20 Hz; a component equal to 0, by 10 independent 2-Hz trains. Let each vector belong to one of two classes, *C*_{+} and *C*_{−}, and let the task be that the neuron should produce a high mean firing rate in response to vectors from *C*_{+} and the lowest possible mean firing rate in response to vectors from *C*_{−}.
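The encoding of a binary vector into Poisson spike trains can be sketched as follows (the function name and the use of exponential inter-spike intervals to generate Poisson trains are our illustration choices):

```python
import random

def encode_vector(vec, n_syn_per_component=10, rate_one=20.0,
                  rate_zero=2.0, duration=1000.0, seed=None):
    """Encode a binary vector into Poisson spike trains.

    Each component drives n_syn_per_component synapses; components
    of 1 give rate_one (Hz) trains, components of 0 give rate_zero
    trains. Returns per-synapse lists of spike times in ms.
    """
    rng = random.Random(seed)
    trains = []
    for x in vec:
        rate = rate_one if x else rate_zero  # Hz
        for _ in range(n_syn_per_component):
            t, train = 0.0, []
            while True:
                # exponential inter-spike interval, mean 1000/rate ms
                t += rng.expovariate(rate / 1000.0)
                if t >= duration:
                    break
                train.append(t)
            trains.append(train)
    return trains
```

A full 10-dimensional vector thus produces 100 trains, matching the neuron's 100 synapses.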

The STDP weight change constants are chosen to be $W_+ = 0.01$, $W_- = 1.035 \cdot W_+$. Initially all weights are set to 0.4.

*The learning protocol* is the following. Input vectors are presented to the neuron in an alternating manner: a vector from *C*_{+} for 5 s, then a vector from *C*_{−} for 1.5 s. During the presentation of a vector from *C*_{−} the neuron is stimulated with a constant current high enough to make the mean output rate close to the highest rate possible given refractoriness, $1/\tau_{ref}$.

#### 1. Results

While the neuron is receiving an input vector from the *C*_{+} class, a synapse receiving high-frequency input contributes more to the neuron's output, and therefore its weight is rewarded more by STDP. As a result, with the parameters we have chosen, the weights of synapses receiving vector components of 1 increase in 66% of cases, and the weights of synapses receiving components of 0 decrease in 66% of cases. When a vector from the *C*_{−} class is presented, the neuron's output is caused by the stimulating current and is poorly correlated with the input. So all weights decrease (to keep them from falling to zero, a vector from *C*_{−} is presented for 1.5 s in contrast to the 5 s of a vector from *C*_{+}), but the weights of high-frequency inputs decrease more due to the higher number of post-before-pre events.

To assess the ability of the algorithm to solve a classification task, we took six binary vectors:

three of which are linearly separable from the other three. The desired weights which separate them are known (each digit below corresponds to 10 synapses having the same target weight):

so learning performance can be characterized by the deviation

between the actual and target weights during learning (Fig. 4). After 6,045 s of learning (310 cycles of presenting the whole set of vectors) the weights converge to a bimodal stationary distribution, i.e. each weight tends to either 0 or 1. Not all weights converge to the target due to the probabilistic nature of the input spike trains. However, this effect is averaged out thanks to the excessive number of synapses, 10 per input vector component, and after learning the neuron clearly distinguishes the classes by its mean firing rate, as shown in Fig. 5. Note that the weight convergence and classification distinctness are nearly equal for additive STDP and non-additive STDP (with $\mu_+ = 0.06$ and $\mu_- = 0.01$).

## III. ANN TO SNN MAPPING APPROACH

### A. Network parameters and learning algorithm

Following (Diehl *et al.*, 2015), we used here a combined learning algorithm involving artificial (ANN) and spiking (SNN) neural networks. It consists of the following steps (fig. 6):

1. *Training the artificial neural network by backpropagation.* The ANN neurons' activation function was ReLU for the hidden layers and Softmax for the output layer. Neuron biases were set to zero. Input data were normalized so that the Euclidean norm of each input vector equaled 1.

2. *Mapping the synaptic weights to the spiking neural network.* In the SNN we used a non-leaky Integrate-and-Fire neuron without refractoriness, in which the dimensionless membrane potential obeys $\frac{dV}{dt} = \frac{1}{\tau} \sum_i \sum_{s \in S_i} w_i \delta(t - s)$, where $S_i$ is the spike train on the *i*-th input synapse, $w_i$ is the synaptic weight, and $\tau = 0.01$ ms. On reaching the threshold $\Theta$, the potential is reset to zero.

3. *Encoding input data into spike trains.* An input vector component *x* was encoded by a Poisson spike train with mean frequency $x \cdot \nu_{max}$.

4. *Optimizing the spiking network parameters.* Besides $\nu_{max}$ and $\Theta$, the simulation time *T* and the simulation step $\Delta t$ were adjusted. According to (Diehl *et al.*, 2015), there are two necessary conditions for eliminating accuracy losses after the transfer:

   - the simulation time should be long enough to exclude the probabilistic influence of spike trains;

   - the neuron should not have to fire several spikes in one simulation step, so the total input a neuron receives during one simulation step must not exceed the threshold:

$$\frac{1}{\tau} \sum_i \max(w_i, 0) \le \Theta. \quad (3)$$

To fulfill (3), all spiking neural network weights are divided by a normalization factor *M*, the same for all neurons in a layer but unique for each layer:

$$M = \max_j \frac{\sum_i \max(w_{ij}, 0)}{\Theta\, \tau}, \quad (4)$$

where $w_{ij}$ is the *i*-th synapse weight of the *j*-th neuron in the current layer. Note that this assumes that no more than one spike can arrive from one input in one timestep.
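The layer-wise normalization can be sketched as below. This is our illustration of the rule, parameterized by the threshold $\Theta$ and the integration constant $\tau$ (with $\Theta = \tau = 1$ it simply scales the largest positive input sum in the layer to 1):

```python
def normalization_factor(weights, theta=1.0, tau=1.0):
    """Factor M for one layer: the largest positive input sum over
    neurons j, relative to what one timestep may deliver (theta * tau).
    weights[j][i] is the i-th synapse weight of neuron j."""
    return max(sum(max(w, 0.0) for w in neuron)
               for neuron in weights) / (theta * tau)

def normalize_layer(weights, theta=1.0, tau=1.0):
    """Divide all weights of a layer by the common factor M, so that no
    neuron can exceed the threshold within a single simulation step."""
    m = normalization_factor(weights, theta, tau)
    if m <= 0:
        return [list(neuron) for neuron in weights]
    return [[w / m for w in neuron] for neuron in weights]
```

Because M is shared by all neurons of a layer, the relative sizes of the weights, and hence the function computed by the ReLU network, are preserved.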

The conditions above are necessary but not sufficient, so achieving maximal classification accuracy still requires adjusting $\nu_{max}$ and $\Theta$.

### B. Fisher’s iris classification

To test the algorithm described above, we solved the popular toy task of Fisher's iris classification. The network had 4 neurons in the input layer, 4 neurons in the single hidden layer, and 3 neurons in the output layer. Spiking network weights were normalized according to (4). Each input vector was presented for 1 s. The simulation step was chosen to be 0.1 ms (decreasing it to 0.01 ms did not affect the results). The classification result was determined by the output neuron that fired the most spikes during the simulation.
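The readout and the error measure used throughout Section III reduce to a few lines (the function names are ours):

```python
def classify(spike_counts):
    """Winner-take-all readout: the predicted class is the index of the
    output neuron that fired the most spikes during the simulation."""
    return max(range(len(spike_counts)), key=lambda j: spike_counts[j])

def classification_error(predicted, true_labels):
    """Ratio of wrongly classified samples to the total number."""
    wrong = sum(p != t for p, t in zip(predicted, true_labels))
    return wrong / len(true_labels)
```

For example, output spike counts of (3, 10, 2) yield class 1, and three predictions with one mistake give an error of 1/3.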

#### 1. Results

The ReLU network gave a classification error (the ratio of wrongly classified input samples to the total number of samples) of $0.04 \pm 0.01$ on the training sets and $0.06 \pm 0.04$ on the testing sets, averaged over 10 different divisions of the input data into training and testing sets. The mean classification error on the test set of the spiking network with different $\Theta$ and $\nu_{max}$ is shown in Fig. 7. The highest classification accuracy of the spiking network, an error of $0.04 \pm 0.01$, was obtained at high input frequencies and thresholds. Increasing both the frequency and the threshold while keeping (3) increases the number of input spikes a neuron has to integrate before it fires a spike itself, and therefore increases the classification accuracy. However, increasing $\nu_{max}$ over $1/\Delta t$ worsens the accuracy because it breaks the condition of no more than one input spike per synapse in one timestep.

### C. Gender prediction

We now apply the approach under consideration to the task of recognizing the gender of a text's author. For this purpose we took a subset of RusPersonality (Zagorovskaya *et al.*, 2012), the first corpus of Russian-language texts labeled with information on their authors: gender, age, psychological testing data, etc. This free-to-use corpus contains over 1,850 documents, 230 words per document on average, from 1,145 respondents. The texts were written by university students, who were given a few themes to describe. The themes were the same for male and female participants, so that one can focus on the peculiarities caused by the authors' gender rather than by their individual styles.

As the input data for the network, each text was described by 141 features:

- the numbers of different parts of speech: nouns, numerals, adjectives, prepositions, verbs, pronouns, interjections, adverbs, articles, conjunctions, participles, infinitives, and the number of finite verbs (13 in total);

- the numbers of syntactic relations defined in the Russian National Corpus (http://www.ruscorpora.ru/en/), 60 in total;

- different ratios of the number of one part of speech to another according to (Sboev *et al.*, 2015), 27 in total;

- the numbers of exclamation marks, question marks, dots, and emoticons (4 in total);

- the numbers of words expressing a particular emotion according to "Emotions and feelings in lexicographical parameters: Dictionary of emotive vocabulary of the Russian language" (http://lexrus.ru/default.aspx?p=2876 [page in Russian]), 37 emotions in total.

The training set contained 364 texts, the testing one 187. Network topology was feedforward: 141 input neurons, 81 neurons in the first hidden layer, 19 neurons in the second hidden layer and 2 neurons in the output layer. Weights were normalized according to (4).

#### 1. Results

The classification error of the ReLU neural network was $0.01 \pm 0.02$ on the training set and $0.28 \pm 0.04$ on the testing set. The mean classification error on the test set of the spiking neural network with $\Theta = 1$ and different $\nu_{max}$ is shown in Fig. 8. Other thresholds are not shown, as they provide significantly worse classification accuracy. The lowest classification error is $0.28 \pm 0.03$, which equals that of the ReLU network.

Thus, the approach under consideration can be applied to a practical machine learning task. However, achieving optimal classification accuracy requires additional adjustment of the spiking network parameters.

## IV. CONCLUSION

Supervised learning can be performed on the basis of bare spike-timing-dependent plasticity, without any modifications, while keeping all its parameters in biologically plausible ranges. The teacher can be implemented by providing the neuron with information on the classes in the form of controlling the correlation between the neuron's output and its inputs.

There is also a straightforward way to obtain a spiking network for a classification task: training a well-studied artificial network and then using the ready weights in the spiking network. Such a transfer not only allows the network to be implemented in low-energy-consumption hardware, but may also increase the classification accuracy due to the presence of additional adjustable parameters in a spiking network compared to a formal one.

## ACKNOWLEDGMENTS

This work was supported by RSF project 16-18-10050 "Identifying the Gender and Age of Online Chatters Using Formal Parameters of their Texts". Simulations were carried out using the NEST simulator (Gewaltig and Diesmann, 2007) and the high-performance computing resources of the federal center for collective usage at NRC "Kurchatov Institute", http://computing.kiae.ru.

## REFERENCES

Diehl *et al.* (2015)

Eliasmith (2013)

Franosch *et al.* (2013)

Gewaltig and Diesmann (2007)

Gütig *et al.* (2003)

Gütig and Sompolinsky (2006)

Hodgkin and Huxley (1952)

Izhikevich (2003)

Legenstein *et al.* (2005)

Mitra *et al.* (2009)

Morrison *et al.* (2008)

Sboev *et al.* (2015)

Sboev *et al.* (2016a)

Sboev *et al.* (2016b)

van Rossum *et al.* (2000)

Zagorovskaya *et al.* (2012)