Resistive Random Access Memory (RRAM) and Phase Change Memory (PCM) devices have been widely used as synapses in crossbar array based analog Neural Network (NN) circuits to achieve more energy and time efficient data classification than conventional computers. Here we demonstrate the advantages of the recently proposed spin orbit torque driven Domain Wall (DW) device as a synapse, compared to RRAM and PCM devices, with respect to on-chip learning (training in hardware) in such NNs. The synaptic characteristic of the DW synapse, which we obtain from micromagnetic modeling, turns out to be much more linear and symmetric (between positive and negative updates) than that of RRAM and PCM synapses. This makes the design of peripheral analog circuits for on-chip learning much easier in a DW synapse based NN than in RRAM and PCM synapse based NNs. We next incorporate the DW synapse as a Verilog-A model in the crossbar array based NN circuit that we design in a SPICE circuit simulator. Successful on-chip learning is demonstrated through SPICE simulations on the popular Fisher’s Iris dataset. The time and energy required for learning turn out to be orders of magnitude lower for the DW synapse based NN circuit than for the RRAM and PCM synapse based NN circuits.
I. INTRODUCTION
Crossbar array based analog hardware Neural Networks (NNs) are considered extremely time and energy efficient in executing NN algorithms for data classification because they compute at the location of the memory itself, unlike CPUs, GPUs, and even recent digital neuromorphic chips, all of which keep memory and computing separate at their smallest cores.1–6 Such a crossbar based NN needs an analog Non Volatile Memory (NVM) device, also known as a synapse, at each intersection point of the crossbar. Typically a Resistive Random Access Memory (RRAM) or a Phase Change Memory (PCM) device is used as the synapse.1,7–10 Training the NN in hardware (on-chip learning) is achieved by modulating the conductances of the synapses, which correspond to the weights stored in them, with electrical programming pulses at every iteration. Though the conductance of RRAM and PCM synapses changes by orders of magnitude under programming pulses, the conductance response characteristic is highly non-linear and asymmetric (between positive and negative conductance updates).1,11–13 This complicates the design of peripheral circuits for on-chip learning, and learning accuracy suffers. The time and energy consumed in the learning process are also very high.1,9,11–13
A spin orbit torque driven Domain Wall (DW) device based on a heavy metal-ferromagnet hetero-structure has recently been proposed and experimentally demonstrated to exhibit synaptic behaviour.3,14–20 In Section II of this paper, we simulate such a DW synapse using an experimentally calibrated micromagnetic model. We show that, though the range of conductance variation is much smaller for the DW synapse than for RRAM and PCM synapses, the conductance response of the DW synapse to programming current pulses is linear and symmetric, unlike that of RRAM and PCM synapses. In Section III, we design a crossbar array of DW synapses in a SPICE circuit simulator, with the synapses being Verilog-A models developed from our micromagnetic simulation results. A Fully Connected Neural Network (FCNN) algorithm, with Stochastic Gradient Descent (SGD) based weight/conductance update, is used here for on-chip learning.17,21 The conductance of the DW synapse is quantized here, unlike in Bhowmik et al.,17 to take the effect of DW pinning by defects into account.22–24 Despite the quantization, high accuracy is obtained in our circuit simulations on a popular machine learning dataset, Fisher’s Iris.25 We next show that the time taken and energy consumed for on-chip learning in the DW synapse based NN circuit are orders of magnitude lower than in RRAM and PCM synapse based NN circuits. Section IV concludes the paper. To the best of our knowledge, this is the first comparison study between a spintronic synapse and RRAM/PCM synapses with respect to on-chip learning in NN hardware.
II. DEVICE LEVEL COMPARISON
A schematic of our heavy metal/ferromagnetic metal hetero-structure based domain wall synapse is shown in Fig. 1. The operating physics of the device has been discussed extensively in Refs. 3 and 14–17. The core physics is that of spin orbit torque driven DW motion, which has been extensively studied through simulations and experiments in the past.26–29 When in-plane current (“write” current) flows through the heavy metal layer (“write” path), a DW in the ferromagnetic layer above it experiences spin orbit torque. If the ferromagnetic layer exhibits Perpendicular Magnetic Anisotropy (PMA) and Dzyaloshinskii Moriya Interaction (DMI) is also present at the interface, experimental reports show that the DW can be of Néel type.26,27,30,31 Those experimental reports also show that such a DW moves due to spin orbit torque from the in-plane current, even in the absence of a magnetic field.26,27,30 Our micromagnetic simulations show the same (Fig. 2). Extensive simulations and experiments already carried out26–30,32 on such spin orbit torque driven domain wall devices also show that the Oersted field due to in-plane current flow through the device is very small32 and does not play a major role in domain wall motion.26,28
In this paper, we consider a device with lateral dimensions 1000 nm × 50 nm. Pt is chosen as the heavy metal in our device owing to its low resistivity compared to other materials with equal or higher spin Hall angle.33–35 This reduces the Joule heating, and hence the programming energy needed per current pulse to move the domain wall and cause a synaptic weight update. The thickness of the heavy metal (Pt) layer is taken to be 10 nm, which is greater than the spin diffusion length in Pt.36,37 Hence, we can take the vertical spin current density injected by the heavy metal layer into the ferromagnetic layer above it (Js) to be the in-plane charge current density (Jc) × the spin Hall angle (0.07 here, considering Pt).33–35 The thickness of the ferromagnetic layer above the heavy metal layer is taken to be 1 nm. The dynamics of the moments of this layer under the influence of this spin current are simulated using the micromagnetic simulation package “mumax3”42 to model such spin current driven DW motion inside it. We choose the micromagnetic simulation parameters for the ferromagnetic layer based on those used for Pt (heavy metal)/CoFe (ferromagnet)/MgO devices in the simulation study of Emori et al.,30 which is based on experimentally observed spin orbit torque driven DW motion in the same devices. The parameters can also be found in the supplementary material (Section 1) accompanying this paper.
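The relation Js = Jc × spin Hall angle can be illustrated with a short calculation. The sketch below assumes, as a simplification, that the whole write current flows uniformly through the 50 nm × 10 nm Pt cross-section; the 25 μA value is the write-pulse amplitude quoted later in this section, and the function names are illustrative only.

```python
# Sketch: charge and spin current densities in the heavy-metal "write" path.
# Device numbers come from the text; function names are illustrative.

WIDTH_NM = 50.0          # lateral width of the device
PT_THICKNESS_NM = 10.0   # heavy-metal (Pt) thickness
SPIN_HALL_ANGLE = 0.07   # value used in the text for Pt
NM_TO_CM = 1e-7          # 1 nm = 1e-7 cm


def charge_current_density_a_per_cm2(current_a: float) -> float:
    """In-plane charge current density Jc.

    Assumption: all of the write current flows uniformly through the
    Pt cross-section (width x thickness).
    """
    area_cm2 = (WIDTH_NM * NM_TO_CM) * (PT_THICKNESS_NM * NM_TO_CM)
    return current_a / area_cm2


def spin_current_density_a_per_cm2(jc_a_per_cm2: float) -> float:
    """Vertical spin current density Js = Jc x spin Hall angle."""
    return SPIN_HALL_ANGLE * jc_a_per_cm2


jc = charge_current_density_a_per_cm2(25e-6)
js = spin_current_density_a_per_cm2(jc)
print(f"Jc = {jc:.2e} A/cm^2, Js = {js:.2e} A/cm^2")
```

Under this uniform-flow assumption, a 25 μA write current gives Jc of about 5 × 10⁶ A/cm², which agrees with the current density quoted for the write pulse.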
Since the DW is of Néel type (DMI = 1.2 × 10⁻³ J/m²), the average magnetization inside the wall and the direction of spin polarization of the electrons at the heavy metal/ferromagnet interface, set by the current flowing through the heavy metal, form a non-zero cross product (Fig. 2). The effective magnetic field experienced by the DW is proportional to that cross product.26,28,30 As a result, the DW moves, as seen in our micromagnetic simulations (Fig. 2). It has been shown26,28,30 that such an effective field arises from the anti-damping or Slonczewski torque caused by the current, and not from the field-like torque. The field-like torque has been found to be very low in such Pt/CoFe/MgO systems26 and does not play a major role in the domain wall motion simulated here. The damping factor for the ferromagnetic layer is taken to be 0.3.30 The velocity of the domain wall varies with the damping factor, based on the relative magnitudes of the effective field due to current driven spin accumulation at the heavy metal/ferromagnetic metal heterostructure and the effective field due to DMI at the interface.38 Triangular notch regions with Perpendicular Magnetic Anisotropy (PMA) constant = 9 × 10⁵ J/m³ are present on the edges of the simulated ferromagnetic layer. The PMA constant in the rest of the layer is 8 × 10⁵ J/m³. These notch regions mimic defects, which pin the domain wall for in-plane charge currents below a certain threshold value.15,22–24,31 Hence, our micromagnetic simulation (Fig. 3(a)) shows that the velocity of the domain wall is linearly proportional to the current density only above a threshold current density (≈ 5 × 10⁶ A/cm²).38 Hence, in our device we move the domain wall only with current pulses (3 ns long) of fixed magnitude (25 μA), corresponding to a current density of 5 × 10⁶ A/cm² (Fig. 2), so that the domain wall is never pinned by defects.22–24 At each edge of the free layer in which the domain wall moves, a pinned ferromagnetic region, held in place with the help of the antiferromagnetic layer above it (Fig. 1), stabilizes the domain wall at the edge and prevents it from being destroyed.3,14,39
As observed from our micromagnetic simulation, a “write” current pulse of magnitude 25 μA and positive polarity always moves the DW to the right by a fixed distance of ≈ 20 nm (Fig. 2). Hence ⟨m_z⟩ decreases and, following equation (1), the conductance increases by a step of 0.071 × 10⁻³ mho (Fig. 3(b)). A current pulse of the same magnitude and negative polarity moves the DW to the left, so ⟨m_z⟩ increases and the conductance decreases by the same step of 0.071 × 10⁻³ mho (Fig. 3(b)). Hence the conductance response to a series of programming “write” current pulses of equal magnitude (25 μA) is linear and is also symmetric between positive and negative pulses. Moreover, the conductance of the DW synapse, and hence the corresponding synaptic weight, takes only quantized values, which is how we take defect pinning into account. The energy consumed through Joule heating per programming pulse of 25 μA, for a conductance increase or decrease by a single step, is calculated to be 0.18 fJ (Table I).
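The linear, symmetric, and quantized response described above can be captured by a very small behavioral model. The sketch below is not the Verilog-A model used in this work; it only uses the step size and conductance range quoted in the text and Table I. Clamping at the ends of the range is an assumption made here to represent the DW being held at the device edges.

```python
# Behavioral sketch of the DW synapse conductance response.
# Step size and range are taken from the text and Table I; the clamping
# behaviour at the range limits is an assumption of this sketch.

G_MIN_MHO = 2.9e-3     # lower end of the conductance range
G_MAX_MHO = 6.1e-3     # upper end of the conductance range
G_STEP_MHO = 0.071e-3  # conductance change per 25 uA, 3 ns write pulse


def apply_write_pulses(g_mho: float, n_pulses: int) -> float:
    """Return the conductance after n_pulses fixed-magnitude write pulses.

    n_pulses > 0 models positive-polarity pulses (conductance increases),
    n_pulses < 0 models negative-polarity pulses (conductance decreases).
    """
    g_new = g_mho + n_pulses * G_STEP_MHO
    return min(max(g_new, G_MIN_MHO), G_MAX_MHO)


g0 = 4.0e-3
g_up = apply_write_pulses(g0, 5)       # five positive pulses
g_back = apply_write_pulses(g_up, -5)  # five negative pulses undo them
print(g0, g_up, g_back)
```

Because every pulse contributes the same step regardless of the current conductance, the number of pulses needed for a given weight change does not depend on the present state of the synapse, which is the property that simplifies the peripheral update circuit.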
Synaptic device | Conductance range (mho) | Number of conductance states | Energy per pulse (conductance increase) | Energy per pulse (conductance decrease) | Train accuracy for on-chip learning (%) | Test accuracy for on-chip learning (%) | Total energy consumed in all synapses for on-chip learning |
---|---|---|---|---|---|---|---|
Domain wall | 2.9m - 6.1m | 48 | 0.18 fJ | 0.18 fJ | 89 | 90 | 9 fJ |
RRAM | 3μ - 30μ | 100 | 12 pJ - 51 pJ | 2.28 nJ (abrupt reset) | 93 | 94 | 1 μJ |
PCM | 0.1μ - 9.3μ | 20 | 5 pJ | 30 pJ (abrupt reset) | 89 | 92 | 1.1 μJ |
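The energy figures in Table I can be cross-checked with simple arithmetic. The sketch below assumes the per-pulse Joule energy follows E = I²Rt with a constant 25 μA current over the 3 ns pulse; the write-path resistance it back-calculates is an inference from that assumption, not a value stated in this paper. The ratios use only the table entries.

```python
import math

# Values quoted in the text and Table I.
I_WRITE_A = 25e-6        # DW write current
T_PULSE_S = 3e-9         # DW write pulse duration
E_DW_PULSE_J = 0.18e-15  # DW energy per pulse

# Assumption: E = I^2 * R * t with constant current during the pulse.
implied_resistance_ohm = E_DW_PULSE_J / (I_WRITE_A ** 2 * T_PULSE_S)

E_INCREASE_PULSE_J = {"DW": 0.18e-15, "RRAM (minimum)": 12e-12, "PCM": 5e-12}
E_TOTAL_J = {"DW": 9e-15, "RRAM": 1e-6, "PCM": 1.1e-6}


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Base-10 logarithm of the ratio of two energies."""
    return math.log10(larger / smaller)


print(f"implied write-path resistance: {implied_resistance_ohm:.1f} ohm")
for name in ("RRAM (minimum)", "PCM"):
    gap = orders_of_magnitude(E_INCREASE_PULSE_J[name], E_INCREASE_PULSE_J["DW"])
    print(f"per-pulse energy, {name} vs DW: 10^{gap:.1f}")
for name in ("RRAM", "PCM"):
    gap = orders_of_magnitude(E_TOTAL_J[name], E_TOTAL_J["DW"])
    print(f"total learning energy, {name} vs DW: 10^{gap:.1f}")
```

With the table values, the per-pulse gap is more than four orders of magnitude and the total learning energy gap is about eight orders of magnitude.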
Next we compare the conductance response of this DW synapse with that of typical RRAM and PCM synapses. The Verilog-A model of Ref. 43, experimentally benchmarked against Ref. 44, has been used for RRAM modeling. Following the 1T1M (one transistor, one memristor) configuration,45–47 we connect this RRAM device to a 65 nm technology node transistor (from the UMC library) in the Cadence Virtuoso circuit simulator (Fig. 4(a)). We observe that when voltage pulses of fixed magnitude and duration (200 ns) are applied at the gate of the transistor for conductance increase (with the top electrode kept at a higher voltage than the bottom electrode for that purpose; see supplementary material, Section 2), the conductance of the RRAM synapse does not go up linearly, unlike that of the domain wall synapse. In fact, the conductance simply saturates to a fixed value (see supplementary material, Section 2). To achieve a linear increase in conductance, gate voltage pulses of increasing magnitude (SET pulses) need to be applied (Fig. 4(b)). This has been observed experimentally in the RRAM devices of Refs. 45, 47, and 48. Thus, though the conductance varies over a much wider range for the RRAM synapse than for the DW synapse (Table I), the conductance response is inherently non-linear. As a result, if a certain weight update is needed for a synapse at an iteration during on-chip learning, voltage pulses of different magnitudes may need to be applied to bring about the same weight update, depending on the weight/conductance value of the RRAM synapse before that iteration. This makes designing the analog peripheral circuit for weight update very complicated. In fact, the demonstrations of on-chip learning in RRAM based crossbar NN arrays so far use a digital FPGA unit or an on-chip CMOS based digital processor, connected to the analog crossbar array, for weight update.47,49 ADCs and DACs are needed as a result, which can potentially consume a lot of energy and slow down the circuit.
The energy consumed in the 1T1M circuit of Fig. 4(a) ranges between 12 pJ (minimum gate voltage) and 51 pJ (maximum gate voltage) per pulse, which is much larger than the energy consumed for a weight/conductance update by a single step in a domain wall synapse (Fig. 3(b)) (Table I). Apart from non-linearity, another issue with the conductance response of the RRAM synapse is the asymmetry between positive and negative conductance updates. If we apply the same gate voltage pulses as in Fig. 4(b) in the reverse order to decrease the conductance of the synapse (with the bottom electrode at a higher voltage than the top electrode for that purpose), we see that the conductance hardly decreases (see supplementary material, Section 2). Rather, to decrease the conductance by a certain step, a long duration (6 μs) and high magnitude (2.5 V) voltage pulse, known as a RESET pulse and hence high in energy consumption, needs to be applied at the gate of the transistor to abruptly decrease the conductance to its smallest value. It is followed by pulses of gradually increasing voltage (SET pulses) to then increase the conductance.
The conductance response characteristic of the PCM synapse50 we simulated, based on the model developed in Nandakumar et al.51 (see supplementary material, Section 3, for more details), is more linear than that of RRAM, i.e., programming current pulses of fixed magnitude (90 μA) and duration (50 ns) increase the conductance of the PCM synapse fairly linearly over a larger number of pulses (≈ 12) (Fig. 4(c)). The energy associated with each such pulse is 5 pJ,2,51 still much higher than that for the domain wall synapse (Table I). Conductance decrease, on the other hand, is carried out by an abrupt RESET pulse that consumes 30 pJ of energy,2 followed by a series of SET pulses, much like in the RRAM synapse. Thus the conductance response characteristic of the PCM synapse is still asymmetric, like that of the RRAM synapse.
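The cost of this asymmetry can be illustrated with a rough estimate. The sketch below assumes a hypothetical single PCM device that sits a given number of SET steps above its lowest conductance state and must be lowered by one step: it is RESET to the lowest state and then stepped back up with SET pulses. The level indexing and the function names are illustrative assumptions; only the per-pulse energies come from the text.

```python
# Rough estimate of the energy needed to lower a synapse by one step.
# Per-pulse energies are from the text; the level indexing is hypothetical.

PCM_SET_PJ = 5.0      # energy per PCM SET pulse
PCM_RESET_PJ = 30.0   # energy per abrupt PCM RESET pulse
DW_STEP_PJ = 0.18e-3  # 0.18 fJ per DW write pulse, expressed in pJ


def pcm_decrement_energy_pj(level: int) -> float:
    """Energy to go from `level` to `level - 1` SET steps above the minimum.

    Assumption: one RESET to the lowest state, then (level - 1) SET pulses.
    """
    if level < 1:
        raise ValueError("level must be at least 1 to allow a decrement")
    return PCM_RESET_PJ + (level - 1) * PCM_SET_PJ


def dw_decrement_energy_pj() -> float:
    """A DW synapse needs one negative-polarity pulse, whatever its state."""
    return DW_STEP_PJ


print(pcm_decrement_energy_pj(10), dw_decrement_energy_pj())
```

In this simplified picture the PCM cost grows with the starting level, while the DW cost stays at one pulse.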
III. NETWORK LEVEL COMPARISON
Next we design a crossbar array based Fully Connected Neural Network (FCNN) with domain wall synapses17 and compare its energy and speed performance for on-chip learning with that of equivalent FCNNs designed with RRAM and PCM synapses. Note that this NN is of the second generation, non-spiking type52 and uses the standard Stochastic Gradient Descent (SGD) algorithm for weight update.21 A Verilog-A model of the domain wall synapse is designed, based on its conductance response obtained from micromagnetic physics as shown in Fig. 3(b), and inserted in the crossbar schematic designed in the Cadence Virtuoso circuit simulator (Fig. 5).
Fisher’s Iris dataset, a popular machine learning dataset, is used for the training.25 Since the dataset is not completely linearly separable, and the FCNN we design here has no hidden layer, the 4 input features of each sample are first passed through some basic filters that convert them to 16 features so that accurate classification can be carried out.53 Input voltages, proportional to these 16 input features, are applied on the crossbar as shown in Fig. 5. Read currents, each proportional to the product of a synaptic weight and an input feature, add up following Kirchhoff’s current law and enter the neuron/activation function circuit at each output node. Thus the input Vector-weight Matrix Multiplication (VMM) is carried out in the crossbar array.1,17 A “tanh” neuron/activation function (f) acts on the read current at each output node. This function has been designed with transistors in differential amplifier configuration, as shown in Bhowmik et al.17 A weight update circuit follows, which calculates the common part of the weight update at each output node, using the same SGD method and circuit described in Bhowmik et al.17 The common part of the weight update computed at each output node is next multiplied with the inputs using the multiplier circuit (x), as shown in Fig. 5. In Bhowmik et al.,17 a write current proportional to the output of the multiplier (x) at each synapse acts on the DW synapse and updates its weight. However, since the conductance of the DW synapse here takes only quantized values and is updated only by write current pulses of fixed magnitude (25 μA) (Fig. 3(b)), an additional quantizer circuit (Q) is present after the multiplier circuit here, unlike in Bhowmik et al.17 The design and typical output of the quantizer circuit can be found in the supplementary material (Section 4) accompanying this paper.
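The signal flow described above (VMM in the crossbar, tanh neuron, common update term per output node, multiplication by the inputs, quantizer) can be sketched in software. This is a minimal illustration, not the circuit of this work: it assumes a squared-error loss for the common term, dimensionless weights rather than conductances, and a hypothetical ternary quantizer that issues at most one fixed pulse per synapse per iteration. The actual quantizer design is given in the supplementary material.

```python
import math


def forward(weights, x):
    """Crossbar VMM followed by a tanh neuron at each output node.

    weights: one row of weights per output node; x: input feature vector.
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def quantize_to_pulse(delta, threshold):
    """Hypothetical ternary quantizer: +1, -1, or 0 fixed-magnitude pulses."""
    if delta > threshold:
        return 1
    if delta < -threshold:
        return -1
    return 0


def sgd_pulses(weights, x, target, learning_rate, step):
    """Pulse pattern for one SGD iteration on one training sample."""
    outputs = forward(weights, x)
    pulses = []
    for y, t in zip(outputs, target):
        # Common part of the update at this output node
        # (assumption: squared-error loss, tanh derivative 1 - y^2).
        common = learning_rate * (t - y) * (1.0 - y * y)
        # Multiplier (x) then quantizer (Q) for every synapse on this node.
        pulses.append([quantize_to_pulse(common * xi, step / 2.0) for xi in x])
    return pulses


def apply_pulses(weights, pulses, step):
    """Each pulse changes the weight by one fixed step, up or down."""
    return [[w + n * step for w, n in zip(row, prow)]
            for row, prow in zip(weights, pulses)]


w = [[0.0, 0.0]]  # one output node, two inputs (toy size)
x, target = [1.0, -1.0], [1.0]
p = sgd_pulses(w, x, target, learning_rate=0.5, step=0.1)
w_new = apply_pulses(w, p, step=0.1)
print(p, w_new, forward(w_new, x))
```

In this toy case the first weight receives one positive pulse and the second one negative pulse, and the output moves toward the target after the update.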
Despite the fact that conductance and hence weight of each synapse takes only quantized value, on-chip learning is achieved with 89 % train and 92 % test accuracy on the Fisher’s Iris dataset (Table I). Test accuracy turns out to be slightly higher than train accuracy because the number of samples available in the dataset is low (100 train, 50 test), so a correct or wrong result just with respect to 1 or 2 samples changes the accuracy number by a few percent.
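As a worked illustration of this granularity, using the 100 train and 50 test samples quoted above, the accuracy change caused by a single sample being classified differently is:

```latex
\Delta_{\text{test}} = \frac{1}{50} \times 100\% = 2\%,
\qquad
\Delta_{\text{train}} = \frac{1}{100} \times 100\% = 1\%
```

A difference of one or two test samples therefore shifts the test accuracy by 2 to 4 percentage points.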
A similar crossbar based FCNN is designed next with RRAM and PCM synapses, with the conductance responses shown in Fig. 4. Similar accuracy for on-chip learning is achieved on Fisher’s Iris dataset (Table I). However, the net energy consumed in the synapses for on-chip learning is several orders of magnitude higher for RRAM/PCM synapses than for DW synapses (Table I). This is expected because we already showed in Section II that the energy consumed by each programming pulse that increases the conductance by a step (SET pulse) is orders of magnitude higher for RRAM/PCM synapses than for the DW synapse. Also, high energy consuming RESET pulses are still needed, even though the need for decreasing the conductance of a synapse is reduced by using 2 RRAM or 2 PCM devices per synapse9,47 (see supplementary material, Section 5). In addition, training takes much longer for the RRAM/PCM synapse based FCNN than for the DW synapse based FCNN because of the need for occasional long duration RESET pulses (of the order of microseconds). Since the weights of all synapses need to be updated simultaneously at each iteration (each sample in the training set), even if only one synapse needs a RESET pulse of microsecond duration, the time needed to carry out that iteration is of the order of microseconds. Since the DW synapse does not have this issue, the time taken for each iteration during learning is only 3 ns (the duration of each programming pulse for the DW synapse in Fig. 3(b)).
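The timing argument can be made concrete with a small sketch. It assumes that the update phase of one iteration lasts as long as the longest programming pulse applied to any synapse, since all synapses are updated simultaneously. The pulse durations are those quoted in the text; the synapse count and the single-RESET scenario are hypothetical.

```python
# Sketch: update time per iteration when all synapses are programmed in parallel.
# Pulse durations come from the text; the synapse count is hypothetical.

DW_PULSE_S = 3e-9       # DW write pulse
RRAM_SET_S = 200e-9     # RRAM SET pulse
RRAM_RESET_S = 6e-6     # RRAM RESET pulse


def iteration_update_time_s(pulse_durations_s):
    """The slowest pulse in the array sets the update time of the iteration."""
    return max(pulse_durations_s, default=0.0)


n_synapses = 48  # hypothetical array size for illustration

dw_time = iteration_update_time_s([DW_PULSE_S] * n_synapses)
# Hypothetical RRAM iteration: one synapse needs a RESET, the rest a SET.
rram_time = iteration_update_time_s(
    [RRAM_RESET_S] + [RRAM_SET_S] * (n_synapses - 1)
)
print(dw_time, rram_time, rram_time / dw_time)
```

In this scenario a single RESET pulse stretches the RRAM iteration to 6 μs, against 3 ns for the DW array.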
At the beginning of the training process/on-chip learning of the domain wall synapse based analog hardware Neural Network (NN), a domain wall needs to be nucleated in each synaptic device.26 After that, for all iterations of the training, the same domain wall just needs to be moved back and forth if needed with current pulses to update the weight of the synapse, as described in the paper. Hence energy and time are needed to nucleate the domain wall only at the beginning of the training, and not later on. Hence this energy and time do not contribute much to the total energy and time needed for on-chip learning of the domain wall synapse based NN.
IV. CONCLUSION
Thus in this paper we have shown through device and network level simulations that on-chip learning in DW synapse based NN circuit can consume much less time and energy than RRAM and PCM synapse based NN circuit.
AUTHOR’S CONTRIBUTIONS
D.K. and U.K. contributed equally to this work.
Supplementary Material
See supplementary material for further details on our simulations of the domain wall, RRAM and PCM based synaptic devices and classification accuracy for on-chip learning of analog hardware neural networks designed from such synaptic devices.
ACKNOWLEDGMENTS
Debanjan Bhowmik thanks Department of Science and Technology (DST), India for INSPIRE Faculty Award and Science and Engineering Research Board (SERB), India for Early Career Research (ECR) Award, which helped fund this research.