Recently, a system of spintronic vortex oscillators has been experimentally trained to classify vowel sounds. In this paper, we have carried out a combination of device-level and system-level simulations to train a system of spin Hall nano oscillators (SHNOs) of smaller size (25X lower in area than those vortex oscillators) for such data classification tasks. In our oscillators, the magnetic moments precess in a uniform mode rather than in the vortex mode. We have trained our system to classify inputs from several popular machine learning data sets: Fisher’s Iris data set of flowers, the Wisconsin Breast Cancer (WBC) data set, and the MNIST data set of handwritten digits. We have employed a new input dimensionality reduction technique so that the clustering/target synchronization pattern adapts to the nature of the data in each data set. Our demonstration of learning in a system of such small SHNOs for a wide range of data sets is promising for scaling up oscillator-based neuromorphic systems for complex data classification tasks.

## I. INTRODUCTION

### A. Motivation

Spintronics-based neuromorphic/in-memory computing hardware has been considered to be a low energy alternative to transistor-based digital hardware, which has memory-computing separation, for data classification tasks.^{1} One approach for spintronic neuromorphic computing is designing an analog crossbar array of domain-wall-based or skyrmion-based synaptic devices.^{2–6} Fully Connected Neural Network (FCNN) algorithms, of spiking or non-spiking type, are then implemented on this crossbar.^{2,5–11} The properties/parameters of the activation function, or neuron, are static in this case. A large number of parameters needs to be tuned to achieve learning in such FCNN.^{12}

Recently, in an alternative approach, Romera *et al.*^{12} used the dynamic properties of synchronized spintronic oscillators in an experimentally fabricated network, with far fewer parameters to tune, to achieve on-chip learning for the classification of seven vowel sounds. Magnetic Tunnel Junction (MTJ) cells of fairly large diameter (≈ 375 nm) are used as vortex-mode spin torque oscillators for this purpose. However, Romera *et al.*^{12} also propose that, in order to scale up the system for more complicated classification tasks and to reduce the power consumption, spin torque oscillators of much smaller diameter need to be used.^{13,14}

At this smaller length scale, the nanomagnet may no longer oscillate in the vortex mode but rather in a mode close to the uniform mode. In the uniform mode, most individual magnetic moments inside the ferromagnetic layer of the spin torque oscillator remain parallel to each other and precess in unison about an axis.^{13,15,20} It is therefore important to demonstrate classification with a system of coupled spin torque oscillators operating in this uniform mode.

### B. Our contributions here

In this paper, we have carried out a simulation-based device-system co-study of an array of uniform-mode spin-orbit torque nano oscillators, also known as spin Hall nano oscillators (SHNOs),^{20} of diameter 75 nm (5X smaller in diameter than the vortex oscillator in Romera *et al.*,^{12} and hence 25X smaller in area) to demonstrate learning of classification tasks in such a system. We model these uniform-mode SHNOs through micromagnetics (the ‘mumax3’ package^{16}) and show that they can be locked to external radio frequency (RF) magnetic fields whose frequencies encode the input values in reduced dimensions, much like the vortex-mode oscillators in Romera *et al.*^{12} (see Sec. II below for this device-level study). Thus, classification of the input can also be achieved with an array of such uniform-mode oscillators, as with the vortex oscillators in Romera *et al.*,^{12} as we show in Sec. III through our system-level study.

Also, Romera *et al.* demonstrate classification experimentally on only one data set (vowel sounds).^{12} In our simulations, we show learning on several popular machine learning data sets: Fisher’s Iris data set of flowers,^{17} the Wisconsin Breast Cancer (WBC) data set,^{18} and the MNIST data set of handwritten digits^{19} (Sec. I of supplementary material). The MNIST data set is much larger than the vowel data set: each input is a 784-dimensional vector (a 28-by-28-pixel image), as opposed to a 12-dimensional vector in the vowel data set.^{12} The MNIST data set also contains many more samples than the vowel data set.^{12,19} Thus, we have trained our system on more diverse, and sometimes more complex, data classification tasks than Romera *et al.*^{12}

To make the spin-oscillator-based classification scheme of Romera *et al.*^{12} applicable to all these data sets, we have replaced the input dimensionality reduction technique used in Romera *et al.*^{12} with a more versatile one. We explain this in detail in Sec. III below. Since we also report high classification accuracy through our system-level study in Sec. III (above 80 % for MNIST and above 90 % for the other data sets), this shows that our learning scheme is quite effective for a wide range of classification tasks.

## II. MICROMAGNETIC SIMULATION OF UNIFORM-MODE SHNO

### A. Natural frequency of SHNO

First, we use micromagnetics to model a spin Hall nano oscillator (SHNO) based on a heavy metal/ferromagnetic metal heterostructure.^{20–22} In-plane charge current flowing through the heavy metal layer generates a vertical spin current through the spin Hall effect. This spin current exerts a spin-orbit torque on the magnetization in the ferromagnetic metal layer.^{23,24} The ratio of the spin current density to the in-plane charge current density is given by the spin Hall angle of the heavy metal.^{23} If the ferromagnet exhibits perpendicular magnetic anisotropy (PMA),^{25} as in the case considered here, the magnetization precesses about the vertical axis for an appropriate range of in-plane current density because of the spin-orbit torque (Figure 1 (a)), as reported previously through simulations and experiments.^{20,25–27}

Here, the ferromagnetic metal layer of the SHNO, with diameter 75 nm and thickness 2 nm, is modeled as a circular grid of magnetic moments in the micromagnetic simulation package ‘mumax3.’^{16} The package solves the Landau-Lifshitz-Gilbert (LLG) equation, modified by a spin-orbit torque term, for the dynamics of each magnetic moment under the spin-orbit torque due to the in-plane charge current flowing through the heavy metal layer below the ferromagnetic metal layer. Dipole coupling and exchange interaction between the magnetic moments are taken into account. The following parameters are used for the magnetic moments in the ferromagnetic layer: exchange stiffness constant *A*_{ex} = 3 × 10^{−11} J/m, saturation magnetization *M*_{sat} = 1313 × 10^{3} A/m, perpendicular magnetic anisotropy (PMA) field *H*_{k} = 1.79 T (Figure 1 (a)), vertically applied external dc magnetic field *H*_{app} = 0.1 T, gyromagnetic ratio *γ* = 17.6 × 10^{10} rad s^{−1} T^{−1}, and damping constant *α* = 0.005.^{27–30} For the ‘mumax3’ simulation, we take the cell size to be 2 nm × 2 nm × 1 nm. ‘Mumax3’^{16} uses a Runge-Kutta method with an adaptive time step to solve the modified LLG equation for the magnetic moments.

To simulate the influence of the spin-orbit torque on the magnetic moments, our micromagnetic simulation uses the spin current calculated from the charge current with the spin Hall angle of the heavy metal (Pt in this case).^{16} The thickness of the heavy metal (Pt) layer is taken to be 10 nm, which is greater than the spin diffusion length in Pt.^{33,34} Hence, the vertical spin current density injected from the heavy metal layer into the ferromagnetic layer above it is *J*_{s} = *θ*_{SH} *J*_{c}, where *J*_{c} is the in-plane charge current density and *θ*_{SH} = 0.07 is the spin Hall angle of Pt.^{23,31,32} The field-like torque term is not considered here because the field-like torque has been found to be very small in heavy metal/ferromagnetic metal systems like the one considered here.^{35} Taniguchi *et al.*^{27} show that when the ferromagnetic layer exhibits PMA and the spin polarization $\vec{\sigma}$, due to the charge current flow, is in-plane, the magnetization $\vec{m}$ oscillates (as we also observe below) even in the presence of only the Slonczewski-like torque, which has the form $\vec{m}\times(\vec{m}\times\vec{\sigma})$. The field-like torque term, of the form $\vec{m}\times\vec{\sigma}$, is not included in Taniguchi *et al.*,^{27} and yet oscillation is observed. So we do not consider the field-like torque term here and only consider the Slonczewski-like spin-orbit torque term. Micromagnetic simulations of similar heavy metal (Pt)/ferromagnet systems reported elsewhere^{8,10,11} also do not consider the field-like torque.
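The two relations used above can be checked numerically. The Python sketch below converts the in-plane charge current density into the injected spin current density using the spin Hall angle of 0.07, and evaluates the direction of the Slonczewski-like term $\vec{m}\times(\vec{m}\times\vec{\sigma})$ for one example magnetization. The choice of $\vec{\sigma}$ along +y and the 30° tilt of $\vec{m}$ are illustrative assumptions (the sign of $\vec{\sigma}$ depends on the current direction and on the sign convention for the spin Hall angle); the torque prefactor is omitted, and mumax3 evaluates the full torque internally.

```python
import math

SPIN_HALL_ANGLE_PT = 0.07  # spin Hall angle used in the text for Pt


def spin_current_density(j_charge):
    """J_s = spin Hall angle * J_c (A/m^2); valid when Pt is thicker than its spin diffusion length."""
    return SPIN_HALL_ANGLE_PT * j_charge


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def slonczewski_like_direction(m, sigma):
    """Direction of the damping-like term m x (m x sigma); no prefactor, no field-like term."""
    return cross(m, cross(m, sigma))


# Working range of in-plane charge current density (Sec. II B)
for jc in (7.5e11, 9.6e11):
    print(f"J_c = {jc:.2e} A/m^2 -> J_s = {spin_current_density(jc):.2e} A/m^2")

# Illustrative geometry (assumption): sigma along +y, m tilted 30 degrees from +z toward +x
theta = math.radians(30)
m = (math.sin(theta), 0.0, math.cos(theta))
sigma = (0.0, 1.0, 0.0)
tau = slonczewski_like_direction(m, sigma)
print("m x (m x sigma) =", tau)
```

Because $\vec{m}$ is perpendicular to $\vec{\sigma}$ in this example, $\vec{m}\times(\vec{m}\times\vec{\sigma}) = \vec{m}(\vec{m}\cdot\vec{\sigma}) - \vec{\sigma}$ reduces to $-\vec{\sigma}$, and the resulting torque is perpendicular to $\vec{m}$ as a damping-like torque should be.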

In our micromagnetic simulation, for current densities greater than 7.5 × 10^{11} A/m^{2}, all the magnetic moments, apart from those at the periphery, precess around the z-axis (Figure 1 (a)) while remaining parallel to each other. This can be seen from the directions of the magnetic moments in the SHNO at different snapshots within one period of precession (Fig. 2). Thus, our system is very close to the uniform mode of magnetic oscillation described in Sec. I.

‘Mumax3’ provides the spatial averages of the magnetic moments in the x, y, and z directions as functions of time. Since the moments precess about the z-axis here (Fig. 1(a)), we plot the spatial average of the x component as a function of time and obtain a sinusoidally varying signal (see the x-magnetization vs time plot in Sec. III of supplementary material). As shown in that plot, consecutive data points in our simulation are separated by ≈1 ps. When the natural frequency of the oscillator is 6.5 GHz (Fig. 3(c)), the period of each oscillation is ≈153.8 ps, so there are about 154 simulated points within each oscillation cycle.

Carrying out Fast Fourier Transform (FFT) of the variation of magnetization with time, the natural frequency/auto-oscillation frequency of the oscillator is obtained. The natural/auto-oscillation frequency is defined as the frequency of oscillation in the absence of an external RF magnetic field. From our micromagnetic simulations, we observe that as the current density increases, the natural frequency of oscillation decreases (the precession angle increases), in accordance with Taniguchi *et al.*^{27} (Figure 1 (b)). The range in which natural frequency of our SHNO can be modulated with current matches with that observed experimentally on nanoconstriction SHNO with PMA, based on Pt (heavy metal)/CoNi (ferromagnet) bilayer.^{26}

When taking the FFT of the magnetization versus time, we eliminate the effect of transient oscillations: we discard the first 70 ns and use only the next 30 ns. For the range of natural frequencies considered here, 6.2–7.2 GHz (see Sec. II B), this means that the first 434–504 cycles are ignored and only the next 186–216 cycles are used for the FFT calculation.
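The FFT procedure described in the two paragraphs above can be sketched with NumPy. Since the mumax3 output itself is not reproduced here, the sketch uses a synthetic 6.5 GHz trace sampled every 1 ps as a stand-in (an assumption, not simulation data), discards the first 70 ns, and picks the strongest non-DC peak in the remaining 30 ns. The 30 ns window fixes the frequency resolution at 1/30 ns ≈ 33 MHz.

```python
import numpy as np

DT = 1e-12           # ~1 ps between saved data points, as in the text
T_TRANSIENT = 70e-9  # discarded portion of the trace
T_WINDOW = 30e-9     # portion used for the FFT


def natural_frequency(mx, dt=DT, t_transient=T_TRANSIENT):
    """Frequency (Hz) of the strongest non-DC FFT component of <m_x>(t) after the transient."""
    start = int(round(t_transient / dt))
    window = np.asarray(mx[start:], dtype=float)
    window = window - window.mean()              # drop the DC offset before the FFT
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(window.size, d=dt)
    return freqs[np.argmax(spectrum)]


# Synthetic stand-in for the mumax3 output: a 6.5 GHz precession whose amplitude
# builds up over the first few ns (illustrative only)
n_samples = int(round((T_TRANSIENT + T_WINDOW) / DT))
t = np.arange(n_samples) * DT
mx = (1.0 - np.exp(-t / 10e-9)) * np.sin(2 * np.pi * 6.5e9 * t)

f0 = natural_frequency(mx)
print(f"natural frequency ~ {f0 / 1e9:.3f} GHz")
print(f"samples per cycle ~ {1 / (f0 * DT):.0f}")
print(f"frequency resolution = {1 / T_WINDOW / 1e6:.1f} MHz")

# Cycles discarded / analysed at the two ends of the 6.2-7.2 GHz range
for f in (6.2e9, 7.2e9):
    print(f"{f / 1e9} GHz: {f * T_TRANSIENT:.0f} cycles discarded, "
          f"{f * T_WINDOW:.0f} cycles analysed")
```

With a real trace, `mx` would be the spatially averaged x component read from the mumax3 table output; the column layout of that file is not assumed here.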

### B. Synchronization of SHNO to external RF magnetic field

Similar to Romera *et al.*,^{12} the oscillators here have to be synchronized to the frequencies of the external RF magnetic field, corresponding to the inputs in reduced dimension. So we use micromagnetic simulation to determine the synchronizing/locking characteristic of our uniform-mode oscillator with respect to the frequency of external RF magnetic field: Fig. 3(a), (b), and (c). From Fig. 3(b), when the natural frequency of the oscillator is 6.7 GHz (corresponding applied current density = 8.35 × 10^{11} A/m^{2}) and an external RF magnetic field of magnitude 1 mT is applied vertically/out of the plane, we observe that within the external field’s frequency range of around 6.65–6.77 GHz, the frequency of oscillation of our oscillator is equal to the frequency of the external field, and not the natural frequency of the oscillator. Thus, when natural frequency is 6.7 GHz, the locking range/synchronization bandwidth is about ± 0.06 GHz around the natural frequency (0.12 GHz in total).

Similarly, from all our micromagnetic simulations of this oscillator in ‘mumax3,’ we find that for any other natural frequency between 6.2 and 7.2 GHz, the locking range/synchronization bandwidth is also approximately 0.12 GHz (± 0.06 GHz) around the natural frequency. For example, Fig. 3(c) shows that when the applied current density is 8.75 × 10^{11} A/m^{2}, and hence the natural frequency of the spin oscillator is 6.5 GHz (Fig. 1 (b)), the oscillator is locked/synchronized to the external RF magnetic field within ± 0.06 GHz of 6.5 GHz. The same bandwidth is observed when the current density is 7.9 × 10^{11} A/m^{2} (natural frequency = 6.96 GHz, consistent with Fig. 1 (b)) (Fig. 3(a)). The magnitude of the external RF magnetic field is 1 mT in all these cases (Fig. 3 (a), (b), and (c)).
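The locking behaviour of Fig. 3 can be summarized by a simple rule. The sketch below is an idealized, hard-threshold version of that characteristic, assuming a symmetric ±0.06 GHz window for a 1 mT drive; it ignores the slight asymmetry observed at 6.7 GHz (6.65–6.77 GHz), frequency pulling just outside the window, and any dependence on RF amplitude. It is not the Matlab model used in this work, whose code is not given here.

```python
LOCK_HALF_WIDTH_GHZ = 0.06  # from the mumax3 runs with a 1 mT RF field (6.2-7.2 GHz range)


def is_locked(f_natural_ghz, f_rf_ghz, half_width_ghz=LOCK_HALF_WIDTH_GHZ):
    """True if the RF drive falls inside the locking window centred on the natural frequency."""
    return abs(f_rf_ghz - f_natural_ghz) <= half_width_ghz


def oscillator_frequency(f_natural_ghz, f_rf_ghz):
    """Idealized response: follow the drive when locked, otherwise stay at the natural frequency."""
    return f_rf_ghz if is_locked(f_natural_ghz, f_rf_ghz) else f_natural_ghz


# Sweep the drive around a 6.7 GHz oscillator, as in Fig. 3(b)
for f_rf in (6.60, 6.66, 6.70, 6.75, 6.80):
    print(f"drive {f_rf:.2f} GHz -> oscillator {oscillator_frequency(6.70, f_rf):.2f} GHz")
```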

This information about the locking range/synchronization bandwidth is useful for tuning the frequencies of the oscillators according to the synchronization pattern for the different input categories by adjusting the applied current on the oscillators, as we see in Sec. III. In this paper, we restrict ourselves to the current-density range of 7.5 × 10^{11} A/m^{2} to 9.6 × 10^{11} A/m^{2} (and hence, the natural-frequency range of 7.2 GHz to 6.2 GHz) such that the synchronization bandwidth is always 0.12 GHz (± 0.06 GHz) around the natural frequency value, as we observe in Fig. 3 (a), (b), and (c).
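The current-to-frequency calibration implied by the numbers quoted in this section can be tabulated and interpolated. In the sketch below, the pairs (7.9, 6.96), (8.35, 6.7), and (8.75, 6.5), with current density in units of 10^{11} A/m^{2} and frequency in GHz, come from Sec. II B, and the end points (7.5, 7.2) and (9.6, 6.2) come from the range stated above. Joining them with straight lines is an assumption and will deviate from the actual Fig. 1(b) curve.

```python
from bisect import bisect_right

# (current density in 1e11 A/m^2, natural frequency in GHz) as quoted in Sec. II.
# Joining the points with straight lines is an assumption, not the Fig. 1(b) curve.
CALIBRATION = [(7.5, 7.2), (7.9, 6.96), (8.35, 6.7), (8.75, 6.5), (9.6, 6.2)]
J_VALUES = [j for j, _ in CALIBRATION]
J_MIN, J_MAX = J_VALUES[0], J_VALUES[-1]


def natural_frequency_ghz(j):
    """Piecewise-linear f0(J); J (in 1e11 A/m^2) is clamped to the working range."""
    j = min(max(j, J_MIN), J_MAX)
    i = min(bisect_right(J_VALUES, j), len(CALIBRATION) - 1)
    (j0, f0), (j1, f1) = CALIBRATION[i - 1], CALIBRATION[i]
    return f0 + (f1 - f0) * (j - j0) / (j1 - j0)


def current_for_frequency(f_ghz):
    """Inverse map: current density (1e11 A/m^2) giving f0 = f_ghz; f is clamped to 6.2-7.2 GHz."""
    f_ghz = min(max(f_ghz, CALIBRATION[-1][1]), CALIBRATION[0][1])
    for (j0, f0), (j1, f1) in zip(CALIBRATION, CALIBRATION[1:]):
        if f1 <= f_ghz <= f0:  # f0 falls as J rises, so each segment spans [f1, f0]
            return j0 + (j1 - j0) * (f0 - f_ghz) / (f0 - f1)
    raise ValueError("frequency outside the calibration range")


print(natural_frequency_ghz(8.0))   # between the 7.9 and 8.35 points
print(current_for_frequency(6.8))   # about 8.18, i.e. 8.18e11 A/m^2
```

In this piecewise-linear approximation, one current step of 1 × 10^{10} A/m^{2} (0.1 in these units) shifts the natural frequency by roughly 0.035–0.06 GHz depending on the segment, which is comparable to the ±0.06 GHz locking half-width.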

## III. CLASSIFICATION WITH AN ARRAY OF UNIFORM-MODE SHNO-s

### A. Dimensionality reduction of input to design the target synchronization pattern

The input in any data set has to be transformed from a much higher dimensional space to a low-dimensional space for classification with a few SHNOs, as shown in the schematic of the system we design for training here (Fig. 4). In Romera *et al.*,^{12} the synchronization pattern (2-dimensional cluster plot) of the four oscillators for the different categories of input (7 vowel sounds) is first decided upon. Accordingly, using least-square regression in software, the matrix coefficients are calculated such that each higher dimensional input vector, belonging to an input category (vowel sound), is linearly transformed, through the matrix multiplication, to a 2-dimensional vector, corresponding to the cluster assigned for that category in the synchronization pattern. After that, the frequencies of the four oscillators are mapped to the synchronization pattern following a learning rule.^{12}
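The fixed-target reduction described above for Romera *et al.* can be illustrated with an ordinary least-squares fit. In this NumPy sketch, every input of class c is regressed onto a 2-D point chosen in advance for that class, with a bias column appended. The toy 12-dimensional data and the target layout are hypothetical, and the sketch is not the code of Romera *et al.*

```python
import numpy as np


def fit_fixed_target_map(X, labels, targets):
    """Least-squares matrix sending each input to the 2-D point pre-assigned to its class.

    X: (n_samples, n_features); labels: (n_samples,) ints; targets: (n_classes, 2).
    Returns W of shape (n_features + 1, 2), including a bias row.
    """
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # constant feature for the offset
    T = targets[labels]                                # desired 2-D point for every sample
    W, *_ = np.linalg.lstsq(X_aug, T, rcond=None)
    return W


def apply_map(X, W):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ W


# Toy example (hypothetical): 3 classes in 12-D, cluster positions fixed before seeing data
rng = np.random.default_rng(0)
centres_hd = 3.0 * rng.normal(size=(3, 12))
labels = np.repeat(np.arange(3), 40)
X = centres_hd[labels] + rng.normal(scale=0.3, size=(120, 12))
targets = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.8]])  # data-independent layout
W = fit_fixed_target_map(X, labels, targets)
Z = apply_map(X, W)
print(np.round([Z[labels == c].mean(axis=0) for c in range(3)], 3))
```

The targets are fixed before the data are seen, which is the data independence discussed in the next paragraph.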

However, if the method of Romera *et al.*^{12} is applied not just to the vowel data set but to many other data sets, the clusters in the lower-dimensional space of the input do not change from one data set to another. Only the coefficients of the transformation matrix change, while the cluster configuration remains fixed. Thus, this method does not exploit the inherent structure of the data in different data sets for cluster formation and subsequent classification.

Since we demonstrate classification on several data sets rather than the single data set of Romera *et al.*,^{12} we use a more versatile technique, Neighbourhood Components Analysis (NCA),^{36} to reduce the input dimensions and determine the clusters/synchronization pattern. In NCA, the transformation of the input from the higher- to the lower-dimensional space is also linear (it can be represented by a matrix multiplication) (Fig. 4), just as in the dimensionality reduction method of Romera *et al.*^{12} Supervised learning is also used in both cases. However, in Romera *et al.*,^{12} the data-independent oscillator synchronization pattern serves as the supervision signal for calculating the coefficients of the transformation matrix, which makes the clusters independent of the data set.

In our NCA technique, as described in Goldberger *et al.*,^{36} we iteratively calculate the coefficients of the transformation matrix *A* such that the linear transformation *Ax*_{i} of a high-dimensional data point *x*_{i} gives a point in the lower-dimensional space. NCA uses the labels in a supervised manner to arrive at a matrix *A* for which two points *x*_{i} and *x*_{j} of the same class *C* lie close together in the transformed space, while points of different classes lie far apart. Specifically, it maximizes the expected number of training points that a stochastic nearest-neighbour classifier in the transformed space assigns to the correct class.^{36}
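A compact from-scratch NCA following the objective of Goldberger *et al.* (maximize the sum over i of p_i, the probability that a softmax-weighted stochastic nearest-neighbour rule assigns x_i to its own class) is sketched below in NumPy with plain gradient ascent. The learning rate, iteration count, initialization, and toy data (two informative features plus two noisy ones) are illustrative assumptions; this is not the implementation used for the results here, and scikit-learn's `sklearn.neighbors.NeighborhoodComponentsAnalysis` is a maintained alternative.

```python
import numpy as np


def nca_objective_and_grad(A, X, y):
    """NCA objective f(A) = sum_i p_i (Goldberger et al.) and its gradient with respect to A."""
    Z = X @ A.T                                        # points in the reduced space
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                       # p_ii = 0: a point never picks itself
    d2 = d2 - d2.min(axis=1, keepdims=True)            # shift rows for a stable softmax
    P = np.exp(-d2)
    P /= P.sum(axis=1, keepdims=True)                  # p_ij
    same = y[:, None] == y[None, :]
    p_i = (P * same).sum(axis=1)                       # prob. that x_i is classified correctly
    # gradient = 2 A sum_ij W_ij (x_i - x_j)(x_i - x_j)^T, W_ij = p_ij (p_i - [y_i == y_j])
    W = P * (p_i[:, None] - same)
    Ws = W + W.T
    S = X.T @ (np.diag(Ws.sum(axis=1)) - Ws) @ X       # Laplacian form of the pairwise sum
    return p_i.sum(), 2.0 * A @ S


def fit_nca(X, y, n_components=2, lr=0.01, n_iter=500, seed=0):
    """Plain gradient ascent on the NCA objective (no line search, no regularization)."""
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((n_components, X.shape[1]))
    for _ in range(n_iter):
        _, grad = nca_objective_and_grad(A, X, y)
        A += lr * grad / len(X)
    return A


# Toy data (hypothetical): 3 classes, 2 informative features and 2 noisy features
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 30)
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.hstack([means[labels] + 0.5 * rng.standard_normal((90, 2)),
               3.0 * rng.standard_normal((90, 2))])

A0 = 0.1 * np.random.default_rng(0).standard_normal((2, 4))  # same start as fit_nca
f_start, _ = nca_objective_and_grad(A0, X, labels)
A = fit_nca(X, labels)
f_end, _ = nca_objective_and_grad(A, X, labels)
print(f"expected correctly classified points: {f_start:.1f} -> {f_end:.1f} of {len(X)}")
```

The rows of `A` play the role of the input-dimension reduction matrix; for Iris it would be 2 × 4, for WBC 2 × 30, and for MNIST 2 × 784.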

While the reduction of the input from the higher-dimensional space to the lower-dimensional space is linear both here and in Romera *et al.*^{12} (both given by a transformation matrix), the formation of the clusters in the reduced two-dimensional space depends on the nature of the input data and varies from data set to data set here, as shown in Fig. 5. This is not the case in Romera *et al.*^{12} since a fixed synchronization pattern is used there. Thus our technique is more versatile than that followed in Romera *et al.*^{12}

In our case, the WBC data set has 30 input features, so we carry out a 30-to-2 reduction. The Iris data set has 4 input features (sepal length, sepal width, petal length, petal width), so we carry out a 4-to-2 reduction. The MNIST data set has 784 input features since each image is 28-by-28 pixels, so we carry out a 784-to-2 reduction (see Sec. I of supplementary material). The coefficients of the final input-dimension reduction matrix obtained through NCA are provided in Sec. VI of supplementary material for all three data sets: WBC, Iris, and MNIST.

For the different data sets, the two features in the reduced two-dimensional space in Fig. 5 (Feature 1, Feature 2) are then transformed to two input RF magnetic field frequencies *f*_{A} and *f*_{B} (Fig. 4). Please refer to Sec. VII of supplementary material for details on this transformation. These two transformed frequencies must be in that range of natural frequencies (6.2–7.2 GHz) of our uniform-mode spin oscillators where the synchronization bandwidth is nearly constant, as determined in Sec. II B. This leads to the generation of the target synchronization patterns for our spin oscillators.
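The actual feature-to-frequency transformation is given in Sec. VII of supplementary material and is not reproduced here. Purely as a stand-in, the sketch below uses a hypothetical min-max linear scaling that places each reduced feature inside the 6.2–7.2 GHz window, with the scaling fitted on training values and unseen values clipped to the training range.

```python
F_LOW_GHZ, F_HIGH_GHZ = 6.2, 7.2  # window with ~constant locking bandwidth (Sec. II B)


def fit_feature_scaler(train_values):
    """Hypothetical min-max map from one reduced feature to an RF frequency in GHz.

    Fitted on training values; unseen values are clipped to the training range so that
    the resulting frequency always stays inside 6.2-7.2 GHz.
    """
    lo, hi = min(train_values), max(train_values)
    if hi == lo:
        raise ValueError("feature is constant; cannot scale")

    def to_frequency(value):
        value = min(max(value, lo), hi)
        return F_LOW_GHZ + (F_HIGH_GHZ - F_LOW_GHZ) * (value - lo) / (hi - lo)

    return to_frequency


# Feature 1 -> f_A and Feature 2 -> f_B, each with its own scaler (toy values)
to_f_a = fit_feature_scaler([-2.0, 0.5, 3.0])
to_f_b = fit_feature_scaler([10.0, 12.0, 15.0])
print(to_f_a(0.5), to_f_b(12.0))
```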

### B. Tuning the SHNOs’ frequencies to the target synchronization pattern to achieve learning

Though our method for generating the clusters/target synchronization pattern differs from that of Romera *et al.*,^{12} our method for tuning the frequencies of the oscillators to match the synchronization pattern is similar to theirs. As in Romera *et al.*,^{12} we modulate the currents flowing into the oscillators to change their natural frequencies, as shown in our system-level schematic (Fig. 4). For this purpose, the system of oscillators is modeled in Matlab. Each oscillator’s natural-frequency modulation characteristic (Fig. 1(b)) and its synchronization characteristic with respect to the external RF magnetic field (Fig. 3) are obtained from the micromagnetic simulations of the SHNO device in Sec. II and incorporated in the Matlab code. This makes the work a device-system co-study.

For all the data sets used here, the natural frequencies of the oscillators are initialized to random values within the chosen range of 6.2–7.2 GHz, where the synchronization bandwidth is almost constant (± 0.06 GHz around the natural frequency, as mentioned in Sec. II B). During training, the natural frequencies of the oscillators are updated by modulating their current densities in steps of about 1 × 10^{10} A/m^{2}, again keeping the natural frequencies within 6.2–7.2 GHz. Since our target synchronization pattern also lies in this range, this restriction does not hamper training.
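The initialization and the bounded current updates can be sketched as follows. The actual learning rule is given in Sec. IV of supplementary material; the `step_towards` update here is a hypothetical placeholder that moves an oscillator's natural frequency one current step toward a target frequency, and the linear f0(J) between the range end points is an assumed approximation.

```python
import random

J_MIN, J_MAX = 7.5e11, 9.6e11  # A/m^2; keeps the natural frequency within 6.2-7.2 GHz
J_STEP = 1e10                  # A/m^2, update step quoted in the text


def f0_linear_ghz(j):
    """Assumed linear f0(J) through the range end points (7.5e11 -> 7.2 GHz, 9.6e11 -> 6.2 GHz)."""
    return 7.2 - (j - J_MIN) * (7.2 - 6.2) / (J_MAX - J_MIN)


def init_currents(n_osc, seed=0):
    """Random starting currents, i.e. random natural frequencies inside 6.2-7.2 GHz."""
    rng = random.Random(seed)
    return [rng.uniform(J_MIN, J_MAX) for _ in range(n_osc)]


def step_towards(j, f_target_ghz):
    """Hypothetical update: one current step that moves f0 toward f_target, clamped to the range.

    f0 falls as J rises, so raising f0 means lowering J.
    """
    f = f0_linear_ghz(j)
    if abs(f - f_target_ghz) < 1e-12:
        return j
    j_new = j - J_STEP if f < f_target_ghz else j + J_STEP
    return min(max(j_new, J_MIN), J_MAX)


currents = init_currents(2)
for _ in range(100):
    currents = [step_towards(j, 6.7) for j in currents]
print([round(f0_linear_ghz(j), 3) for j in currents])
```

With this linear approximation, one 1 × 10^{10} A/m^{2} step changes the natural frequency by about 0.048 GHz, below the ±0.06 GHz locking half-width, so an oscillator can settle inside the locking window of a target frequency.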

The learning rule we use for updating the natural frequency of each oscillator after every iteration in our training process is described in Sec. IV of supplementary material. Our learning rule ignores the existence of mutual coupling between the oscillators.

Romera *et al.* use 4 vortex oscillators to classify the seven vowel sounds. Since we use several data sets, the number of oscillators (*m* in Fig. 4) varies from one data set to another. The number of oscillators needed increases with the number of categories (output labels) in a data set. So we use 2 oscillators for the WBC data set (2 output labels: ‘malignant’ or ‘benign’) (Fig. 5(a)), 2 oscillators for the Iris data set (3 output labels for the 3 flower types: setosa, virginica, and versicolor) (Fig. 5(b)), and 5 oscillators for the MNIST data set (10 output labels for the 10 digits: 0 to 9) (Fig. 5(c)).

For the WBC data set, we classify the input as ‘malignant’ when oscillator 2 synchronizes with *f*_{B} and as ‘benign’ when oscillator 1 synchronizes with *f*_{A} [Fig. 6(a)]. Thus, the output of the synchronization detector in Fig. 4 determines the category/label of the input. Although the training and test sets are drawn from the same data set, the test samples must be distinct from the training samples. So we use the first 455 samples of the WBC data set for training and the next 114 samples for testing. The 30 input features of some of the training and test samples in the WBC data set are tabulated in Sec. I of supplementary material. After 150 iterations, the training accuracy is 96 % (Fig. 7) and the test accuracy is 92 % (Table I). Intermediate stages of the training are shown in Fig. 8.

TABLE I. Training and test accuracy of our SHNO network and of a single-layer perceptron.

Type of learning | WBC: Train accuracy (%) | WBC: Test accuracy (%) | Iris: Train accuracy (%) | Iris: Test accuracy (%) | MNIST: Train accuracy (%) | MNIST: Test accuracy (%) |
---|---|---|---|---|---|---|
Our SHNO network | 96 | 92 | 91 | 94 | 85 | 83 |
Single-layer perceptron | 96 | 98 | 97 | 85 | 97 | 82 |


We verify the final synchronization pattern after training for the WBC data set (Fig. 6(a)) through further micromagnetic simulations of the oscillators in ‘mumax3,’ with two external RF magnetic fields of frequencies *f*_{A} and *f*_{B} applied simultaneously. As observed from Fig. 6(a), the two-oscillator system trained on the WBC data set has natural frequencies of 6.45 GHz for oscillator 1 and 6.7 GHz for oscillator 2. In Sec. V of supplementary material, we show through micromagnetic simulations that when the current densities through the oscillators give these natural frequencies and the *f*_{A} and *f*_{B} values correspond to samples in the ‘malignant’ cluster, oscillator 2’s frequency follows *f*_{B}, while oscillator 1’s frequency follows neither *f*_{A} nor *f*_{B}. When the *f*_{A} and *f*_{B} values correspond to samples in the ‘benign’ cluster, oscillator 1’s frequency follows *f*_{A}, while oscillator 2’s frequency follows neither *f*_{A} nor *f*_{B}. This is what Fig. 6(a) predicts: for the ‘malignant’ cluster, oscillator 2 synchronizes with the second external RF magnetic field (of frequency *f*_{B}), and for the ‘benign’ cluster, oscillator 1 synchronizes with the first external RF magnetic field (of frequency *f*_{A}). Thus, our micromagnetic simulation results under two external RF magnetic fields agree with the synchronization pattern obtained after training in the Matlab code.
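The WBC readout rule can be written as a small decision function. This sketch assumes the idealized ±0.06 GHz lock window from Sec. II B, treats each oscillator independently of the other field and of the other oscillator, and uses the trained natural frequencies quoted above (6.45 GHz and 6.7 GHz). The ‘undecided’ outcome for patterns not covered by the rule is an added assumption.

```python
LOCK_HALF_WIDTH_GHZ = 0.06  # idealized locking half-width (Sec. II B)


def locked(f_natural_ghz, f_rf_ghz):
    return abs(f_rf_ghz - f_natural_ghz) <= LOCK_HALF_WIDTH_GHZ


def classify_wbc(f_a_ghz, f_b_ghz, f_nat1_ghz=6.45, f_nat2_ghz=6.7):
    """Readout rule from the text: osc 2 locked to f_B -> malignant, osc 1 locked to f_A -> benign."""
    osc2_on_b = locked(f_nat2_ghz, f_b_ghz)
    osc1_on_a = locked(f_nat1_ghz, f_a_ghz)
    if osc2_on_b and not osc1_on_a:
        return "malignant"
    if osc1_on_a and not osc2_on_b:
        return "benign"
    return "undecided"  # neither or both locked: not covered by the rule (assumption)


def accuracy(predicted, true):
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


preds = [classify_wbc(6.90, 6.72), classify_wbc(6.47, 6.95)]
print(preds, accuracy(preds, ["malignant", "benign"]))
```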

For the Iris data set, we classify the input as of the ‘setosa’ flower type when oscillator 2 synchronizes with *f*_{A}, as ‘versicolor’ type when oscillator 1 synchronizes with *f*_{A} and oscillator 2 synchronizes with *f*_{B} at the same time, and as ‘virginica’ type when oscillator 1 synchronizes with *f*_{A} (Fig. 6(b)). Similar to what we have done for the WBC data set, here we have used the first 120 samples of the Iris data set for training and the next 30 samples for testing. The 4 input features corresponding to some of the training and test samples in the Iris data set are tabulated in Sec. I of supplementary material. After 150 iterations, training accuracy is 91 % (Fig. 7) and test accuracy is 94 % (Table I). Intermediate stages of the training are shown in Fig. 9.
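The Iris readout can likewise be expressed as a lookup from a synchronization pattern, written as a set of (oscillator, field) pairs, to a label. The patterns are those listed above; the ±0.06 GHz hard lock threshold and the natural frequencies in the example (6.5 GHz and 6.9 GHz) are hypothetical, since the trained Iris values are not given in this section.

```python
LOCK_HALF_WIDTH_GHZ = 0.06  # idealized locking half-width (Sec. II B)

# Synchronization patterns from the text as sets of (oscillator, field) pairs
IRIS_PATTERNS = {
    frozenset({(2, "A")}): "setosa",
    frozenset({(1, "A"), (2, "B")}): "versicolor",
    frozenset({(1, "A")}): "virginica",
}


def sync_pattern(f_natural, f_fields):
    """(oscillator, field) pairs that are locked under a hard threshold.

    f_natural: {oscillator index: natural frequency in GHz}
    f_fields:  {"A": f_A, "B": f_B} in GHz
    """
    return frozenset(
        (osc, name)
        for osc, f0 in f_natural.items()
        for name, f in f_fields.items()
        if abs(f - f0) <= LOCK_HALF_WIDTH_GHZ
    )


def classify_iris(pattern):
    return IRIS_PATTERNS.get(pattern, "undecided")


nat = {1: 6.5, 2: 6.9}  # hypothetical trained natural frequencies
for fa, fb in [(6.90, 7.10), (6.52, 6.88), (6.50, 6.20)]:
    print(fa, fb, classify_iris(sync_pattern(nat, {"A": fa, "B": fb})))
```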

For the MNIST data set, we use similar combinations of synchronization between the oscillators and the input frequencies for classification (Fig. 6(c)). The MNIST database provides separate sets of handwritten-digit images (0–9) for training and for testing. From this database, we take 800 images for training and 200 images for testing. Some of the input images (28-by-28-pixel) from the training and test sets are shown in Sec. I of supplementary material. After 150 iterations, the training accuracy is 85 % (Fig. 7) and the test accuracy is 83 % (Table I).

Thus, we obtain fairly high classification accuracy for all the data sets, showing successful learning with our system of uniform-mode oscillators. For comparison, we train in software (Matlab) a standard single-layer perceptron, that is, a Fully Connected Neural Network (FCNN) without a hidden layer.^{38} Such an FCNN has a similar number of parameters to our oscillator-based system, since our system includes the parameters of the matrix that transforms the input data from the higher-dimensional space to the two-dimensional space (Fig. 4). We use the same three data sets for the single-layer perceptron/FCNN. The non-linear activation function at the output layer is the widely used ‘softmax’ function.^{37} Stochastic gradient descent is used to minimize the mean-squared error in every iteration.^{38} The structure and the underlying non-linear function/equations of the FCNN used here for comparison are provided in Sec. II of supplementary material.
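A minimal NumPy version of the comparison network described above, a softmax output layer with no hidden layer trained by per-sample stochastic gradient descent on the mean-squared error, is sketched below. The learning rate, epoch count, zero initialization, and toy data are assumptions; the exact structure used here is given in Sec. II of supplementary material, and this sketch is not that Matlab code.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def train_perceptron(X, y, n_classes, lr=0.5, epochs=50, seed=0):
    """Single-layer softmax network (no hidden layer), per-sample SGD on the MSE loss."""
    rng = np.random.default_rng(seed)
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    T = np.eye(n_classes)[y]              # one-hot targets
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            s = softmax(X[i] @ W + b)
            e = s - T[i]                   # dL/ds for L = 0.5 * ||s - t||^2
            g = s * (e - np.dot(e, s))     # chain rule through the softmax Jacobian
            W -= lr * np.outer(X[i], g)
            b -= lr * g
    return W, b


def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)


# Toy 3-class data (hypothetical), standing in for a real data set
rng = np.random.default_rng(2)
means = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 50)
X = means[y] + 0.5 * rng.standard_normal((150, 2))
W, b = train_perceptron(X, y, n_classes=3)
acc = np.mean(predict(X, W, b) == y)
print(f"training accuracy = {acc:.2f}, parameters = {W.size + b.size}")
```

For an input of dimension d and K classes, this network has (d + 1)K parameters, for example 62 for WBC (d = 30, K = 2).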

While the final train accuracy numbers are comparable or higher for the single-layer perceptron compared to our SHNO network (Table I), it is the performance on the test set after training (test accuracy) that matters more.^{37} We observe that both for Iris and MNIST data sets, the test accuracy is comparable or higher for our uniform-mode oscillator network compared to the single-layer perceptron (Table I). The test accuracy for FCNN improves drastically when we add a hidden layer, but then the number of parameters that need to be tuned also goes up.^{12,38}

### C. Power dissipation estimation

As mentioned in Sec. I, making oscillator-based systems scalable for data classification is a major motivation for this study. An important aspect of scalability for computing systems is that when the system is scaled down, as we propose here by reducing the area of the oscillator by 25X relative to Romera *et al.*,^{12} the power dissipation per component (here, per oscillator) must also go down, so that the power dissipation per unit area of the chip remains roughly the same.^{39}

We now show that this is indeed the case in our work. Although the current density in our uniform-mode spin Hall nano oscillator is one order of magnitude higher than that in the vortex-mode spin oscillator of Romera *et al.*,^{12} the cross-sectional area for current flow differs between the two cases. In Romera *et al.*,^{12} the current flows perpendicular to the plane, so the cross-sectional area for current flow equals the lateral area of the oscillator, which is proportional to the square of the diameter of the tunnel junction.

The cross-sectional area is different for our uniform-mode SHNO since the current flows in-plane here. The cross-sectional area for current flow is roughly the product of the diameter of the SHNO and the thickness of its heavy metal layer (10 nm). The thickness of the heavy metal layer is much smaller than the lateral diameter of the oscillator, and the diameter of our oscillator is 5 times smaller than that in Romera *et al.* As a result, the cross-sectional area here is much smaller than in Romera *et al.*, which leads to a lower current despite the higher current density. For a current density of 7.5 × 10^{11} A/m^{2} in our SHNO, the current is 7.5 × 10^{11} A/m^{2} × 75 nm × 10 nm ≈ 0.56 mA. The resistance of the current path is resistivity of Pt × (length of the path/cross-sectional area). Taking the length of the path to be roughly equal to the diameter of the oscillator, the resistance reduces to resistivity of Pt/thickness of the Pt layer = 10 Ω, which corresponds to a Pt resistivity of 1 × 10^{−7} Ω m. Thus, in our case, the power dissipation per oscillator, $I^2R$, is about 3 *μ*W, which is three orders of magnitude lower than that reported for the vortex oscillator in Romera *et al.*^{12} Since the area of the oscillator has gone down by only 25X from Romera *et al.*^{12} to our proposed system, the power dissipation per unit area does not increase; in fact, it decreases.
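The power estimate above can be reproduced step by step. The Pt resistivity of 1 × 10^{−7} Ω m in this sketch is not stated explicitly in the original estimate; it is the value implied by the quoted 10 Ω for a 10 nm thick layer, and it is labeled as an assumption in the code.

```python
J_C = 7.5e11        # A/m^2, lowest current density in the working range
DIAMETER = 75e-9    # m, SHNO diameter, also taken as the current-path length
T_PT = 10e-9        # m, Pt thickness
RHO_PT = 1e-7       # Ohm*m, ASSUMED: the value implied by rho / t = 10 Ohm for t = 10 nm

area = DIAMETER * T_PT                  # cross-section for in-plane current flow
current = J_C * area                    # A
resistance = RHO_PT * DIAMETER / area   # reduces to RHO_PT / T_PT when length == width
power = current ** 2 * resistance       # Joule dissipation, W

print(f"I = {current * 1e3:.2f} mA, R = {resistance:.1f} Ohm, P = {power * 1e6:.2f} uW")
```

The unrounded result, about 3.2 μW, is consistent with the ≈ 3 μW quoted in the text.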

In calculating the power consumed by our spin Hall oscillators, we did not include the read power consumed in detecting the synchronization pattern of the oscillators. The read power is dominated by the magnitude of the read current, which in spin Hall oscillators is typically several orders of magnitude lower than the write current.^{23,24} This is because the write current has to be strong enough to drive magnetic oscillation through the spin-orbit torque on the magnetization, whereas the read current merely detects the direction of the magnetization relative to a fixed direction (the magnetization of the fixed layer of the tunnel junction) through the tunneling magnetoresistance (TMR) effect. Designing the read-out circuit for detecting the synchronization of the oscillators and calculating the read power consumed in that circuit will be part of our future study.

For the natural frequency range of our spin Hall oscillator (6.2–7.2 GHz), ignoring the read power, the energy per oscillation is 0.42–0.48 fJ, since the write power, i.e., the power needed to generate the oscillation, is ≈ 3 *μ*W. Our energy per oscillation is 1–2 orders of magnitude lower than the energy per spike reported for implementations of spiking neural networks in spintronic systems.^{42}
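The energy per oscillation follows from multiplying the write power by one oscillation period, i.e., dividing it by the natural frequency. Using the current and resistance from the estimate above and rounding the power to 3 μW:

$$
P = I^{2}R \approx (0.56\ \mathrm{mA})^{2} \times 10\ \Omega \approx 3\ \mu\mathrm{W}, \qquad
E_{\mathrm{osc}} = \frac{P}{f_{0}} \approx \frac{3\ \mu\mathrm{W}}{7.2\ \mathrm{GHz}} \approx 0.42\ \mathrm{fJ}
\quad \text{to} \quad \frac{3\ \mu\mathrm{W}}{6.2\ \mathrm{GHz}} \approx 0.48\ \mathrm{fJ}.
$$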

## IV. CONCLUSION

Thus, in this paper, we have shown that a system of very small SHNOs (75 nm in diameter), in which the moments precess in a nearly uniform mode, can be used to classify inputs from different popular machine learning data sets. Through a combination of device-level micromagnetic study and system-level simulations, we show reasonably high classification accuracy with our SHNO-based system. Our study is a step toward scalable spintronic neuromorphic systems for such data classification tasks.

While our simulations include the coupling between the oscillators and the external RF magnetic fields, whose frequencies depend on the input in reduced dimensions, they ignore the coupling among the oscillators themselves, as mentioned earlier. Including this inter-oscillator coupling in the simulation during training, as in Romera *et al.*,^{12} Vassileva *et al.*,^{40} and Vodenicarevic *et al.*,^{41} for the different data sets used here, will be part of our future study.

## SUPPLEMENTARY MATERIAL

See supplementary material for more details or additional data on the data sets used, the input-dimension reduction technique, the learning rule followed, and micromagnetic simulation of the spin oscillators.

## ACKNOWLEDGMENTS

We thank the Department of Science and Technology, India, for INSPIRE Faculty Award and the Science and Engineering Research Board, India, for Early Career Research Award. These awards helped fund this research.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.