We propose thermally driven, voltage-controlled superparamagnetic ensembles as low-energy platforms for hardware-based reservoir computing. In the proposed devices, thermal noise is used to drive the ensembles' magnetization dynamics, while control of their net magnetization states is provided by strain-mediated voltage inputs. Using an ensemble of CoFeB nanodots as an example, we use analytical models and micromagnetic simulations to demonstrate how such a device can function as a reservoir and perform two benchmark machine learning tasks (spoken digit recognition and chaotic time series prediction) with competitive performance. Our results indicate robust performance on timescales from microseconds to milliseconds, potentially allowing such a reservoir to be tuned to perform a wide range of real-time tasks, from decision making in driverless cars (fast) to speech recognition (slow). The low energy consumption expected for such a device makes it an ideal candidate for use in edge computing applications that require low latency and power.
Artificial intelligence, and in particular, machine learning (ML) techniques, are widely used across a variety of sectors, but the associated energy cost of training highly complex models has become a significant challenge.1 This has led to a desire to produce efficient, hardware-based neuromorphic platforms to replace the current state of the art, i.e., software-based models evaluated on traditional CMOS computing architectures.2
Recurrent neural networks (RNNs) are inspired by the high interconnectivity of biological systems and can solve complex time-dependent tasks, such as speech recognition. However, their temporal interconnectivity requires complex training methods that are computationally expensive and difficult to implement in hardware.3 The reservoir computing (RC) paradigm provides a solution to this by using an RNN with fixed synaptic weights (the reservoir) to transform inputs into a higher dimensional representation prior to them being passed to a simple, feed-forward output layer with trainable weights.4 Time multiplexing allows RC to be performed even if the reservoir consists of only a single dynamical node.5
RC is also particularly well suited to hardware-based implementations as the reservoir can be replaced with any suitable physical system provided it has the correct properties: (a) the ability to produce nonlinear transformations of input data and (b) fading memory of past inputs. Such implementations have the potential for lower energy costs than reservoirs simulated on conventional computers as they can directly utilize the intrinsic functionalities of the physical systems. This has led to a wide range of systems being proposed as suitable reservoirs.6 In particular, magnetic systems exhibit nonvolatility and nonlinearity, and thus are positioned as ideal candidates.7 Previous investigations have demonstrated the suitability of a wide range of magnetic systems for use as reservoirs, including spin-torque nano-oscillators, dipole-coupled nanomagnets, and arrays of interconnected nanowires.8–14
Here, we propose strain-mediated, voltage-controlled superparamagnetic ensembles as platforms for creating ultra-low-energy hardware-based reservoirs. Using a combination of micromagnetic simulations and analytical modeling, we will show that these ensembles possess the reproducible behavior, nonlinearity and fading memory required for RC and demonstrate (using a model of the system) competitive performance on two benchmark machine learning tasks: spoken digit recognition and chaotic time series prediction.
The modeled systems consist of ensembles of circular CoFeB nanodots with diameters in the range 40–90 nm and thicknesses of 4 nm [Fig. 1(a)]. The dots' diameter is fixed within a single ensemble, but different values allow access to different timescales of the ensembles' responses. We assume a large ensemble (over 1000 dots, or a 6.3 μm square array) with nanodots spaced by 160 nm to minimize dipolar coupling. The dots are assumed to have a saturation magnetization of 1200 kA m⁻¹, an exchange stiffness of 10 pJ m⁻¹, and a growth- or shape-induced15,16 uniaxial anisotropy K = 8 kJ m⁻³.
(a) Schematic of a 25 dot ensemble. Vi and Vo show where voltage is input and output. The remainder of this figure considers single nanodots from the larger ensemble. (b) Schematic of the control mechanism for the dots. Application of an additional strain-mediated anisotropy acts to rotate the anisotropy axis with respect to the fixed field. Behavior in a single dot is shown, but the strain is applied to the whole ensemble. (c) Energy landscape for a nanodot under strain as given by Eq. (1) for h = 0.4. (d) Micromagnetic simulations of thermally activated switching in a single 40 nm diameter, 4 nm thick CoFeB cylinder with 8.2 kJ m⁻³ intrinsic uniaxial anisotropy and a field (h = 0.4) applied at an angle to the anisotropy axis. Top: no applied strain, unbiased. Middle: tensile applied strain, rightward-pointing state preferred. Bottom: compressive applied strain, leftward-pointing state preferred. (e) Multiple switches are sampled from the tensile-strain micromagnetic simulation to plot the probability of not switching as a function of time in the state. The average dwell time in the leftward state is shorter than in the rightward state. (f) Switching rate vs applied anisotropy for a single nanodot. The points are from the micromagnetic simulations. The curves are rates for the energy barriers given by the Stoner–Wohlfarth model [Eq. (1)].
A schematic diagram illustrating the geometry of an individual dot is shown in Fig. 1(b). A fixed magnetic field (40% of the anisotropy field) is applied at an angle to the anisotropy axis. Magnetoelastic control of the array's magnetic states is provided by voltage-controlled uniaxial strain applied at an angle to the anisotropy axis (an ultra-low-power technique17). We base our parameters for this on the artificial multiferroic heterostructure PMN-PT (011)/CoFeB, where strains of up to 0.175% can be applied, leading to a strain anisotropy of comparable magnitude to typical growth-induced intrinsic anisotropies.18,19 Input is via a voltage applied perpendicularly across the PMN-PT layer. An output derived from, for example, giant magnetoresistance would be proportional to the magnetization of the array [see Fig. 1(a)].
To model strain-mediated voltage control of the CoFeB nanodots, we express their energy using the Stoner–Wohlfarth model.20,21 The application of strain to isotropic, amorphous CoFeB results in the addition of a uniaxial magnetoelastic anisotropy term with magnitude K_σ = (3/2)λ_s E ε, where λ_s is the magnetostriction coefficient [31–75 ppm (Refs. 19 and 22)], E is the Young's modulus [216 GPa (Ref. 23)], and ε is the voltage-induced strain.24 When combined with an intrinsic uniaxial anisotropy (K), the normalized energy per unit volume is then

e(θ) = sin²θ + k_σ sin²(θ − φ) − 2h cos(θ − ρ),  (1)
where θ is the angle of the magnetization to the intrinsic anisotropy axis, h is the reduced fixed applied field, k_σ = K_σ/K is the reduced strain anisotropy, and φ, ρ are the angles of the strain anisotropy axis and the field direction to the intrinsic anisotropy, respectively. The two anisotropies can be combined into a single term25 (see the supplementary material), resulting in an anisotropy axis that is rotated by changing the magnitude of k_σ through the application of strain [Fig. 1(b)].
Figure 1(c) shows how the energy barriers of the system evolve under tensile and compressive strains. The strain changes the energy profile such that the energy barriers are no longer equal for the two states. The superparamagnet's magnetization exhibits stochastic behavior, switching between the minimum energy states with a characteristic dwell time controlled by these energy barriers. Using the Néel–Arrhenius law, the switching rate from state i (magnetization right) to j (magnetization left) is

w_ij = f_0 exp(−ΔE_ij/k_B T) = f_0 exp(−β Δε_ij),  (2)
where f_0 is the attempt frequency [typically 10⁹–10¹¹ Hz (Ref. 26)], k_B is the Boltzmann constant, ΔE_ij is the energy barrier from state i to j, Δε_ij = ΔE_ij/KV is the reduced energy barrier, and β = KV/k_B T is the scaled thermal energy. The thermal switching of superparamagnets has previously been used for random number generation,27,28 adiabatic computing algorithms,29 and encoding nonlinear functions.30
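As an illustration, the double-well landscape of Eq. (1) and the resulting Néel–Arrhenius rates can be evaluated numerically. The sketch below assumes the field applied perpendicular to the intrinsic easy axis and the strain axis at 45°, with f_0 = 10¹⁰ Hz; these angles and the attempt frequency are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

def reduced_energy(theta, k_sigma, h, phi, rho):
    """Reduced Stoner-Wohlfarth energy (units of K), Eq. (1):
    intrinsic anisotropy + strain anisotropy + Zeeman term."""
    return (np.sin(theta) ** 2
            + k_sigma * np.sin(theta - phi) ** 2
            - 2.0 * h * np.cos(theta - rho))

def switching_rates(k_sigma, h=0.4, phi=np.pi / 4, rho=np.pi / 2,
                    beta=20.0, f0=1e10, n=200_000):
    """Locate the two energy minima and the lowest saddle on a dense grid,
    then return Neel-Arrhenius rates w = f0 * exp(-beta * barrier) for escape
    from each minimum (a sketch: both escapes are taken over the lower saddle)."""
    theta = np.linspace(-np.pi, np.pi, n, endpoint=False)
    e = reduced_energy(theta, k_sigma, h, phi, rho)
    de = np.diff(e, append=e[0])                # periodic discrete derivative
    minima = (de > 0) & (np.roll(de, 1) < 0)    # falling-then-rising energy
    maxima = (de < 0) & (np.roll(de, 1) > 0)
    e_min = np.sort(e[minima])[:2]              # the two metastable states
    saddle = np.min(e[maxima])                  # lowest barrier top
    return f0 * np.exp(-beta * (saddle - e_min))
```

With no applied strain this geometry is symmetric and the two rates coincide; a finite k_sigma biases the dwell times, as in Fig. 1(d).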
This stochastic switching is illustrated in Fig. 1(d), which presents micromagnetic simulations (T = 300 K) of thermal switching between two ground states in a 40 nm diameter CoFeB cylinder performed using the MuMax331,32 software package. Changing the applied strain manipulates the dwell time of the magnetization in the two states: tensile (compressive) strain results in an increased dwell time in the right (left) magnetization state.
Figure 1(e) plots the probability of not switching as a function of time in a state for the tensile strain case. The decay rate of these exponential distributions gives the dwell time and switching rate for the state. Figure 1(f) compares rates extracted from the micromagnetic simulations (points) with those derived from energy barriers defined by the macrospin model [Eq. (1)]. The value of the attempt frequency f_0 is selected such that the values of w_ij at zero applied strain match between the two models. The macrospin model agrees reasonably with the micromagnetic simulations; however, the data match less well at higher magnitudes of strain. The deviation occurs where the switching rate is high and the energy barrier is low, so that the assumption of Néel–Arrhenius-like switching between two well-defined states approaches the limit of its validity.33 We do not calculate micromagnetic rates for all values of the applied strain anisotropy, as the run time required for the simulations scales exponentially with the barrier height. However, we expect the agreement to be as good or better at larger barriers, where the assumption is always met. The macrospin model is, therefore, used to model the CoFeB dots throughout the remainder of this paper.
We now consider how the stochastic behavior of single nanodots manifests in an extended ensemble as predictable collective behavior. The average magnetization obeys simple rules that, when subject to a temporally varying input in the form of a globally applied strain, produce a reproducible, complex, nonlinear response with fading memory, thus fulfilling the basic requirements of a hardware-based reservoir.
We model the extended ensemble as a two-state system governed by the master equation,33

dp_i/dt = ω_ji p_j − ω_ij p_i,  (3)
where p_i is the normalized population in state i such that p_i + p_j = 1, and ω_ij is the transition rate as defined in Eq. (2). This model does not take into account the error introduced by a finite system size. However, it is appropriate for systems with a large number of nanodots N (as we assume here) since the error scales as 1/√N. Integrating Eq. (3) gives the time-dependent probability for fixed transition rates,

p_i(t) = p_i(∞) + [p_i(0) − p_i(∞)] exp(−t/τ),  (4)
where τ = 1/(ω_ij + ω_ji) and p_i(∞) = ω_ji/(ω_ij + ω_ji). Given our system with a fixed applied field, the reduced magnetization (unit length) along the x direction (magnetization right) can be written as follows:

m_x = p_i cos δ_i − p_j cos δ_j,  (5)
where δi refers to the slight rotation of the energy minimum away from the x axis due to the fixed field.
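The relaxation described by Eqs. (3)–(5) can be written down directly. A minimal sketch, in which the small tilt angles δ_i are taken as free parameters:

```python
import numpy as np

def population(t, p0, w12, w21):
    """Analytic solution of the two-state master equation, Eqs. (3) and (4):
    p1 relaxes exponentially toward equilibrium with rate w12 + w21."""
    p_eq = w21 / (w12 + w21)          # equilibrium population of state 1
    tau = 1.0 / (w12 + w21)           # relaxation time
    return p_eq + (p0 - p_eq) * np.exp(-t / tau)

def magnetization_x(p1, delta1=0.0, delta2=0.0):
    """Reduced x magnetization, Eq. (5), for state populations p1 and 1 - p1."""
    return p1 * np.cos(delta1) - (1.0 - p1) * np.cos(delta2)
```

For equal rates the populations relax to 1/2 and the net magnetization decays to zero, which is the zero-input behavior of the ensemble.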
Figure 2(a) plots the equilibrium (t → ∞) response of the reduced magnetization to an applied strain anisotropy (k_σ). The response is nonlinear; while the exact form depends on the scaled thermal energy (β), it maintains sigmoid-like features that sharpen with increasing β. Importantly, the curve remains nonlinear, as required for reservoir computing. Figure 2(b) illustrates the complex response of the reduced magnetization to a temporal input sequence. Section A, where a single input is turned on and held, demonstrates excitation of the magnetization away from zero net magnetization. Section B, where no input is present, demonstrates decay back toward zero net magnetization. In all other sections, applied strain inputs are held for a duration less than the zero-input base time (T₀) of the system, and thus the system response exhibits both excitation and decay terms. The zero-input timescale grows exponentially with β, varying from nanoseconds to milliseconds for β = 10 to 50, which allows a reservoir to be designed with a timescale to match a particular real-time task (see the supplementary material for further details). The sharp transitions upon changes of input are due to the slight rotations of the minimum energy states within single dots (changes in δ_i) and are expected to occur on a timescale much faster than the thermally driven response of the ensemble. Since p_i(t) depends on p_i(0) [Eq. (4)], the system exhibits a fading memory: the response of the reservoir depends on both current and previous inputs.
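The fading memory can be demonstrated with a toy version of this response, in which the two escape barriers are modulated linearly by the input. This is a simplification of the full angular dependence of Eq. (1), and the parameter values below are illustrative only:

```python
import numpy as np

def reservoir_trace(inputs, beta=10.0, barrier=0.3, theta=1.0, f0=1.0):
    """Net magnetization of the ensemble for a piecewise-constant input
    sequence: within each held input the populations relax analytically
    toward the input-dependent equilibrium, as in Eq. (4)."""
    p1, out = 0.5, []
    for u in inputs:
        w12 = f0 * np.exp(-beta * (barrier + u))   # escape from state 1
        w21 = f0 * np.exp(-beta * (barrier - u))   # escape from state 2
        p_eq = w21 / (w12 + w21)
        tau = 1.0 / (w12 + w21)
        p1 = p_eq + (p1 - p_eq) * np.exp(-theta / tau)
        out.append(2.0 * p1 - 1.0)                 # m_x for small tilts
    return np.array(out)

# Two different input histories followed by the same (zero) input: the
# responses differ at first but converge -- the memory of past inputs fades.
suffix = np.zeros(100)
m_a = reservoir_trace(np.concatenate([+0.2 * np.ones(20), suffix]))
m_b = reservoir_trace(np.concatenate([-0.2 * np.ones(20), suffix]))
```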
Collective response from an ensemble of nanodots. (a) The equilibrium magnetization response of the reservoir to input strain anisotropy for β = 10 to 50. (b) Reservoir magnetization over time in response to an input sequence. Sections A and B show the excitation and decay responses of the magnetization as described in Eqs. (4) and (5).
Together, the reproducible response, nonlinearity, and fading memory of the superparamagnetic ensembles suggest that they have the correct properties to be used as hardware reservoirs. Furthermore, as discussed by Hu et al.,17 the estimated power consumption for magnetoelastically driven magnetoresistive random access memory (MRAM) is 160 aJ per bit written. Our system uses ∼1000 such bits but less than 15% of the strain required for a full write; this gives an approximate estimate of 24 fJ per input. This compares favorably to the write energy per bit of spin transfer torque random access memory (STT-RAM) of 100 fJ.17 While this figure is not precise, it indicates the potential for ultra-low-energy consumption. Furthermore, Safranski et al.34 have verified that uncorrelated thermally driven switching can be observed in CoFeB magnetic tunnel junctions (MTJs) for dwell times of 5 ns and above, which we can take as a floor for the operational speed of these nanodots. Longer dwell times can easily be induced by scaling up the size of the magnetic elements, thus increasing the energy barrier between states. Therefore, the timescale of the ensembles' responses could be engineered across many orders of magnitude, allowing them to perform a wide range of real-time tasks.
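The 24 fJ figure follows directly from the quoted assumptions (160 aJ per magnetoelastic bit write from Hu et al., ∼1000 dots, and 15% of the full write strain per input):

```python
e_per_bit = 160e-18          # J, magnetoelastic write energy per bit (Ref. 17)
n_dots = 1000                # dots in the ensemble
strain_fraction = 0.15       # fraction of the full write strain per input
e_per_input = e_per_bit * n_dots * strain_fraction   # -> 24 fJ per input
```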
Having established that strain-mediated, voltage-controlled superparamagnetic ensembles possess the qualities required for RC, we now test our system's performance against two common ML tasks. We treat the nanodot array as a single dynamical node with complex internal behavior and thus adopt the approach described by Appeltant et al.5 [Figs. 3(a)–3(d)].
Reservoir computing using the superparamagnetic ensemble. (a) Schematic of reservoir computing. (b) Raw input to the system; here, random numbers drawn from the interval [0, 0.5] for the NARMA10 task. (c) A single input value from (b) is transformed by the mask into Nv masked inputs. A snapshot is shown from just before to just after a raw data time step. Feedback is then added: the output from each virtual node one time step (ζ) previously is multiplied by a feedback strength and added to the masked inputs. The raw output is the response of the reservoir to the masked input with feedback. (d) Output after the trained linear layer compared to the desired output. Here, the desired NARMA10 output from Eq. (10) is compared to the output provided by the trained reservoir. (e) Machine learning performance on the speech recognition task. (f) Performance on the NARMA10 task.
The RC process is as follows. A raw input signal is discretized in time, giving a 2d matrix of input samples by input dimensions. For the training part, each input sample has a corresponding desired output, forming a matrix of samples by output dimensions. A mask matrix M is used to project a random combination of the input dimensions onto each of the Nv virtual neurons and is constant for all samples of the input. For a 1d input signal, such as the NARMA10 task [Fig. 3(b)], the random nature of the mask is designed to stimulate a range of dynamics. For a sample index n and virtual neuron i, the total input voltage is

V_i^n = α Σ_j M_ij S_j^n + γ x_i^{n−1},  (6)
where α is a scaling factor giving an input voltage into the reservoir, S_j^n is the jth dimension of input sample n, γ is the feedback strength, and x_i^{n−1} is the output value of the reservoir for virtual neuron i but for sample n − 1.
This matrix of input voltages is serialized as a 1d time series, with each virtual node presented in order before applying the next input sample. This is demonstrated in Fig. 3(c), which shows the masked input, the masked input with added feedback term, and finally the raw output of the reservoir at the transition point between samples 56 and 57. The masked input voltage for each virtual neuron is held constant for the neuron duration θ; thus, the reservoir time between each input sample is Nvθ. The raw output of the reservoir is measured at the end of each virtual neuron duration to give the transformed value x_i^n for the nth input sample and virtual neuron i.
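The time-multiplexing step can be sketched as follows. The mask range, the scaling, and the stand-in single-node nonlinearity are illustrative assumptions; a real run would drive the ensemble model instead of the placeholder node:

```python
import numpy as np

def masked_time_series(S, n_virtual, scale, gamma, node, seed=0):
    """Project each input sample onto n_virtual virtual nodes with a fixed
    random mask, add feedback from the previous sample's outputs, and pass
    each held value through the (stand-in) reservoir node in sequence."""
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = S.shape
    mask = rng.uniform(-1.0, 1.0, size=(n_virtual, n_inputs))  # fixed mask
    x = np.zeros((n_samples, n_virtual))
    prev = np.zeros(n_virtual)
    for n in range(n_samples):
        v = scale * (mask @ S[n]) + gamma * prev   # masked input + feedback
        x[n] = [node(vi) for vi in v]              # serialized virtual nodes
        prev = x[n]
    return x
```

The result has one row of Nv transformed values per input sample, ready for the linear readout.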
The aim of RC is that transformed values from, e.g., different input classes become linearly separable by a hyperplane due to the higher dimensional transformation. Thus, a set of linear output weights is used to classify the output. The predicted output of sample n and output dimension k is

ŷ_k^n = Σ_j W_kj x_j^n,  (8)
where W_kj is the output weight matrix [Fig. 3(d)]. A constant bias term is included as the j = 0 element of the weights, with an extra column in x such that x_0^n = 1 for all n. The output weights are found using ridge (or Tikhonov) regression,35 which gives a direct solution for the weights as

W = Y Xᵀ (X Xᵀ + λI)⁻¹,  (9)

with X the matrix of transformed values x_j^n (one column per sample) and Y the corresponding desired outputs.
where λ is the L2-norm regularization hyperparameter and I is the identity matrix. λ is found through a grid search for the value giving the lowest mean square error, sampled using fivefold cross validation of the training data set for each task to prevent overfitting.
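A minimal readout training step along these lines, with samples stored as columns (a convention chosen here, not fixed by the text):

```python
import numpy as np

def train_readout(X, Y, lam):
    """Ridge (Tikhonov) regression for the linear readout:
    W = Y X^T (X X^T + lam I)^{-1}, with a constant-one row appended to X
    so that the j = 0 weight acts as the bias term.
    X: (n_features, n_samples), Y: (n_outputs, n_samples)."""
    Xb = np.vstack([np.ones((1, X.shape[1])), X])   # bias row, x_0 = 1
    A = Xb @ Xb.T + lam * np.eye(Xb.shape[0])
    return Y @ Xb.T @ np.linalg.inv(A)

def predict(W, X):
    """Apply the trained readout to transformed reservoir outputs."""
    Xb = np.vstack([np.ones((1, X.shape[1])), X])
    return W @ Xb
```

With λ small and ample samples, a known linear map is recovered almost exactly; larger λ trades fit quality for robustness against overfitting.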
We use two common machine learning tasks as proof of principle of the reservoir's operation: spoken digit recognition (TI-46 dataset) and chaotic time series prediction (NARMA10). These require a reservoir with good nonlinearity and good memory, respectively. Performance heat maps for the spoken digit task are shown in Fig. 3(e). The task requires the recognition of the spoken digits 0–9 by five female speakers from the NIST TI-46 dataset.36 Preprocessing separates the data into 13 frequency bands using a Mel Frequency Cepstral Coefficient (MFCC) filter37 before it is masked. Training and validation were performed on 100 utterances per speaker, and testing on 160. The input rate and the input scaling (the maximum applied strain anisotropy) are varied, and the error rate for the test dataset is plotted. Fifty virtual nodes were found to give good performance. The left-hand heat map is for an ensemble with a base timescale of 1.4 μs (nanodot diameter 57 nm), and the right-hand heat map for an ensemble with a base timescale of 66 ms (nanodot diameter 90 nm). In both cases, an error rate of 5% can be achieved, showing that this reservoir can perform well at this task across a range of timescales. To provide a baseline, the error rate when the reservoir is removed and the masked input is treated as the output is 23%. The relatively high base level is due to the inherent nonlinearity in the preprocessing, as discussed by Abreu Araujo et al.38 However, the important demonstration here is the increase in performance with the reservoir, which is comparable to other hardware-based reservoirs.5,8,38 The complexity of the maps, while not discussed in further detail here, may be due in part to coupling between input strength and internal timescale (see the supplementary material).
The goal of the chaotic time series task (NARMA10) is to predict the response to white noise (random numbers drawn from the interval [0, 0.5]) of a discrete-time tenth-order nonlinear auto-regressive moving average,39,40

y_{n+1} = 0.3 y_n + 0.05 y_n Σ_{i=0}^{9} y_{n−i} + 1.5 S_{n−9} S_n + 0.1.  (10)
The task for the network is to predict the NARMA10 output y_n given the input S_n. We characterize the success of the network via the normalized root mean squared error (NRMSE),

NRMSE = √[⟨(ŷ_n − y_n)²⟩ / σ²(y_n)],  (11)
where y_n is the true output [Eq. (10)] and ŷ_n is the predicted output from the network [Eq. (8)]. For this task, we make use of the delay-line feedback [Fig. 3(a)], similar to the Mackey–Glass reservoir discussed by Appeltant et al.5 This delay line (the addition of the output one sample earlier to the current input) serves to boost the memory capacity of the system, an important requirement for the NARMA10 task. Figure 3(f) presents a heat map of NRMSE for varying feedback strength (γ) and intrinsic timescale (varied via β). Previous optimization led to the use of 400 virtual nodes, an optimized input rate, and an input scaling of 0.003; 2000 samples were used for training and validation (10-fold) and 1000 for testing. Good performance (NRMSE comparable to that of Ref. 39) can be maintained from the μs to the ms regime by varying the feedback strength. Figure 3(d) shows typical predicted and true signals at this level of NRMSE. As with the speech recognition task, we can compare against the case where the reservoir is removed and the masked input is taken as the output to be trained, which gives a substantially higher NRMSE. Using a Mackey–Glass reservoir (with optimization parameters based on those in Appeltant et al.5), a value of 0.50 was achieved. It should be noted that this value differs from that given by Appeltant et al.; this is believed to be due to a differing definition of NRMSE.
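The NARMA10 target and the NRMSE metric can be generated as follows, using the standard tenth-order recurrence with inputs drawn from the conventional [0, 0.5] range:

```python
import numpy as np

def narma10(s):
    """Standard tenth-order NARMA recurrence driven by the input series s."""
    y = np.zeros(len(s))
    for n in range(9, len(s) - 1):
        y[n + 1] = (0.3 * y[n]
                    + 0.05 * y[n] * y[n - 9:n + 1].sum()
                    + 1.5 * s[n - 9] * s[n]
                    + 0.1)
    return y

def nrmse(y_pred, y_true):
    """Normalized root mean squared error of a prediction."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2) / np.var(y_true))
```

A constant predictor fixed at the mean of the target scores an NRMSE of 1, so values well below 1 indicate genuine predictive power.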
While we have demonstrated the feasibility of this system for machine learning, we now address two key potential challenges in realizing it as an experimental device. The first is the reproducibility of the magnetic behavior. Whether due to deviations from the Stoner–Wohlfarth model, small perturbations from long-range magnetostatic coupling, or variation in the PMN-PT response from nanodot to nanodot, the magnetic response curve may differ from that presented in Fig. 2. However, one of the chief advantages of the system envisaged is that the behavior does not have to be realized exactly in order to produce the behavior required for machine learning, as long as the response of the system is nonlinear, is reproducible, and has fading memory. As the output is derived from an average across an ensemble, slight changes in behavior from nanodot to nanodot are averaged out across the whole array. This should give the device a high tolerance to small deviations, in contrast to typical magnetic logic and memory devices, where slight device-to-device variations can cause significant problems with reproducibility. We have also demonstrated, in Fig. 3(e), that high performance can be achieved for systems with differing magnetization response curves. This leads to the second consideration: how reproducible the device will be run-to-run. The reproducibility of the signal will largely depend on the size of the array, with the error scaling as 1/√N for N nanodots. By increasing the size of the array, the error can be made arbitrarily small. It will be important to determine how large an error can be tolerated, and this will be an avenue of future research; we anticipate that the noise tolerance threshold will be problem specific. The tolerance of the system to changes in magnetic response and the increase in reproducibility due to the aggregate nature of the device should help to mitigate the key challenges of realizing a physical device.
In conclusion, we have proposed strain-mediated voltage-controlled superparamagnetic ensembles as platforms for hardware-based reservoir computing. We have demonstrated competitive outcomes on two machine learning benchmark tasks (spoken digit recognition and chaotic time series prediction) as well as explaining the properties of the ensemble which give rise to their ability to act as a reservoir. By tuning internal parameters (input rate, input scaling, and feedback strength), this performance can be delivered across a range of timescales, from μs to ms. This would allow a physical realization of such a reservoir to be tuned to provide computation in real time for a wide range of possible physical inputs: from decision-making in driverless cars (fast) to speech recognition (slow). The simplicity of the system, coupled with the low energy consumption for such a voltage-controlled, thermally driven device, makes it an ideal candidate for use in edge computing applications where high performance is needed at low latency and power.
See the supplementary material for a discussion of the full form of the anisotropy and the tunable behavior of the timescales.
The authors thank the Engineering and Physical Sciences Research Council (Grant No.: EP/S009647/1 and EP/V006339/1) for financial support. The project leading to this application has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 861618 (SpinENGINE).
DATA AVAILABILITY
The data that support the findings of this study are openly available in ORDA, Ref. 41.