Neural networks enable efficient solutions for Nondeterministic Polynomial-time (NP) hard problems, which are challenging for conventional von Neumann computing. The hardware implementation, i.e., neuromorphic computing, aspires to enhance this efficiency through custom hardware. In particular, NP hard graphical constraint optimization problems are solved by a network of stochastic binary neurons that forms a Boltzmann Machine (BM). The implementation of stochastic neurons in hardware is a major challenge. In this work, we demonstrate that the high to low resistance switching (*set*) process of a Pr_{x}Ca_{1−x}MnO_{3} (PCMO) based RRAM (Resistive Random Access Memory) is probabilistic. Additionally, the voltage-dependent probability distribution approximates a sigmoid function with 1.35%–3.5% error. Such a sigmoid function is required for a BM. Thus, the Analog Approximate Sigmoid (AAS) stochastic neuron is proposed to solve the maximum cut, an NP hard problem. It is compared with the Digital Precision-controlled Sigmoid (DPS) implementation using (a) a pure CMOS design and (b) a hybrid design (RRAM integrated with CMOS). The AAS design solves the problem with 98% accuracy, which is comparable with the DPS design but with a 10× area and 4× energy advantage. Thus, ASIC neuro-processors that implement a BM with novel analog neuromorphic devices are promising for efficiently solving large scale NP hard optimization problems.

## I. INTRODUCTION

The conventional von Neumann computer based on deterministic CMOS logic implementation has been extremely successful in implementing sequential algorithms using clearly demarcated processing and memory units.^{1,2} However, there are many important problems, such as factorization and Nondeterministic Polynomial-time (NP)-hard problems like graphical constraint optimization, for which no polynomial-time algorithm is known to find globally optimal solutions. A serial search through a large number of states leads to a memory and computing resource challenge. Hence, there has been growing interest in testing alternative computing paradigms.^{3,4} Brain inspired artificial neural networks have shown immense promise for efficiently searching approximate solutions and have found wide applications in pattern recognition and optimization problems.^{5,6} Specifically, the Hopfield-Tank networks allow a parallel scan of possible network states and have been algorithmically shown to estimate solutions for the classical Traveling Salesman Problem (TSP).^{7}

More recently, Spiking Neural Networks (SNNs) were proposed as a biologically more accurate model, which uses spikes in neurons as information carriers along with plastic connections called synapses.^{8} Apart from parallel communication through spikes, other features of SNNs include stochastic spiking^{9,10} and refractory periods.^{11} These three biological features are useful in various ways, e.g., constraint optimization,^{11} enhanced learning,^{9} and sensing.^{10} Measured data from the isolated retina of the larval tiger salamander show that the refractory period regulates the spike rate of the ganglion cells located near the inner surface of the retina.^{12} The refractory period of a neuron can also help escape local minima while performing stochastic optimization tasks using neural networks.^{11}

Parallel computation with realtime information exchange between neurons produces a communication bottleneck in von Neumann computers with separate logic and memory blocks connected with a bus. To fully realize the potential of these models, dedicated hardware and architecture have been proposed. An excellent review of neuromorphic algorithms and hardware for constraint satisfaction problems has been presented.^{13} Nanoscale devices have been used to further enhance the efficiency of these solutions. For example, coupled oscillators^{14–16} and memristor crossbar array based Boltzmann machines (BMs)^{17} have been explored.

The Boltzmann machine (BM) is an important class of neural network, where neurons are typically binary (i.e., spike is “1” and no spike is “0”). The *n* neurons in the network can choose the *i*th state out of 2^{n} states with a probability (*p*_{i}) that depends on the energy (*E*_{i}) of the state based on the Boltzmann distribution, i.e., *p*_{i} ∝ exp(−*E*_{i}). Constraint optimization problems are solved by mapping the constraints to network architecture and the energy function (*E*_{i}).^{18} Furthermore, neurons in the BM can be modeled as a Markov chain to solve the traveling salesman problem^{11,19} of the NP hard class of problems. The key to realizing a Markov chain based model lies in being able to generate random numbers according to a given function. A sigmoid function for the stochastic neuron model was proposed earlier.^{19}

Neuromorphic engineering aspires to implement such promising algorithms in hardware to enable performance, power, and area efficiency advantages. To enable a BM in hardware, the challenge is to implement a stochastic neuron. Analog synapses, in contrast, have already been explored in detail, as shown in the literature review.^{20} Various neuron designs have been explored in the literature.^{21} Circuits used to implement silicon neurons have been reviewed.^{22} An optimal mix of analog and digital design is required to achieve brainlike efficiency in computing.^{23} A low power analog LIF (Leaky Integrate and Fire) neuron using novel physics in a traditional silicon-on-insulator Metal Oxide Semiconductor Field Effect Transistor (MOSFET) has been demonstrated^{24} but provides rather minuscule stochasticity.^{25}

Nanoscale devices such as memristors show enhanced stochastic switching^{26–28} without requiring circuit based amplification of noise.^{29} Furthermore, analog matrix multiplication based on the memristor crossbar has been shown to be significantly superior to the digital version for the Boltzmann machine.^{17} Various nanoscale device based stochastic neurons have been demonstrated. A combined neuro-synaptic core was proposed using a memristive magnetic tunnel junction device.^{30} The magnetization switching driven by spin-transfer torque in combination with back-hopping was used to demonstrate stochastic current spike generation. However, the switching resistance ratio was poor, and stochasticity (without the support of an external magnetic field) was experimentally observed only at low temperatures (T ∼ 130 K) and very high current densities [more than 100× those of PCMO RRAM (Resistive Random Access Memory)]. Another general-purpose weight storage element and stochastic neuron model was proposed using a TiO_{2} memristor.^{31} The resistive switching, similar to PCMO, was achieved through vacancy modulation; however, this device required an electroforming step for operation and higher operating voltages, both of which negatively impact device variability and endurance. Detection of temporal correlations in parallel data streams was proposed using a stochastic phase change neuron in Ge_{2}Sb_{2}Te_{5} Phase Change Memory (PCM).^{32} The device resistance was a function of the crystalline vs amorphous phase thicknesses of the PCM layer, and the uncertainty in the melt-quench amorphization reset step was the primary source of stochasticity in neuron operation. However, the input to this neuron needed to be converted to a series of crystallization pulses (as opposed to a single fixed pulse for PCMO), which requires extra peripheral circuitry and makes network level integration less feasible.
More recently, low barrier magnets have been proposed for stochastic switching in a 1T/1M arrangement.^{33–35} However, these devices have very stringent fabrication constraints, namely a magnetization layer of near-critical thickness or circular magnets that lack a preferential magnetic orientation, which are a challenge for nanoscale production, and they often require noise amplification inverters at the output. Yet another implementation using an electroforming free VO_{2} Mott memristor based stochastic neuron was demonstrated recently,^{36} but it required noise to be added to the input to exhibit stochastic behavior. Fidelity to neuronal models like stochastic Hodgkin Huxley^{37} and spike response models^{38} has also been demonstrated. With respect to applications, stochastic neurons have been used for enhanced sensing,^{32} training, and recognition of the logic function^{39,40} and on datasets such as MNIST, CIFAR, etc.,^{41–44} with promising energy benchmarks;^{45} however, the application to NP hard constraint optimization has not been explored.

Unlike filamentary RRAMs,^{26} whose stochastic switching produces binary states,^{46} PCMO (Pr_{x}Ca_{1−x}MnO_{3}) is a nonfilamentary RRAM that enables analog memory with forming-less operation and area scalable currents with good endurance and retention.^{47,48} Excellent low energy, analog PCMO synapses have been demonstrated.^{49–52} Integrate-and-fire (IF) neurons have been demonstrated based on the *set* (high to low resistance switching) process.^{53} Thus, PCMO RRAM is a materials system that provides both analog synaptic and IF neuronal functionality. However, the stochastic switching of PCMO based RRAM and its utilization in stochastic neurons have not been presented earlier.

In this work, we present a stochastic neuron based on PCMO RRAM for a BM to solve an NP hard problem, i.e., Maximum Cut (or Max-Cut). First, we experimentally show that PCMO RRAM has approximately sigmoid switching probability with voltage. We utilize the natural analog approximation of sigmoid stochasticity to design a compact neuron. A comparison with digital precision-controlled sigmoid stochasticity is presented with purely CMOS as well as CMOS with integrated RRAM implementations. We have considered 65 nm CMOS technology for the current document; however, the analysis is fairly general. We show that the networks sample from the Boltzmann distribution approximately. We compare the performance in terms of accuracy of solution of Max-Cut. Finally, we present the area and power benefits.

## II. BOLTZMANN MACHINE ALGORITHM

A BM is a fully connected network of *n* binary neurons [Fig. 1(c)], as described in the literature.^{11} A weight is associated with each connection and a bias with each neuron. The state of the network can be expressed as a binary vector in which each element represents a neuron in one of two states, i.e., on (“1”) or off (“0”). Such a state $\vec{x}$ (among 2^{n} possible states) occurs with a probability given by

where

Here, *Z* is the normalization factor, *b*_{i} is the bias of the *i*th neuron, *x*_{i} is the state of the *i*th neuron (0/1), and *w*_{ij} is the weight of the connection between *i*th and *j*th neurons. Thus, the input (*u*_{i}) to the *i*th neuron is the change in network energy caused by its switching, which is given by the following:
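In the standard BM formulation, Eqs. (1)–(3) can be written as below. This is a sketch consistent with the symbol definitions above and with *p*_{i} ∝ exp(−*E*_{i}); the convention that each pair of neurons is counted once (symmetric weights *w*_{ij} = *w*_{ji}, no self-connections) is an assumption.

```latex
\begin{align}
p(\vec{x}) &= \frac{1}{Z}\exp\bigl(-E(\vec{x})\bigr),
  \qquad Z = \sum_{\vec{x}'} \exp\bigl(-E(\vec{x}')\bigr), \tag{1}\\
E(\vec{x}) &= -\sum_{i} b_i x_i \;-\; \sum_{i<j} w_{ij}\, x_i x_j, \tag{2}\\
u_i &= E(\vec{x})\big|_{x_i=0} - E(\vec{x})\big|_{x_i=1}
     = b_i + \sum_{j\neq i} w_{ij}\, x_j. \tag{3}
\end{align}
```

With this sign convention, a positive *u*_{i} means that switching the *i*th neuron on lowers the network energy.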

The crossbar array shown in Fig. 1(c) computes this sum and feeds it to the neuron. The weighted currents are summed along a column by Kirchhoff’s current law as input to the neuron, along with a self-bias current. The clocked implementation is shown in Fig. 1(d), where the input from digital neurons is converted to analog current through memristors and summed through the crossbar to generate an analog *u*_{i}, which is used by the stochastic neuron to issue digital spikes.^{17} Equation (1) indicates that the BM will visit the lowest energy state the most frequently. Thus, we need to map the cost of an optimization problem to the energy of the network so that the most frequently visited state is the minimum cost solution. The daunting class of NP-hard problems is challenging to solve in the serially processed von Neumann computing approach. However, a BM in hardware provides a way to exchange information between all neurons in parallel. The parallel information processing in BMs may provide more efficient solutions.

### A. Markov chain model for stochastic neuron

The operation of a neuron may be represented as a Markov chain,^{11} as shown in Fig. 2(a). In state 0, the output is 0. Here, the neuron accepts input *u* to *stochastically* decide whether to transition to the state *τ* or stay in the same state. The transition to state *τ* occurs with a sigmoid probability dependence on input *u*, i.e., *σ*(*u*). Hence, the neuron stays in the same state with probability 1 − *σ*(*u*). Once a neuron has reached the state *τ*, it *deterministically* steps down through the intermediate states until it reaches state 1. In all these states from state *τ* to state 1, the output remains 1. On reaching state 1, the neuron again *stochastically* decides to transition to state *τ* [with probability *σ*(*u*)] or state 0 [with probability 1 − *σ*(*u*)] depending on the input *u*. Given such a neuron, the network visits states such that it samples from a Boltzmann distribution.^{19}
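The state transitions described above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design: the function names (`step_neuron`, `output`, `run`) and the use of Python's `random` module as the source of randomness are assumptions made for the example.

```python
import math
import random

def sigmoid(u):
    """Logistic function sigma(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def step_neuron(state, u, tau, rng):
    """Advance the neuron by one time step and return the new state.

    States are integers 0..tau. Stochastic decisions are made only in
    states 0 and 1; states tau, tau-1, ..., 2 count down deterministically.
    """
    if state >= 2:
        return state - 1          # deterministic countdown toward state 1
    # state is 0 or 1: jump to tau with probability sigma(u), else go to 0
    return tau if rng.random() < sigmoid(u) else 0

def output(state):
    """Neuron output: 0 only in state 0, otherwise 1."""
    return 0 if state == 0 else 1

def run(u, tau, steps, seed=0):
    """Simulate one neuron with constant input u, starting from state 0."""
    rng = random.Random(seed)
    state, trace = 0, []
    for _ in range(steps):
        state = step_neuron(state, u, tau, rng)
        trace.append(state)
    return trace

print(run(50.0, 3, 6))    # strongly positive input: the neuron fires repeatedly
print(run(-50.0, 3, 5))   # strongly negative input: the neuron stays silent
```

With a large positive input the trace cycles through *τ*, *τ* − 1, …, 1, and with a large negative input the neuron remains in state 0.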

### B. Network definition for the Max-Cut problem

Next, we describe the solution of Max-Cut, which is one of the first problems to be demonstrated as NP hard and has many practical applications, e.g., resource maximization in networks.^{54} In solving the weighted Max-Cut problem on a graph, the aim is to cut the graph into two parts such that the sum of edge weights crossing between the two parts is maximized. To do this, the problem will be represented in terms of the BM network. The BM occupies a state with probability inversely proportional to the exponential of its energy as indicated by Eq. (1), i.e., the lowest energy states are visited most frequently. So, we need to define the energy associated with a cut such that a higher cut cost corresponds to a lower energy. The problem can be formally stated as follows.^{18}

Given a graph G = (V, E) with weighted edge set E and vertex set V, find a partition of vertices into disjoint sets *S* and *S*′ so that the cost function $f(\vec{x})$ defined in Eq. (4) is maximized [Figs. 1(a) and 1(b)],

where

The above cost function is rearranged in Eq. (5) to highlight its similarity to the energy of a Boltzmann network shown in Eq. (2),

Therefore, if the biases and weights of a Boltzmann network are defined as in Eqs. (6) and (7), respectively, then the cost of the Max-Cut can be mapped to the energy of the BM,

Once the Max-Cut problem is mapped to the network representing the BM, the BM will settle into the solution where each *x*_{i} takes a value of 0 or 1, indicating whether the corresponding vertex is in *S* or *S*′ [Fig. 1(c)]. The most visited state will be the solution presented by the network.
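One way to make this mapping concrete is sketched below. It assumes the energy convention E = −Σ_i b_i x_i − Σ_{i<j} w_ij x_i x_j and writes the cut value as Σ_{i<j} c_ij (x_i + x_j − 2 x_i x_j), which gives b_i = Σ_j c_ij and w_ij = −2 c_ij so that E equals minus the cut value. The edge-weight matrix `c`, the helper names, and the example graph are hypothetical, and the scaling may differ from Eqs. (6) and (7).

```python
from itertools import product

def cut_value(c, x):
    """Sum of edge weights c[i][j] whose endpoints lie in different sets."""
    n = len(x)
    return sum(c[i][j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])

def maxcut_to_bm(c):
    """Return (b, w) such that the BM energy equals minus the cut value."""
    n = len(c)
    b = [sum(c[i][j] for j in range(n) if j != i) for i in range(n)]
    w = [[-2 * c[i][j] if i != j else 0 for j in range(n)] for i in range(n)]
    return b, w

def energy(b, w, x):
    """E(x) = -sum_i b_i x_i - sum_{i<j} w_ij x_i x_j (assumed convention)."""
    n = len(x)
    lin = sum(b[i] * x[i] for i in range(n))
    quad = sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return -lin - quad

# Small example graph with hypothetical, symmetric edge weights
c = [[0, 3, 1, 0],
     [3, 0, 2, 4],
     [1, 2, 0, 5],
     [0, 4, 5, 0]]
b, w = maxcut_to_bm(c)
states = list(product([0, 1], repeat=len(c)))

# Brute-force check: the lowest-energy state is a maximum cut
best = min(states, key=lambda x: energy(b, w, x))
print(best, cut_value(c, best))
```

Exhaustive enumeration is only feasible for a handful of vertices; it is used here purely to check that the lowest energy state coincides with the maximum cut.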

## III. HARDWARE IMPLEMENTATION OF BOLTZMANN MACHINE

The hardware BM, a stochastic neural network, consists of synapses and neurons. Analog hardware synapses in crossbars [Fig. 1(c)] have been extensively investigated.^{20} Hence, we focus on the neuron in this paper. The Markov chain neuron model is shown in Fig. 2(a), and the corresponding hardware implementation is shown in Fig. 2(b). The deterministic transitions in the Markov chain model are replicated using a down counter block. While the down counter is enabled, the state of the neuron (represented by “count”) is obtained from the decrementer. Here, the count decreases by 1 after each clock cycle. This deterministic countdown of *τ* steps acts as the refractory period since the neuron output remains high and no new spike can occur during that time. The condition select block performs various checks on the current state and, depending on their outcome, decides two things: (i) the output of the neuron and (ii) the enable signals of the other two blocks, i.e., the down counter and the stochastic *set* block. The output of the neuron is 0 when count is 0, while it is 1 for all other counts. The down counter is enabled when count is greater than 1, i.e., during the refractory period. Otherwise, the stochastic *set* block is enabled. Here, input *u* for a neuron controls stochastic switching in the stochastic function *SF* sub-block to produce a sigmoid switching probability.

Two possibilities have been considered for the stochastic function (*SF*) sub-block: (i) pure CMOS and (ii) hybrid (RRAM integrated with CMOS) designs. First, we use a pure CMOS design based on a Linear Feedback Shift Register (LFSR), which generates a pseudorandom number on each clock cycle and consists of large digital blocks. Second, we use the hybrid design, where the stochastic switching in the compact RRAM is utilized by a CMOS design. Hence, we will study stochastic RRAM switching experimentally in Sec. III A.

### A. PCMO RRAM stochasticity

#### 1. PCMO RRAM device experimental setup

The Pr_{x}Ca_{1−x}MnO_{3} (x = 0.7) based RRAM devices were fabricated on a 4″ Si substrate. The bottom electrode of Ti (50 nm)/Pt (25 nm) was deposited by sputtering on thermally grown SiO_{2} (30 nm). PCMO (65 nm) was then deposited by the RF sputtering process at room temperature and crystallized by annealing the sample in N_{2} ambient at 650 °C by rapid thermal annealing. After that, the devices were obtained by defining via holes of 1 *μ*m in SiO_{2} by electron beam lithography (EBL). Finally, tungsten (W) contact pads (25 *μ*m × 25 *μ*m) were created by sputtering and lift-off of tungsten.^{55} The device schematic [Fig. 3(a)] shows the PCMO sandwiched between W and Pt. For the characterization, the voltage is applied between the W and Pt electrodes. The DC IV is taken using the Agilent B1500 semiconductor parameter analyzer’s Source Measure Unit (SMU) and the transient IV by the Waveform Generator/Fast Measure Unit (WGFMU). All the measurements are carried out at room temperature.

#### 2. Physics of stochasticity

The typical DC IV characteristics of PCMO RRAM are shown in Fig. 3(b). At low bias, the device does not change its resistance state. The trap density (N_{T}) dependent space charge limited current (trap-SCLC) flows through the device^{56} where I_{trapSCLC} ∝ 1/*N*_{T}. On the application of positive polarity voltage exceeding a voltage threshold, the device switches from a low resistance state (LRS) to a high resistance state (HRS). This is the *reset* operation. Similarly, on the negative polarity exceeding a threshold voltage, the device switches from HRS to LRS. This is the *set* operation. This *set* operation and *reset* operation in the device are attributed to the movement of oxygen ions to and from the PCMO thin film toward the tungsten electrode [Fig. 3(c)], which modulates the resistance through trap density change. The process is described briefly below.

In the *reset* operation, the oxygen ions (O^{2−}) move from bulk toward the tungsten (W) electrode at positive polarity. This O^{2−} egress from PCMO creates oxygen vacancies in the PCMO (i.e., increases the N_{T}). The movement of ions can be given by the Mott-Gurney equation [Eq. (8)],

where *a* is the hopping distance, *f* is the escape frequency, *E*_{m} is the activation barrier, *E* is the electric field, and *E*_{0} = *kT*/*qa* is the characteristic electric field.

The increase in N_{T} in the device increases the resistance and hence leads to reduction in the current which is consistent with the trap-SCLC behavior [Eq. (9)]. The SCLC current depends upon *N*_{T}, *T*, and voltage (*V*) [i.e., *I*_{SCLC}(*N*_{T}, *T*, *V*)],

where *N*_{V} is the effective density of states of the valence band, *E*_{T} is the trap energy level, *E*_{V} is the valence band energy level, and *k*_{B} is the Boltzmann constant. As the voltage is increased further, N_{T} keeps increasing leading the device into higher resistance states.^{57,58}

During *set*, as the negative bias is at W, the oxygen ions (O^{2−}) move away from the electrode and into the PCMO bulk. These ions annihilate the oxygen vacancies in the device, leading to reduction in N_{T} and hence decrease in resistance. As the resistance is decreased, more current flows through the device. The increase in current leads to Joule heating in the device to further enhance the ionic motion [Eq. (11)]. The device temperature (*T*) is a function of current and voltage [i.e., *T*_{device}(*V*, *I*)],

where *k* is the thermal conductivity of PCMO, *C*_{V} is the specific heat capacity, and volume = area × thickness.

The ionic motion reduces trap density which further increases current. Thus, a positive feedback is developed between current, temperature, and ions, which leads to current shoot-up until a compliance is reached.^{57} The *set* dynamics flowchart [Fig. 3(d)] shows the positive feedback loop between heat transport (I → *T*), ionic transport (T → *N*_{T}), and the electron transport (N_{T} → *I*). Here, both the heat and electron transport [Eqs. (11) and (9), respectively] are deterministic processes, whereas the ionic transport [Eq. (8)] is stochastic in nature, as indicated in Fig. 3(d). The stochasticity in the ionic transport comes from the hopping probability associated with the oxygen ions. The transport of a few ions modifies the potential profile locally for current transport and modulates the DC current and related heating. In the *set* process, the transport of a few ions may initiate positive feedback of current and heating locally. This local hot spot may spread to the entire PCMO layer—producing a stochastic *set* process. This leads to the stochastic nature of current switching when observed in the transient measurements [Fig. 3(b)]. The probability of switching is voltage-dependent, i.e., it is zero at low bias and increases and saturates to 1 at high bias.

### B. Implementation of stochastic function (*SF*) block

The stochastic function (*SF*) block in the block diagram [Fig. 2(b)] is the most challenging element of the neuron. We compare the three different implementations shown in Fig. 4.

The first two designs are Digital Precision-controlled Sigmoid (DPS) implementations, as shown in Fig. 4, which enable high bit-precision replication of the sigmoid using a Lookup Table (LUT). Both designs require an input preprocessing stage, where the analog input signal from the crossbar array of weights (*u*) is sampled by an ADC. For the pure CMOS based implementation [Fig. 4(a)], the digital signal is then processed through the LUT, which outputs a threshold value. An LFSR generates a pseudorandom number. In the readout stage, the comparison of the LFSR output with the LUT output determines whether the neuron has spiked or not, i.e., if the LFSR output exceeds the LUT output, then the neuron has spiked; otherwise, it has not. For the DPS hybrid scheme [Fig. 4(b)], the LUT translates the input of the neuron to a digital voltage value to be applied to the RRAM to enable stochastic switching corresponding to the probability *σ*(*u*). This digital voltage value is converted by the DAC to a voltage to be applied to the RRAM. This produces a high/low output in the readout stage depending on the state of the RRAM (low or high resistance state). If the RRAM has switched to low resistance, a *reset* bias is applied during the countdown to *reset* it and prepare it for the next switching.
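The pure CMOS DPS data path (ADC, LUT, LFSR, comparator) can be sketched as follows. This is an illustrative model under several assumptions that are not taken from the design itself: a 16-bit Fibonacci LFSR with taps (16, 14, 13, 11), an ideal ADC over the range −8 to 8, and a LUT that stores 1 − *σ*(*u*) scaled to o-bit resolution, so that "LFSR output exceeds LUT output" occurs with probability close to *σ*(*u*).

```python
import math

def lfsr16_step(state):
    """One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11; assumed choice)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def build_lut(u_bits, o_bits, u_min, u_max):
    """Threshold table indexed by the ADC code of u.

    Each entry is round((1 - sigma(u)) * (2**o_bits - 1)), so that
    'random number > threshold' happens with probability close to sigma(u).
    """
    levels = 2 ** u_bits
    top = 2 ** o_bits - 1
    lut = []
    for code in range(levels):
        u = u_min + (u_max - u_min) * code / (levels - 1)
        sig = 1.0 / (1.0 + math.exp(-u))
        lut.append(round((1.0 - sig) * top))
    return lut

def adc(u, u_bits, u_min, u_max):
    """Ideal ADC: clamp u to the input range and quantize it to a u_bits code."""
    u = min(max(u, u_min), u_max)
    return round((u - u_min) / (u_max - u_min) * (2 ** u_bits - 1))

def spike_rate(u, u_bits=6, o_bits=8, u_min=-8.0, u_max=8.0, seed=0xACE1):
    """Fraction of clock cycles, over one full LFSR period, on which the neuron spikes."""
    lut = build_lut(u_bits, o_bits, u_min, u_max)
    threshold = lut[adc(u, u_bits, u_min, u_max)]
    state, spikes, period = seed, 0, 2 ** 16 - 1
    for _ in range(period):
        state = lfsr16_step(state)
        rnd = state >> (16 - o_bits)      # keep the top o_bits as the random number
        spikes += rnd > threshold         # comparator: spike if LFSR exceeds LUT output
    return spikes / period

for u in (-8.0, 0.0, 8.0):
    print(u, spike_rate(u))
```

Lowering `u_bits` or `o_bits` coarsens the realized probability curve, which is the precision trade-off that the (u-bit, o-bit) cases explore.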

The implementations described above fail to utilize an important property of PCMO RRAM, i.e., the approximately sigmoidal switching probability. Alternatively, we implement the Analog Approximate Sigmoid (AAS) scheme, in which the naturally approximate sigmoidal switching of the RRAM is utilized directly [Fig. 4(c)]. Here, we directly give the input voltage (*u*) from the crossbar to the RRAM such that *σ*(*u*) based probabilistic switching is obtained. It requires a scale-and-shift operation, which is managed by the operational amplifier that is present in all 3 designs. Thus, the ADC and DAC operations are avoided. The stochastic switching and state readout stages are identical to the second (i.e., hybrid DPS) design.

## IV. EXPERIMENTAL RESULTS

### A. Nanoscale stochastic switching element: RRAM data measurement

As illustrated in Fig. 3, the PCMO RRAM enables a voltage dependent probabilistic current switching during *set*. To experimentally study this stochasticity, the transient current is measured for a given *set* pulse amplitude to observe the switching time, i.e., the time at which the current shoots up. The transient current is repeatedly measured during consecutive *set* measurements by alternately applying a *reset* and a *set* pulse of duration 1 ms. One such example is shown in Fig. 5(a). The stochastic switching in time can be observed for a fixed *set* voltage pulse (−2.2 V/1 ms) for 100 runs. The current shoots up at different time instants for different *set* runs, giving a probability distribution in the switching time. This stochasticity is further modulated by the applied voltage pulse amplitude and duration. We plot the cumulative distribution function (CDF) of the switching time in Fig. 5(b) for different applied voltages. When the voltage pulse amplitude is very high [>2.4 V in magnitude, orange and red curves in Fig. 5(b)], the switching is deterministic, i.e., the time to *set* has a very narrow distribution. As the voltage pulse amplitude is decreased [<2.4 V in magnitude, green and blue curves in Fig. 5(b)], the device switching becomes more stochastic, i.e., the time to *set* has a broad distribution. The probability of switching of the RRAM by a pulse of a specific amplitude and pulse width is extracted from the experimental CDF data by interpolation to generate the contour plot shown in Fig. 5(c).
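The step from measured switching times to a switching probability for a given pulse width amounts to evaluating the empirical CDF. A minimal sketch is given below; the function name and the switching times are hypothetical and are not measured data.

```python
def switching_probability(switch_times, pulse_width):
    """Empirical CDF: fraction of runs that switched within pulse_width.

    switch_times holds one entry per set run; None marks a run in which
    the device did not switch during the measurement window.
    """
    if not switch_times:
        raise ValueError("need at least one run")
    switched = sum(1 for t in switch_times if t is not None and t <= pulse_width)
    return switched / len(switch_times)

# Hypothetical switching times in seconds for one pulse amplitude (not measured data)
times = [120e-9, 300e-9, 410e-9, None, 700e-9, 250e-9, 480e-9, None, 900e-9, 350e-9]

for pw in (300e-9, 450e-9, 1e-3):
    print(pw, switching_probability(times, pw))
```

Repeating this for each pulse amplitude gives one probability per (amplitude, pulse width) pair, which is the quantity a contour plot such as Fig. 5(c) displays.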

We plot the switching probability as a function of pulse amplitude for three different pulse widths in Fig. 6(a) using the contour plot. Figure 6(b) shows that the RRAM switching probability function at a fixed pulse width closely resembles a sigmoid function after a linear transformation of the voltage axis. In Fig. 7(a), we show how analog *u* is converted to stochastic digital spikes through an RRAM. The DPS implementation needs to convert analog *u* to digital form through an ADC. For DPS, the error reduces as the bit precision increases, as shown in the inset of Fig. 7(b). In comparison, the AAS implementation needs no such conversion. It applies the analog *u* to the RRAM to obtain digital stochastic switching where the probability of switching is modulated in an analog fashion. Although the probability of switching is analog, the RRAM switching is digital because the switching from high to low resistance is abrupt with a large (10×) decrease in resistance, akin to a transition from a digital low (“0”) to a digital high (“1”) state. A voltage divider is designed with R_{read} in series to ensure that the voltage change is compatible with the buffer in Fig. 4(b). The neuron is also stochastic as the RRAM has probabilistic switching. Thus, the RRAM input is an analog voltage and the switching is digital, but the dependence of the probability of switching on the input voltage is analog and mimics a sigmoid. That is why this implementation is termed analog approximate sigmoid.

The pulse width gives us control over reducing the error between the functional form of the probability vs voltage amplitude curve and an exact sigmoid function. Given various pulse-widths, we observe a minimum rms error of 1.21% compared to an ideal sigmoid for a pulse width of 450 ns [Fig. 7(b)]. Thus, we can implement a near-sigmoid probability function using an RRAM by linearly transforming the input membrane potential, u. This transformation can be accommodated as a part of the weight scaling in the crossbar array. The resultant voltage can then be applied as a pulse to the RRAM to implement approximate sigmoid probability activation, thus eliminating the need for lookup tables and preprocessing blocks. The combined implementation of sigmoidal stochasticity vs applied pulse amplitude to PCMO RRAM for a fixed pulse width is the key enabler for AAS architecture.
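The linear transformation of the voltage axis and the resulting rms error can be illustrated with a small fit. The sketch below uses a stand-in device curve (a Gaussian CDF with hypothetical centre 2.2 V and width 0.1 V, not measured data) and a simple grid search over the slope; the function names and numbers are assumptions made for the example.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def device_probability(v):
    """Stand-in for the measured switching probability vs pulse amplitude.

    A Gaussian CDF with hypothetical centre 2.2 V and width 0.1 V.
    """
    return 0.5 * (1.0 + math.erf((v - 2.2) / (0.1 * math.sqrt(2.0))))

def fit_linear_map(voltages, probs, v50, slopes):
    """Pick the slope a (per volt) that minimises the rms error between
    sigmoid(a * (V - v50)) and the given probabilities."""
    best_a, best_rms = None, float("inf")
    for a in slopes:
        sq = [(sigmoid(a * (v - v50)) - p) ** 2 for v, p in zip(voltages, probs)]
        rms = math.sqrt(sum(sq) / len(sq))
        if rms < best_rms:
            best_a, best_rms = a, rms
    return best_a, best_rms

voltages = [1.8 + 0.01 * k for k in range(81)]     # 1.8 V to 2.6 V
probs = [device_probability(v) for v in voltages]
slopes = [10.0 + 0.1 * k for k in range(151)]      # candidate slopes, 10 to 25 per volt
a, rms = fit_linear_map(voltages, probs, v50=2.2, slopes=slopes)

# A membrane potential u then maps to the pulse amplitude V = v50 + u / a
print(a, rms)
```

In this picture, the fitted slope and offset are the scale-and-shift applied to *u*, and the residual rms error plays the role of the sigmoid approximation error discussed above.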

As discussed earlier in Sec. III A 2 titled “Physics of Stochasticity,” the origin of stochasticity is based on the motion of a few ions to kick-start the positive feedback process that produces the HRS to LRS transition. Hence, device area scaling will reduce the number of ions required to be transported and enhance number fluctuation. This requires further investigation.

### B. Performance, Area, and Energy Consumption

As mentioned earlier, the network of neurons should sample from Boltzmann distribution. Figure 8(a) shows a small sample of joint distribution of 4 neurons (i.e., 2^{4} cases) of a 10 neuron system with 2^{10} states. For comparison, we consider the result obtained from Gibbs sampling as the baseline to show that the network is closely sampling from the Boltzmann distribution. However, the ultimate demonstration is the solution of an NP hard problem, which is presented below.
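The Gibbs sampling baseline can be illustrated on a network small enough to enumerate exactly. The sketch below is not the simulation used for Fig. 8(a): the 3-neuron network, its biases and weights, and the number of sweeps are hypothetical, and the energy convention E = −Σ_i b_i x_i − Σ_{i<j} w_ij x_i x_j is assumed.

```python
import math
import random
from itertools import product

def energy(b, w, x):
    """E(x) = -sum_i b_i x_i - sum_{i<j} w_ij x_i x_j (assumed convention)."""
    n = len(x)
    return (-sum(b[i] * x[i] for i in range(n))
            - sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n)))

def exact_boltzmann(b, w):
    """Exact state probabilities p(x) = exp(-E(x)) / Z by enumeration."""
    states = list(product([0, 1], repeat=len(b)))
    weights = [math.exp(-energy(b, w, x)) for x in states]
    z = sum(weights)
    return {x: wt / z for x, wt in zip(states, weights)}

def gibbs_sample(b, w, sweeps, seed=0):
    """Systematic-scan Gibbs sampling; returns empirical state frequencies."""
    rng = random.Random(seed)
    n = len(b)
    x = [0] * n
    counts = {}
    for _ in range(sweeps):
        for i in range(n):
            u = b[i] + sum(w[i][j] * x[j] for j in range(n) if j != i)
            x[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-u)) else 0
        key = tuple(x)
        counts[key] = counts.get(key, 0) + 1
    return {k: v / sweeps for k, v in counts.items()}

# Hypothetical 3-neuron network with symmetric weights
b = [0.5, -0.3, 0.2]
w = [[0.0, 1.0, -0.8],
     [1.0, 0.0, 0.6],
     [-0.8, 0.6, 0.0]]
exact = exact_boltzmann(b, w)
sampled = gibbs_sample(b, w, sweeps=20000)

# Total variation distance between sampled and exact distributions
tv = 0.5 * sum(abs(exact[s] - sampled.get(s, 0.0)) for s in exact)
print(tv)
```

A small total variation distance indicates that the sampler visits states with frequencies close to the Boltzmann probabilities, which is the kind of agreement the comparison in Fig. 8(a) is meant to show.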

The solutions of the Max-Cut problem are compared using the relative cost metric. The relative cost is defined as the ratio of the cut value given by the network in a run to the optimal cut value. Thus, a relative cost closer to 1 means a better solution. The performance on the 125 node Max-Cut problem is evaluated by simulations for the three schemes: AAS and the two DPS schemes (hybrid and pure CMOS). The DPS scheme performance is presented for different bit-precision cases defined as (u-bit, o-bit). Here, u-bit is the resolution of the input (*u*), and o-bit defines the probability resolution for the pure CMOS scheme or the voltage resolution for the hybrid scheme. Thus, 5 cases were simulated: (4, 4), (5, 6), (6, 6), (6, 8), (8, 8).

To evaluate the circuit density, standard designs of digital components^{59} are used for estimation at 65 nm technology, while representative mixed signal component performances for the ADC and DAC (in the same technology) are taken directly from the literature^{60,61} for 1 MHz circuit operation in all three cases. The total circuit area of each design is then estimated from these components.

For digital components, the switching power is computed using the expression

where *α* is the switching probability of a transistor, *f* is the operating frequency, *C* is the output load capacitance, and *V* is the supply voltage. 0.5*CV*^{2} is the energy dissipation in a single high-to-low or low-to-high transition. For the ADC and DAC, the powers and conversion times have been taken from Refs. 60 and 61, which are state of the art in the literature with respect to the energy-per-bit and the typical resolution.
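Taken together, these definitions imply an average switching power equal to the energy per transition multiplied by the expected number of transitions per second. A sketch of that expression is given below, assuming that *α* counts single transitions per clock cycle; if *α* instead counts full charge and discharge cycles, the prefactor of 1/2 is absorbed.

```latex
P_{\text{switching}} \;=\; \alpha \, f \cdot \tfrac{1}{2}\, C V^{2}
```

As an arithmetic illustration with hypothetical values, *α* = 0.1, *f* = 1 MHz, *C* = 1 fF, and *V* = 1 V give 0.1 × 10^{6} × 0.5 × 10^{−15} W = 50 pW per node.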

Figure 8(b) compares the performance of the network against the area of the circuit. The solid line indicates the performance of the Goemans-Williamson algorithm for Max-Cut. For DPS designs, the performance reduces with bit precision reduction as the neuronal area reduces, indicating a trade-off. The DPS neuron’s area is dominated by the mixed signal components (ADC and DAC), whose sizes are related to the precision. Thus, the reduction in size comes at the cost of reduced performance. The hybrid scheme appeared more resilient than the pure CMOS scheme. For the AAS scheme, three different pulse widths (in the 300–450 ns range) producing different errors (in the range of 1.5%–3%) compared to the ideal sigmoid were simulated. The accuracy of the sigmoidal approximation of stochasticity depends upon the choice of pulse width. Figure 7(b) shows a minimum in error at 450 ns, which is 1.83× smaller than the error at 350 ns. For the AAS scheme, keeping the pulse width at 350 ns instead of 450 ns gives a 1.32× energy reduction. This reduction occurs due to a decrease in energy dissipation through the RRAM. However, it is insignificant for hybrid DPS, where preprocessing is dominant. The BM performance increased with sigmoidal error reduction. The AAS design has the same area for the 3 different pulse widths, while the related error determines performance. Furthermore, we compare the performance with the Goemans-Williamson algorithm,^{62} which guarantees a relative cost better than 0.87× of the optimal. The AAS and high-precision DPS results are significantly better than this lower bound. Overall, the AAS design occupies 1/10th the area of both the DPS designs at equivalent or better performance (98%–100%). In fact, an AAS circuit at 450 ns pulse width performs even better than both the (8, 8) bit precision DPS circuits, which occupy close to 50× more area.

In terms of energy per spike for a neuron, the energy dissipated by various components and the total energy dissipated are shown in Figs. 9(a) and 9(b). The AAS (direct RRAM) scheme dissipates 1/4th the energy of the DPS pure CMOS scheme and 1/9th the energy of the DPS hybrid scheme. This energy saving is attributed to the elimination of the preprocessing stage, which includes the ADC and DAC components. While the present results are promising, the effect of scaling the PCMO RRAM on the nature of stochasticity needs to be evaluated to estimate the effect on system performance. Furthermore, device-to-device variability in the stochasticity of the neuron may have a significant impact on performance and may therefore require device-system co-design.

## V. CONCLUSION

In this paper, we study stochastic neuron design for the BM to solve classic NP hard problems, which are of great theoretical interest and have a wide range of practical applications. We show experimentally that the PCMO RRAM has an approximately sigmoidal stochastic switching probability, which we term Analog Approximate Sigmoid (AAS). We utilize this property to design stochastic neurons to solve a Max-Cut problem, a typical NP hard problem. A comparison to the digital precision-controlled scheme (DPS) in pure CMOS and hybrid designs is performed. All schemes perform better than the guaranteed bound of the Goemans-Williamson algorithm. We show that the AAS scheme has a 4× energy and 10× area reduction over DPS schemes, and we identify the origins of the improvement. Thus, PCMO RRAM based stochastic neurons are highly promising for hardware BMs, an example of stochastic neuromorphic computing, to solve NP hard problems, which are extremely challenging for von Neumann computing.

## ACKNOWLEDGMENTS

The work was partially funded by DST Nano Mission and Ministry of Electronics and IT (MeitY). It was performed at the IIT Bombay Nanofab Facility. S.L. was funded by Intel Ph.D. Fellowship and Visveswaraya Fellowship. V.S. was funded through the Prime Minister’s Research Fellowship (PMRF).
