In-memory computing (IMC) has emerged as a new computing paradigm able to alleviate or suppress the memory bottleneck, which is the major concern for energy efficiency and latency in modern digital computing. While the IMC concept is simple and promising, the details of its implementation cover a broad range of problems and solutions, including various memory technologies, circuit topologies, and programming/processing algorithms. This Perspective aims at providing an orientation map across the wide topic of IMC. First, the memory technologies will be presented, including both conventional complementary metal-oxide-semiconductor-based and emerging resistive/memristive devices. Then, circuit architectures will be considered, describing their aim and application. Circuits include both the popular crosspoint array and more advanced structures, such as closed-loop memory arrays and ternary content-addressable memories. The same circuit might serve completely different applications, e.g., a crosspoint array can be used to accelerate the matrix-vector multiplication of forward propagation in a neural network as well as the outer product of backpropagation training. The different algorithms and memory properties that enable such diversification of circuit functions will be discussed. Finally, the main challenges and opportunities for IMC will be presented.

Data-intensive computing tasks, such as data analytics, machine learning, and artificial intelligence (AI), require frequent access to the memory to exchange data input, output, and commands. Since the high-density memory is generally off-chip with respect to the central processing unit (CPU), data movement represents a significant overhead in the computation, with an energy cost that largely exceeds that of on-chip digital data processing.1,2 There are two possible directions to tackle this memory bottleneck: the first is the optimization of the data throughput in a multi-chip approach, such as the high bandwidth memory (HBM)3 or the hybrid memory cube (HMC).4 The second is to radically change the computing paradigm by enabling in situ computation of data within the memory, which goes by the name of in-memory computing (IMC).5–8

Various concepts of IMC have been proposed depending on the degree of integration of memory and processing, as illustrated in Fig. 1. On the one hand, a conventional von Neumann architecture depicted in Fig. 1(a) has physically separate memory and computing units sitting on distinct chips, where the movement of input/output/instructions causes significant latency and excess energy consumption. One solution to mitigate these issues is the concept of near-memory computing (NMC) shown in Fig. 1(b), where the embedded nonvolatile memory (eNVM) is integrated on the same chip as the computing unit to minimize the latency.9,10 Note that eNVM serves as pure data storage for parameters and instructions in NMC, while the static random access memory (SRAM) is used as a cache memory storing intermediate input/output data. A further degree of integration consists of the true IMC approach shown in Fig. 1(c), where the SRAM is used directly as a computational engine, e.g., to accelerate matrix-vector multiplication (MVM).8 An additional overhead is the need to move the computational parameters from the local eNVM [or an off-chip dynamic random access memory (DRAM)] to the volatile SRAM every time the computation is needed. To mitigate this drawback, the ultimate concept maximizing the integration of memory and processing is IMC within the eNVM, as shown in Fig. 1(d).7 This approach appears as the most promising concept to minimize data movement, hence energy consumption and latency, although there are significant challenges and trade-offs in terms of throughput, energy efficiency, and accuracy of the processing. Emerging memories represent a promising option for the eNVM in IMC, given their attractive properties of scaling, 3D integration in the back end of the line, and nonvolatile storage of computing parameters. The interplay of device technologies, circuit engineering, and algorithms thus requires a strong effort in terms of co-design across multiple disciplines.11

FIG. 1.

Various degrees of integration between memory and computing units. (a) von Neumann architecture for computing systems, where the central processing unit and the memory unit are physically separated and connected through a data bus. (b) Near-memory computing architecture, where the processing unit is complemented with an eNVM unit to store commands and parameters. (c) SRAM-based in-memory computing architecture, where computation is performed directly within the SRAM unit via dedicated peripherals, while eNVM serves as storage for computational parameters. (d) eNVM-based in-memory computing architecture, where eNVM provides both the nonvolatile storage of computational parameters and the computation.


This Perspective provides an overview of IMC, including the status of the memory device technologies and the circuit architectures for a broad portfolio of applications. Section II describes the state-of-the-art memory devices for IMC, including both two-terminal and three-terminal emerging memory technologies. Section III presents the concept of analog IMC, highlighting the main challenges from a memory array point of view. Section IV addresses matrix-vector multiplication, which is a fundamental computing primitive at the basis of most IMC applications. Section V reviews the state-of-the-art of closed-loop IMC, which enables highly complex algebraic operations with reduced complexity. Section VI presents an overview of the field of content-addressable memories. Section VII focuses on accelerators for the training of neural networks based on in-memory outer product. Section VIII addresses brain-inspired neuromorphic computing leveraging device physics to reproduce neurobiological processes of sensing and learning. Finally, Sec. IX provides an outlook on the next urgent challenges and opportunities that need to be addressed.

Charge-storage memories based on the complementary metal-oxide-semiconductor (CMOS) technology provide the mainstream memory technology for digital computing systems. Figure 2 illustrates the memory hierarchy of CMOS-based computing systems, including (from top to bottom) on-chip registers and static random access memory (SRAM), followed by off-chip dynamic random access memory (DRAM) and nonvolatile Flash storage. While performance (e.g., access time) degrades from top to bottom, the area density increases and the cost per bit decreases in the same direction, with NAND flash representing the highest density thanks to 3D integration.12,13 Within this scenario, emerging memories based on material storage have been developed in an effort to provide a better trade-off between performance, area, and cost. In particular, emerging memory devices show unique storage principles relying on the physics of the active materials and offer advantages in terms of scalability,14 integration in 3D structures,15,16 and energy efficiency. These properties are also attractive for application as embedded memories in systems-on-chip, where flash memory faces additional integration difficulties due to the high-κ/metal-gate process of the silicon front-end circuits.17 Emerging memories have also attracted considerable interest for IMC applications thanks to the nonvolatile storage of computing weights, high density, and fast programming/read. Figure 3 shows a summary of the main emerging memories, including two-terminal and three-terminal devices. Table I shows a summary of the properties of emerging memories compared to other nonvolatile memory technologies.18

FIG. 2.

Schematic illustration of the memory hierarchy in traditional CMOS-based computing systems. Registers and cache memories have relatively fast access and low capacity. Moving away from the CPU (top), memories increasingly display slower access and larger capacity. The storage class memory can bridge the gap between high-performance working memory and low-cost storage devices.

FIG. 3.

Schematic illustration of the emerging memory technologies considered for IMC, including both two-terminal and three-terminal devices. (a) Resistive random access memory (RRAM). (b) Phase change memory (PCM). (c) Ferroelectric random access memory (FeRAM). (d) Spin-transfer torque magnetic random-access memory (STT-MRAM). (e) Ferroelectric field-effect transistor (FeFET). (f) Spin–orbit torque magnetic random-access memory (SOT-MRAM). (g) Electro-chemical random access memory (ECRAM). (h) Memtransistor based on the MoS2 channel.

TABLE I.

Comparison of different memory technologies suited for in-memory computing. Reproduced with permission from D. Ielmini and S. Ambrogio, Nanotechnology 31(9), 092001 (2020). Copyright 2020 Author(s), licensed under a Creative Commons Attribution 4.0 License.

Technology                       | NOR flash | NAND flash | RRAM      | PCM       | STT-MRAM | FeRAM     | FeFET    | SOT-MRAM | Li-ion
On/off ratio                     | 10^4      | 10^4       | 10–10^2   | 10^2–10^4 | 1.5–2    | 10^2–10^3 | 5–50     | 1.5–2    | 40–10^3
Multilevel operation             | 2 bit     | 4 bit      | 2 bit     | 2 bit     | 1 bit    | 1 bit     | 5 bit    | 1 bit    | 10 bit
Write voltage (V)                | <10       | 10         | <3        | <3        | <1.5     | <3        | <5       | <1.5     | <1
Write time                       | 1–10 µs   | 0.1–1 ms   | <10 ns    | ∼50 ns    | <10 ns   | ∼30 ns    | ∼10 ns   | <10 ns   | <10 ns
Read time                        | ∼50 ns    | ∼10 µs     | <10 ns    | <10 ns    | <10 ns   | <10 ns    | ∼10 ns   | <10 ns   | <10 ns
Stand-by power                   | Low       | Low        | Low       | Low       | Low      | Low       | Low      | Low      | Low
Write energy (J/bit)             | ∼100 pJ   | ∼10 fJ     | 0.1–1 pJ  | 10 pJ     | ∼100 fJ  | ∼100 fJ   | <1 fJ    | <100 fJ  | ∼100 fJ
Linearity                        | Low       | Low        | Low       | Low       | None     | None      | Low      | None     | High
Drift                            | No        | No         | Weak      | Yes       | No       | No        | No       | No       | No
Integration density              | High      | Very high  | High      | High      | High     | Low       | High     | High     | Low
Retention                        | Long      | Long       | Medium    | Long      | Medium   | Long      | Long     | Medium   | ⋯
Endurance                        | 10^5      | 10^4       | 10^5–10^8 | 10^6–10^9 | 10^15    | 10^10     | >10^5    | >10^15   | >10^5
Suitability for DNN training     | No        | No         | No        | No        | No       | No        | Moderate | No       | Yes
Suitability for DNN inference    | Yes       | Yes        | Moderate  | Yes       | No       | No        | Yes      | No       | Yes
Suitability for SNN applications | Yes       | No         | Yes       | Yes       | Moderate | Yes       | Yes      | Moderate | Moderate

Figure 3(a) schematically shows the resistive random-access memory (RRAM), consisting of a metal–insulator–metal (MIM) stack where the insulating layer serves as the active switching material. The bottom electrode (BE) typically consists of a relatively inert metal, such as Pt or TiN, while the top electrode (TE) is generally a more reactive metal, such as Ti or Ta.19–21 In most cases, the switching layer is made of a metal oxide,22 although other materials have also been used, such as nitrides,23 ternary oxides,24 chalcogenides,25 or 2D materials.26,27 Organic materials have also been explored, taking advantage of their low switching energies, wide range of tunability, and facile ion migration.28–30 However, limitations in writing speed, scaling, and reliability remain open challenges. The forming operation generates a conductive filament (CF) across the switching layer. The CF resistance is changed by electrically induced chemical redox reactions, where the set operation causes the transition to the low-resistance state (LRS), while the reset operation causes the transition to the high-resistance state (HRS). These transitions can occur either by operating the device under the same polarity in unipolar RRAM31 or by alternating polarities in bipolar RRAM.32 Uniform-switching RRAM, where the resistance can change without any forming operation, has also been proposed.33

Figure 3(b) schematically shows the phase change memory (PCM), which is based on the ability of specific phase change materials to switch reversibly between the amorphous and the crystalline phases, exhibiting different electrical resistivity.34–36 The phase change material typically consists of chalcogenides, such as Ge2Sb2Te5,37 where the phase transition can be triggered by an applied voltage pulse via Joule heating. The PCM offers the ability to store intermediate states by modulating the crystalline fraction within the active material,38 although the stability of the memory state is potentially affected by temperature-dependent retention, caused by the recrystallization of the amorphous region,39 and drift, caused by the structural relaxation of the amorphous structure.40 These issues can be handled by materials engineering to improve the high-temperature stability41 and device engineering to reduce the resistance drift.42 The PCM technology has also been demonstrated in relatively advanced technology nodes, such as 28 nm43 and 18 nm.44 The very high maturity level of development and the higher endurance compared to other non-volatile memory devices45 make PCM an ideal candidate for in-memory computing.

Figure 3(c) schematically shows a ferroelectric random access memory (FeRAM) device, based on the ability of a ferroelectric layer to display a remanent electric polarization after the application of voltage pulses.46 The most typical ferroelectric materials include perovskites with structure ABO3, where A and B are cations, e.g., BaTiO3 (BTO)47 and PbZrxTi1−xO3 (PZT).48 Most recently, FeRAM has seen a revival since ferroelectricity was reported in pure and doped hafnium oxide (HfO2) with an orthorhombic structure.49 While being a CMOS-compatible oxide, HfO2 has a lower dielectric constant compared to perovskite materials, thus enabling the development of ferroelectric layers with a small thickness between 5 and 30 nm, which is suitable for memory device scaling and 3D integration.50,51 However, the realization of ferroelectric layers with thickness well below 10 nm and good uniformity remains a topic of intense research.52 FeRAM is read by measuring the displacement current during ferroelectric switching; the read operation is thus destructive, which is not always practical for in-memory computing applications. To solve this issue, the ferroelectric tunnel junction (FTJ) has been developed, in which the ferroelectric polarization is reflected by the device resistance thanks to bilayer stack engineering.53

Figure 3(d) schematically shows the spin-transfer torque magnetic random access memory (STT-MRAM), consisting of a magnetic tunnel junction (MTJ) composed of a thin insulator sandwiched between two ferromagnetic (FM) layers. In one of the two FM layers, the ferromagnetic polarization is pinned by the presence of adjacent magnetic layers, such as a synthetic antiferromagnetic stack,54,55 thus acting as a reference for the polarization. The other layer is free and can change its polarization via electrical pulses. The free layer magnetization can thus be programmed by applying a current pulse directly across the MTJ via spin torque.56,57 Two STT-MRAM states can thus be obtained, namely, a parallel state with relatively low resistance and an antiparallel state with relatively high resistance for equal and opposite directions, respectively, of the magnetic polarization in the pinned and free layers. STT-MRAM features fast switching and good cycling endurance.58 On the other hand, the resistance window is generally quite limited (less than a factor 2) and multilevel operation is hard to achieve.59 

In addition to the two-terminal FeRAM and FTJ, a three-terminal ferroelectric device has been proposed, namely, the ferroelectric field-effect transistor (FeFET) in Fig. 3(e). The FeFET consists of a field-effect transistor where the gate dielectric is a ferroelectric layer.60,61 The ferroelectric polarization thus affects the threshold voltage VT, which can be used as a monitor of the memory state, similar to a floating-gate memory. Contrary to FeRAM devices, the reading operation of the FeFET device is non-destructive, which is highly favorable for IMC. In addition, the FeFET can be integrated in vertical 3D architectures62 and can display multilevel operation by multilayered stack engineering.63 An important challenge is the limited cycling endurance of the FeFET, which is typically in the range of 10^5 cycles, too small for most applications.

Figure 3(f) schematically shows the spin–orbit torque magnetic random access memory (SOT-MRAM). Similar to the STT-MRAM device, SOT-MRAM consists of an MTJ structure, here deposited on top of a metallic line made of a heavy metal, such as Pt or Ta.64,65 To program the SOT-MRAM device, a current pulse is applied along the heavy metal line, causing a polarity-dependent accumulation of spin-polarized electrons, thus inducing the magnetization switching in the free layer.65 The read operation is conducted by probing the MTJ resistance, similar to STT-MRAM. The separation between the programming and reading paths allows minimizing the MTJ degradation, thus improving the cycling endurance with respect to STT-MRAM devices. Recently, the integration of SOT-MRAM with the CMOS technology has been demonstrated.66 Similar to STT-MRAM, SOT-MRAM suffers from a relatively small resistance window and difficult multilevel operation. Another potential issue is the need for an external magnetic field to support the free-layer switching, which can be overcome by advanced structures with built-in magnetic fields.67

Figure 3(g) schematically shows the electro-chemical random access memory (ECRAM), where the conductivity of a metal-oxide transistor channel can be changed by the injection of ionized defects across the vertical stack, consisting of a reservoir layer and a solid-state electrolyte layer.68–70 Defects might consist of oxygen vacancies,71 Li ions,72 or protons.73 Organic materials have also been explored,74,75 demonstrating various synaptic and neuronal functionalities. Similar to SOT-MRAM, the three-terminal ECRAM structure allows decoupling the read and write paths, thus improving the cycling endurance and reducing the energy consumption thanks to the extremely low conductivity of the metal oxide channel, e.g., WO3.69 Controllable and linear potentiation characteristics were reported, which makes ECRAM a promising technology for synaptic devices in neuromorphic systems capable of learning and training.70 3D vertical ECRAM has also been demonstrated,76 paving the way for ECRAM-based high-density cross-point arrays.

Memtransistor devices combine the three-terminal transistor structure with the memristor-like ability to change the channel conductance by the application of an in-plane drain–source voltage.77–79 Typical memtransistors consist of an FET with a 2D semiconductor channel, such as MoS2. The memory behavior is obtained by applying large source–drain voltages, which can induce the resistance change by various physical mechanisms, such as field-induced dislocation migration in the polycrystalline MoS2 channel,77,78 the dynamic tuning of the Schottky barrier at the metal–semiconductor contact,80 or the direct cation migration from the electrodes on the surface of a 2D semiconductor.79,81 Other implementations of memtransistors exploit the optical properties of the 2D material (typically, a transition metal dichalcogenide) to develop devices with neural properties.82,83 Similar neuromorphic devices were obtained exploiting the ionic diffusion in amorphous oxides, such as ZnO or indium tungsten oxide (IWO).84–86 The major advantages of the memtransistor are the three-terminal structure, the atomically thin channel, and the 3D integration in the back end. However, compared to all the other reported technologies, memtransistors are still in an early stage of development, with significant challenges regarding materials, device structures, and reliability.

IMC development has achieved significant progress in the last 10 years, ranging from novel theoretical approaches to experimental IMC hardware demonstrations in silicon-verified test vehicles. The range of applications where IMC can offer improved energy efficiency, performance, and scaling opportunities can be divided into the two macro-categories of static and dynamic IMC, as shown in Fig. 4(a).

FIG. 4.

IMC macro-categories and corresponding applications. (a) Schematic current–voltage (I–V) curve of an emerging memory device, highlighting the low-voltage and high-voltage/switching regimes, corresponding to static and dynamic IMC, respectively. (b) Examples of static IMC, where the memory stores pre-trained data and executes the computation, e.g., MVM. (c) Examples of dynamic IMC, in which the switching regime allows reproducing dynamic features, such as adaptation and learning.


Static IMC, schematically shown in Fig. 4(b), consists of a physical computing concept where the emerging memories are used to store data and perform computation without changing or updating their programmed state.6 Generally, memory devices in static IMC are first programmed to a desired state to encode pre-trained computing parameters in the form of conductance levels. Random states can also be used in some applications, such as the physical unclonable function (PUF)87 and reservoir computing (RC), where the stochastic conductance resulting from the fabrication process is directly used in the computation.88 The programmed memory arrays are then used as physical matrices to execute in situ vectorial operations with high parallelism, such as matrix-vector multiplication (MVM).89 Low voltages are applied to prevent any perturbation of the conductive states during computation,90 thus resulting in a low power consumption, which is attractive for decentralized computing architectures, such as edge91 and fog92 computing. The high degree of parallelism allows reducing the number of operations needed to carry out a given task, thus achieving O(1) computational complexity.93,94 Examples of static IMC include matrix-vector multiplication (MVM, Sec. IV), inverse matrix-vector multiplication (IMVM, Sec. V), and content-addressable memories (CAMs, Sec. VI).

Dynamic IMC, schematically shown in Fig. 4(c), generally combines all the opportunities of static IMC with the additional strength of enabling controlled switching of the memory devices to reproduce additional functions, such as neuron activation,95 stateful Boolean logic,96,97 and learning in supervised/unsupervised neural networks.98–101 A wide range of physical mechanisms can be used for the controlled switching, such as filament plasticity in RRAM devices,102 gradual crystallization in PCM devices,95 charge trapping in MoS2 memtransistors,103 and magnetic polarization for true-random number generation (TRNG).104 Dynamic IMC provides a promising avenue for reducing latency, energy, and circuit area by leveraging the intrinsic physics of the device instead of emulating the desired characteristics via the analog/digital design of CMOS-based networks.105 Dynamic and static IMC are generally combined in the same platform to provide energy-efficient computing systems capable of learning and adaptation.95,106 Applications of dynamic IMC include outer product accelerators for neural network training (Sec. VII) and neuromorphic systems for brain-inspired computing (Sec. VIII).

MVM can be executed in a crosspoint memory array by universal circuit laws, such as Kirchhoff’s current law for summation and Ohm’s law for multiplication.7,107 The crosspoint array consists of a matrix of programmable memory elements whose top and bottom electrodes are, respectively, tied to common columns and rows, as shown in Fig. 5(a).108,109 According to the IMC concept, the crosspoint array acts as a physical matrix mapping computational parameters, e.g., synaptic weights in a neural network, to compute the MVM physically in the analog domain. This is schematically shown in Fig. 5(a), where the application of a voltage Vj at the jth column results in a current at the ith row, connected to ground, given by
$I_i = \sum_{j=1}^{N} G_{i,j} V_j$, (1)
where Gi,j is the conductance of the memory element at position (i, j) and N is the number of rows and columns.7,107 Equation (1) can be written in the compact matrix form i = Gv, which makes explicit the multiplication of the conductance matrix G by the voltage vector v.
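As a concrete illustration, the following Python sketch emulates Eq. (1) numerically; the conductance window, the 5% lognormal programming spread, and the 0.2 V read voltage are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64                      # array size (rows = columns = N, for illustration)
G_MIN, G_MAX = 1e-6, 1e-4   # assumed conductance range in siemens

# Map a random weight matrix W in [0, 1] to device conductances.
W = rng.random((N, N))
G = G_MIN + W * (G_MAX - G_MIN)

# Lognormal programming variability (assumed 5% sigma) models device spread.
G_programmed = G * rng.lognormal(mean=0.0, sigma=0.05, size=G.shape)

v = rng.uniform(0.0, 0.2, N)  # read voltages, kept low to avoid disturbing cells

# Eq. (1): each row current is the conductance-weighted sum of column voltages.
i_ideal = G @ v
i_real = G_programmed @ v

rel_err = np.linalg.norm(i_real - i_ideal) / np.linalg.norm(i_ideal)
print(f"relative MVM error from programming variability: {rel_err:.3%}")
```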
FIG. 5.

Various cell structures for crosspoint array circuits. (a) One-resistor (1R) structure where the cell consists of a passive resistive device. (b) One-selector/one-resistor (1S1R) structure where the sneak path problem is circumvented by a non-linear selector device without affecting the integration density. (c) One-transistor/one-resistor (1T1R) structure allows for the selection of individual cells during programming and reading at the cost of a lower integration density. (d) One-capacitor (1C) structure, which prevents static leakage during MVM.


The MVM operation of Fig. 5(a) is carried out without moving the matrix parameters, in line with the in situ processing paradigm of IMC. In addition, the operation is performed in just one step, thus minimizing the latency and maximizing the throughput thanks to a computational complexity of O(1). Such massive parallelism allows the MVM to achieve outstanding area and energy efficiency compared to traditional digital multiply-and-accumulate (MAC) operations. Finally, the crosspoint array is generally integrated in the back end of the line (BEOL) of the CMOS process, thus taking advantage of 3D stacking and of a small cell area of only 4F^2/N, where F is the lithographic feature size and N is the number of stacked layers.110 Despite the advantages of parallelism, density, and latency, the MVM concept is an analog computing process that is critically sensitive to device variability,111,112 noise,113 conductance drift,40 and parasitic IR drop along the wires,114 all affecting the accuracy of computation. To deal with these parasitic effects, several mitigation and compensation techniques have been proposed at the device,115 algorithm,114,116–120 and architectural levels.121,122

The MVM concept can be extended to virtually all types of memory devices and cell structures in the array. The one-resistor (1R) structure of Fig. 5(a) is affected by crosstalk and sneak path issues during programming and reading.123 These issues can be prevented by adding a selector device in series to the memory element, resulting in the one-selector/one-resistor (1S1R) structure124–126 or the one-transistor/one-resistor (1T1R) structure,127–129 illustrated in Figs. 5(b) and 5(c), respectively. The 1S1R configuration avoids sneak path currents during the programming phase by introducing a highly non-linear two-terminal device109,130,131 that suppresses the current of unselected and half-selected cells in the array while maintaining the small 4F^2 area of the 1R cell structure.109 The 1T1R structure ensures tight control of the programming current while allowing sophisticated program/verify algorithms132 at the cost of a larger cell area and a higher complexity introduced by the third terminal. In addition to resistive memory cells, where the computational parameter is stored in the conductance, capacitive memories can be adopted with the one-capacitor (1C) structure in Fig. 5(d). Here, the small-signal capacitance can be tuned133 and used in MVM operations via the capacitor charge–voltage law Q = CV.

From the computational viewpoint, MVM requires the input vector to be encoded in voltage amplitudes, usually by means of a digital-to-analog converter (DAC). The output analog current can be sensed by using a transimpedance amplifier (TIA)134,135 and then converted by using an analog-to-digital converter (ADC) for further processing in the digital domain. Alternatively, the input vector can be encoded as time durations tj by pulse-width modulation (PWM).136 This approach is typically implemented in the 1T1R array, where the time-encoded signal can be applied to the transistor gates, while a constant read voltage Vread is applied across the cells. PWM requires that the analog current at each row is integrated to yield the charge Qi according to
$Q_i = V_{\mathrm{read}} \sum_{j=1}^{N} G_{i,j}\, t_j$, (2)
thus providing an alternative MVM operation yielding the vector q = Vread Gt, similar to Eq. (1).
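A minimal numerical sketch of this time-encoded variant follows; the read voltage, conductance range, and pulse widths are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 64
V_READ = 0.2                           # constant read voltage (V), assumed
G = rng.uniform(1e-6, 1e-4, (N, N))    # programmed conductances (S), assumed
t = rng.uniform(0.0, 100e-9, N)        # inputs encoded as pulse widths (s)

# Eq. (2): integrating each row current over the pulse widths yields the
# charge vector q = V_read * G @ t, i.e., an MVM with time-encoded inputs.
q = V_READ * (G @ t)
print(q[:4])  # charge per row, in coulombs
```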

Note that, while MVM is strongly accelerated thanks to the array parallelism, memory programming might require a relatively long time, especially when a high equivalent-bit precision is needed. However, the programming time can generally be amortized in applications where the computational parameters remain fixed for most of the MVM operations. This is the case for the discrete cosine transform (DCT), which extracts frequency components from a data sequence.137 DCT is routinely applied for image compression, thus providing an ideal application for IMC.134
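The following sketch illustrates this use case by computing a DCT as a fixed-matrix MVM; the differential mapping of signed matrix entries onto two non-negative conductance arrays and the scale factor are assumptions for illustration.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II matrix; row k, column j.
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

n = 8
C = dct_matrix(n)

# Signed entries are assumed to be split onto two conductance arrays
# (G_pos for positive parts, G_neg for negative parts), a common scheme.
scale = 1e-4                       # S per unit matrix entry (illustrative)
G_pos = scale * np.clip(C, 0, None)
G_neg = scale * np.clip(-C, 0, None)

x = np.cos(2 * np.pi * 3 * np.arange(n) / n)  # test signal, one dominant tone
y = (G_pos @ x - G_neg @ x) / scale           # differential readout gives C @ x

print(np.round(y, 3))  # energy concentrates in a few DCT coefficients
```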

Another application where computational parameters remain constant throughout computation is the forward propagation during the inference phase in a deep neural network (DNN).138,139 Figure 6(a) shows a sketch of a fully connected neural network (FCNN) for image classification with three synaptic layers. Each synaptic layer can be viewed as an MVM where synaptic weights are mapped in the conductance matrix, while activations are used as the input vector. The inference operation can thus be mapped into several MVMs occurring in distinct crosspoint arrays, each mapping a different synaptic layer or a region of the DNN. Figure 6(b) shows a possible multi-core IMC architecture where each computational unit performs the assigned computation independently, as illustrated in Fig. 6(c), while a logic unit collects output data from the cores and submits activation signals to them. Given the sequential operation of DNN inference, the architecture and computational cores can be optimized to maximize the data throughput.
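A toy sketch of this layer-by-layer mapping is given below; the layer sizes, the differential weight mapping, and the placement of the ReLU in the digital periphery are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def to_conductance_pair(W, g_scale=1e-4):
    # Differential mapping of signed weights onto two non-negative arrays.
    return g_scale * np.clip(W, 0, None), g_scale * np.clip(-W, 0, None)

def crosspoint_mvm(G_pos, G_neg, v, g_scale=1e-4):
    # One analog step per layer: differential row currents, rescaled to weights.
    return (G_pos @ v - G_neg @ v) / g_scale

# Toy 2-layer network: 16 inputs -> 8 hidden -> 4 outputs (assumed pre-trained).
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(4, 8))
G1p, G1n = to_conductance_pair(W1)
G2p, G2n = to_conductance_pair(W2)

x = rng.random(16)                                  # activations as voltages
h = np.maximum(crosspoint_mvm(G1p, G1n, x), 0.0)    # ReLU in digital periphery
y = crosspoint_mvm(G2p, G2n, h)
print(y)
```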

FIG. 6.

MVM for neural network accelerators. (a) Sketch of a fully connected DNN for image classification. (b) Multi-core architecture where each tile performs MVM between activation and synaptic weights. (c) Individual core consisting of a crosspoint array with peripheral circuits for input/output communication and conversion. (d) Correlation plot of energy efficiency as a function of throughput for different hardware accelerators of DNN inference, including eNVM-based, SRAM-based IMC, and a fully digital approach. Reproduced with permission from Seo et al., IEEE Solid-State Circuits Mag. 14(3), 65–79 (2022). Copyright 2022 IEEE.


Inference accelerators have been proposed with a variety of implementations, differing by the adopted memory technologies;98,127,140 the number of quantized levels of input, weight, and output;141,142 the peripheral circuits;136,143 the amount of possible reconfiguration;143 and the possibility of implementing backpropagation training in addition to forward-propagation inference.99,144 Similar to FCNN layers, IMC has been shown to accelerate convolutional layers99,127 and recurrent neural networks145 by changing the MVM partition and computation technique.146 

IMC can largely improve the energy efficiency and the throughput of MVM for DNN inference. Figure 6(d) shows the power efficiency and throughput of state-of-the-art IMC accelerators based on nonvolatile memories compared to IMC based on static random access memory (SRAM) or fully digital accelerators.147 SRAMs feature faster access times and better robustness to variability and disturbs thanks to their digital nature and fully silicon-based CMOS technology. However, SRAM has a larger cell area due to the 6T or 8T bit-cell structure, cannot implement multilevel operation, and cannot provide nonvolatile storage, thus requiring the upload of computational parameters at the power-on phase. The latter issue is a significant drawback in applications where the neural accelerator frequently switches between stand-by and computing phases, which is typical of low-power edge-computing applications.

MVM represents the core operation of combinatorial optimization tasks.148 Here, emerging memories can provide both the MVM operation via the crosspoint array and the stochastic physical noise, which is generally needed to navigate among the local minima of the cost function. Indeed, metaheuristic optimization techniques, such as chaotic simulated annealing or stochastic simulated annealing, require massive MVM and tunable sources of noise. These computing strategies typically rely on recurrent stochastic networks, such as the Hopfield neural network, sketched in Fig. 7(a),95,106,149 or the restricted Boltzmann machine (RBM).150–152 In these approaches, the network is characterized by a certain energy (or cost) function E that depends on the state of the neurons, which in turn depends on the synaptic spike stimulations and the injected noise. By properly tuning the injected noise, it is possible to control the ability of the neurons to escape from local minima of E, as depicted in Fig. 7(b). By gradually decreasing the injected noise, the search takes the shape of a simulated annealing algorithm, where the effective temperature is slowly decreased in analogy with the cooling phase of physical annealing. This is shown in Fig. 7(c), where the network manages to reach thermal equilibrium at the global minimum of E, thus solving the optimization task.145 This approach finds application in several key workloads in logistics, scheduling, and other NP-hard problems, such as the traveling salesperson problem.
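The following sketch illustrates the principle on a toy max-cut problem, with the row-wise MVM standing in for the in-memory operation and an annealed Gaussian term emulating the tunable physical noise; the problem size, noise schedule, and update rule are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Max-cut on a small random graph: the Hopfield synaptic matrix J, which would
# be stored in the crosspoint array, encodes the (negated) adjacency weights.
n = 16
A = rng.integers(0, 2, (n, n))
A = np.triu(A, 1)
A = A + A.T
J = -A.astype(float)

s = rng.choice([-1.0, 1.0], n)   # bipolar neuron states

def energy(s):
    return -0.5 * s @ J @ s

T0, alpha, steps = 2.0, 0.95, 400
T = T0
for _ in range(steps):
    i = rng.integers(n)
    local_field = J[i] @ s       # one row of the in-memory MVM
    # Annealed injected noise lets the neuron escape local minima of E.
    s[i] = 1.0 if local_field + rng.normal(0, T) > 0 else -1.0
    T *= alpha                   # cooling schedule of simulated annealing

print("final energy:", energy(s))
```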

FIG. 7.

MVM for combinatorial optimization. (a) Sketch of a Hopfield-type recurrent neural network, characterized by a system energy E. (b) System energy E and iterative search of the global minimum, representing the optimal solution of the combinatorial task. The decreasing noise allows for reaching the global minimum by escaping local minima. (c) Evolution of the average energy of a RRAM-based Hopfield RNN for various optimization strategies. Reprinted with permission from Mahmoodi et al., 2019 International Electron Devices Meeting (IEDM) (IEEE, 2019), pp. 14.7.1–14.7.4. Copyright 2019 IEEE.


Programming variability is a major issue in deterministic DNNs since it affects the weight precision, hence the accuracy of inference. On the other hand, programming variation can provide a source of stochasticity for specific computing applications, such as stochastic computing and hardware security. For instance, Bayesian inference relies on neural networks where the model parameters are probability distributions. In this scenario, transferring the ex situ trained model to the hardware network is less critical since a probability distribution can be naturally modeled by the physical distribution of conductance states.153 Figure 8(a) shows the conceptual scheme of an RRAM-based Bayesian network where each synaptic weight belongs to a certain distribution. Figure 8(b) shows a possible implementation in an N × M array of RRAM synapses with 1T1R structures.153 Here, the distribution of a synaptic parameter is modeled by the distribution of conductance states of N devices in a column, while the input voltages to each column are the outputs generated by M neurons in the previous layer. By applying a voltage vector across M columns, each row yields a current that flows into a neuron circuit, resulting in a distribution of N neuron activation voltages, namely, the output distribution of the neuron. Based on the same approach, Markov chain Monte Carlo (MCMC) networks have been demonstrated with stochastic RRAM arrays.154
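A numerical sketch of this sampling principle follows; the spread model, the sigmoid neuron, and the network size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Each synaptic parameter is represented by N devices in a column; their
# programmed conductances naturally spread around the target value.
N_DEVICES, M_INPUTS = 100, 8
mu = rng.normal(0.0, 1.0, M_INPUTS)      # target (mean) synaptic weights
sigma = 0.1 * np.abs(mu) + 0.02          # assumed programming spread per weight

# One column of N devices per synaptic parameter -> an N x M matrix of samples.
W_samples = rng.normal(mu, sigma, size=(N_DEVICES, M_INPUTS))

x = rng.random(M_INPUTS)                 # input voltages from the previous layer

# Applying x across the M columns yields N row currents, i.e., N samples of
# the pre-activation: the neuron output is itself a distribution.
pre_act = W_samples @ x
out = 1.0 / (1.0 + np.exp(-pre_act))     # sigmoid neuron, evaluated per sample

print(f"output mean = {out.mean():.3f}, std = {out.std():.3f}")
```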

FIG. 8.

MVM for stochastic computing. (a) Sketch of a Bayesian neural network where synapses and neurons are represented by probability distributions. Reproduced with permission from Dalgaty et al., Adv. Intell. Syst. 3(8), 2000103 (2021). Copyright 2021 Author(s), licensed under a Creative Commons Attribution 4.0 License. (b) RRAM-based realization of the Bayesian neural network, where each column describes the distribution of a synaptic parameter. Reproduced with permission from Dalgaty et al., Adv. Intell. Syst. 3(8), 2000103 (2021). Copyright 2021 Author(s), licensed under a Creative Commons Attribution 4.0 License. (c) NVM-based PUF circuit based on a passive crosspoint array of stochastic memory devices for the generation of a response to a submitted challenge. Reproduced with permission from M. R. Mahmoodi, D. B. Strukov, and O. Kavehei, IEEE Trans. Electron Devices 66(12), 5050–5059 (2019). Copyright 2019 IEEE.


The stochastic properties of emerging memories can also provide the foundation for developing novel security primitive circuits.104 Figure 8(c) shows the conceptual idea for implementing a memory-based physical unclonable function (PUF) for chip authentication.87 An input challenge encodes the information to select specific rows and columns of the crosspoint memory array, thus generating a unique single-bit response by current comparison. A 1R crosspoint array is adopted to take advantage of circulating sneak path currents, enabling the participation and interaction of all memory devices in the array, thus increasing the complexity of the solution and the robustness to external attacks.87
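A simplified sketch of the challenge–response principle follows; it ignores the sneak-path interactions emphasized above for brevity, and the array size, conductance statistics, and challenge format are illustrative assumptions.

```python
import numpy as np

def puf_response(G, rows, cols_a, cols_b, v=0.2):
    # The challenge selects two groups of columns; the response bit is the
    # sign of the difference between the two summed currents (comparison).
    i_a = v * G[np.ix_(rows, cols_a)].sum()
    i_b = v * G[np.ix_(rows, cols_b)].sum()
    return int(i_a > i_b)

rng = np.random.default_rng(5)
G = rng.lognormal(-9.5, 0.3, (64, 64))   # stochastic as-fabricated conductances

rows = rng.choice(64, 8, replace=False)
cols = rng.choice(64, 16, replace=False)
bit = puf_response(G, rows, cols[:8], cols[8:])
print("response bit:", bit)
```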

Crosspoint memory arrays with a closed-loop circuit topology can accelerate inverse matrix-vector multiplication (IMVM) operations, such as linear system solution, matrix inversion, and linear/regularized regression.155,157,158 Figure 9(a) shows a typical IMVM circuit for the solution of a linear system, where the array is complemented with an array of operational amplifiers (OAs). In this circuit, currents are provided as row inputs, while the voltages that satisfy Eq. (1) are automatically established by the OAs via the closed-loop feedback connection, thus allowing for the solution of the set of linear equations by
$\mathbf{v} = G^{-1}\,\mathbf{i}$. (3)
Similar to the open-loop MVM of Sec. IV, closed-loop IMC (CL-IMC) can achieve the O(1) solution of algebra problems with polynomial complexity O(n^α), where n is the number of linear equations and α is between 2 and 3.156 CL-IMC thus appears as one of the most promising candidates for accelerating complex linear algebra tasks via IMC.
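The following sketch emulates the settled operating point of the circuit of Fig. 9(a) by solving Eq. (3) numerically; the positive-definite test matrix and the 5% lognormal conductance spread are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# A positive-definite conductance matrix G (the stability requirement noted
# later for the circuit of Fig. 9(a)); i is the vector of injected currents.
n = 8
B = rng.random((n, n))
G = B @ B.T + n * np.eye(n)    # symmetric positive definite, arbitrary units
i = rng.random(n)

# Eq. (3): the feedback loop settles at the voltages solving i = G v.
v = np.linalg.solve(G, i)

# A 5% device variability perturbs the programmed matrix and the solution.
G_real = G * rng.lognormal(0.0, 0.05, G.shape)
v_real = np.linalg.solve(G_real, i)
print("relative error:", np.linalg.norm(v_real - v) / np.linalg.norm(v))
```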
FIG. 9.

Closed-loop IMC circuits for IMVM. (a) Circuit for the solution of linear systems155 of the form Ax = b. (b) Circuit for the eigenvector computation, i.e., for the solution of the secular equation156  Ax = λx. (c) Pseudoinverse matrix computing circuit for the solution of the linear regression problems157,158 of the form Xβ + ɛ = y.


Figure 10(a) shows the experimental output of a hardware implementation of the circuit in Fig. 9(a), yielding the elements of a 3 × 3 inverse matrix A−1 as a function of the analytical solution.155 In-memory matrix inversion might find application in a number of machine learning tasks, such as Markov chains159 and the numerical solution of differential equations.155 With errors as low as 3%, feedback-based crossbar circuits can provide a viable alternative to bulky digital processors for linear system solution tasks, serving as a potential cornerstone of IMC-based analog processing units.

FIG. 10.

Results of closed-loop IMC for IMVM problems. (a) Correlation plot of the experimental results of the inversion of a 3 × 3 matrix as a function of ideal analytical results. Reproduced with permission from Sun et al., Proc. Natl. Acad. Sci. U. S. A. 116(10), 4123–4128 (2019). Copyright 2019 National Academy of Sciences. (b) Correlation plot of the circuit output for a PageRank algorithm of the Harvard 500 dataset as a function of the ideal analytical results. Reproduced with permission from Sun et al., IEEE Trans. Electron Devices 67(4), 1466–1470 (2020). Copyright 2020 IEEE. (c) Experimental demonstration of linear regression on RRAM devices. Reproduced with permission from Sun et al., Sci. Adv. 6(5), eaay2378 (2020). Copyright 2020 Author(s), licensed under a Creative Commons Attribution 4.0 License.

The CL-IMC topology of Fig. 9(a) can be extended to eigenvector computation by the circuit of Fig. 9(b).155 Here, the output is directly fed back as input after sign inversion, thus resulting in a self-sustaining architecture. OAs are used in the transimpedance amplifier (TIA) configuration, where the feedback conductance Gλ encodes the principal matrix eigenvalue λ. Kirchhoff’s law at the virtual ground nodes thus reads
$G\,\mathbf{v} = G_{\lambda}\,\mathbf{v}$, (4)
which electrically matches the secular equation Av = λv. For negative λ, the analog inversion buffers are removed and the absolute value |λ| is encoded as the conductance Gλ. Unlike the linear system solver in Fig. 9(a), the eigenvector circuit operates in a positive feedback regime, thus allowing for self-sustaining operation. Due to the positive feedback, only the eigenvectors of the largest positive and negative eigenvalues can be computed. In addition, Gλ should slightly deviate from the ideal λ to initiate the self-sustained dynamic response.160 Figure 10(b) shows the results of a website ranking task according to Google’s PageRank algorithm, which is a typical application of eigenvector computation,156 together with similar ranking algorithms.161 It has been estimated that the solution of PageRank with CL-IMC can provide up to 100× throughput improvement with respect to a digital computer.156
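As a numerical stand-in for the self-sustained eigenvector dynamics, the following sketch runs power iteration on a toy Google matrix; the graph size and damping factor are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)

# Tiny link matrix: column j holds the out-link probabilities of page j.
n = 6
L = rng.integers(0, 2, (n, n)).astype(float)
np.fill_diagonal(L, 0)
col_sums = L.sum(axis=0)
L[:, col_sums == 0] = 1.0 / n                # dangling pages link everywhere
L /= L.sum(axis=0)                           # column-stochastic link matrix

d = 0.85                                     # damping factor of PageRank
A = d * L + (1 - d) / n * np.ones((n, n))    # Google matrix, lambda = 1

# The eigenvector circuit settles on the dominant eigenvector; power
# iteration is a simple numerical surrogate for that dynamics.
v = np.ones(n) / n
for _ in range(100):
    v = A @ v
    v /= v.sum()

print("page ranks:", np.round(v, 3))
```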
The CL-IMC concept can be further extended to non-square matrices, as in the computation of the Moore–Penrose inverse or pseudoinverse.156 Figure 9(c) shows the CL-IMC circuit for matrix pseudoinverse computation or linear regression. The circuit features two m × n crosspoint memory arrays, each encoding a given matrix dataset, and two OA arrays. A simple analysis shows that, by injecting the input current at the virtual grounds of the first m OAs, the output voltages at the second array of OAs are given by
$\boldsymbol{\beta} = (X^{T}X)^{-1}X^{T}\mathbf{y} = X^{+}\mathbf{y}$. (5)
Figure 10(c) shows experimental results for a two-dimensional linear regression problem on a relatively small scale.156 Note that this circuit also eliminates the stability constraints of the linear system solver in Fig. 9(a),90,158 which is limited to positive-definite matrices as a requirement for ensuring that the poles lie in the left half-plane. Furthermore, by using a matrix F instead of a simple local-feedback conductance for the first m OAs, the same circuit can execute a generalized regression according to
$\boldsymbol{\beta} = (X^{T}FX)^{-1}X^{T}F\mathbf{y}$, (6)
where F is a generalization matrix for the given dataset.158 Among the applications of the Moore–Penrose inverse are linear/logistic regression and prediction, which play an important role in data analytics and machine learning.157 
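A one-step regression sketch matching Eq. (5) follows; the dataset and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

# Linear regression X beta + eps = y, solved in one step via the
# pseudoinverse, which the two-array circuit of Fig. 9(c) computes physically.
m, n = 50, 2
X = np.column_stack([np.ones(m), rng.uniform(0, 1, m)])  # intercept + feature
beta_true = np.array([0.5, 2.0])
y = X @ beta_true + rng.normal(0, 0.05, m)

beta = np.linalg.pinv(X) @ y       # Eq. (5): (X^T X)^{-1} X^T y
print("estimated coefficients:", np.round(beta, 3))
```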

CL-IMC allows for the acceleration of several IMVM operations with reduced complexity, which is attractive for large-scale general-purpose machine learning accelerators. On the other hand, CL-IMC also faces considerable challenges, such as the reduced precision with respect to floating-point digital computers, owing to the increased sensitivity of the analog domain.159 Circuit non-idealities affecting the computing accuracy include parasitic interconnect resistances,114 electronic noise from circuit components,90 and conductance variations.158 These effects can be mitigated by compensation schemes, array tiling, signal range increase, and fine-tuned programming algorithms, thus resulting in a complex trade-off with the overall throughput, area, and energy consumption.90,122,162 On the other hand, error-tolerant applications, such as massive multiple-input/multiple-output (MIMO) decoding in 6G networks, allow for better robustness to circuit non-idealities.163 Finally, the medium-precision solution obtained by analog IMC might be used as a seed for high-precision digital solvers,164 allowing for orders-of-magnitude improvements in energy consumption and execution time.
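One way such a seed could be polished is classical iterative refinement, sketched below; the 1% matrix perturbation standing in for the analog error and all other values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)

n = 32
B = rng.random((n, n))
A = B @ B.T + n * np.eye(n)
b = rng.random(n)

# Low-precision "analog" solve: the exact solve of a ~1% perturbed matrix.
A_analog = A * rng.lognormal(0.0, 0.01, A.shape)
x = np.linalg.solve(A_analog, b)           # medium-precision seed from IMC

# Digital iterative refinement polishes the seed to high precision.
for _ in range(5):
    r = b - A @ x                          # residual in high precision
    x = x + np.linalg.solve(A_analog, r)   # correction via the cheap solver

print("residual norm:", np.linalg.norm(b - A @ x))
```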

The content-addressable memory (CAM) is a specialized memory structure where stored data are accessed by inputting the desired data content and extracting the corresponding address as the output, which is the opposite of conventional memories.165 Figure 11(a) shows the schematic structure of a typical ternary content-addressable memory (TCAM), where the third option don’t care or “X” is available in addition to the binary 0 and 1 values in the memory array. Here, an input pattern presented to the CAM from the data lines (DLs) is compared with the stored data, and the corresponding match line (ML) is asserted if a match is found. Due to its inherently high parallelism, the CAM/TCAM is naturally suited to accelerate pattern matching,166,167 branch prediction,168 and lookup operations169 in situ within the memory, thus minimizing data movement.
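A behavioral sketch of the ternary search operation follows; the stored patterns and the encoding of “X” as −1 are illustrative conventions.

```python
import numpy as np

# Ternary stored patterns: 0, 1, or -1 for "don't care" (X).
stored = np.array([
    [1, 0, 1, -1],
    [0, 0, -1, 1],
    [1, 1, 1, 1],
])

def tcam_search(stored, query):
    # A row matches when every non-X cell equals the corresponding query
    # bit; the match line (ML) of that row stays asserted.
    care = stored != -1
    match = np.all((stored == query) | ~care, axis=1)
    return np.flatnonzero(match)          # addresses of matching rows

print(tcam_search(stored, np.array([1, 0, 1, 1])))  # row 0 matches via X
```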

FIG. 11.

Content-addressable memory based on emerging memories. (a) Schematic of a digital TCAM, where binary data are matched against patterns stored in a ternary array. Reproduced with permission from Pedretti et al., Nat. Commun. 12(1), 5806 (2021). Copyright 2015 Author(s), licensed under a Creative Commons Attribution 4.0 License. (b) RRAM-based TCAM cell, where memory devices M1, M2 store the ternary value as a suitable combination of HRS and LRS states. (c) Memristor-based analog CAM cell, where the analog input pattern is encoded as the voltage amplitude on the Data Line (DL). Reproduced with permission from Li et al., Nat. Commun. 11(1), 1638 (2020). Copyright 2015 Author(s), licensed under a Creative Commons Attribution 4.0 License. (d) Decision tree for the Iris dataset classification. Reproduced with permission from Pedretti et al., Nat. Commun. 12(1), 5806 (2021). Copyright 2015 Author(s), licensed under a Creative Commons Attribution 4.0 License. (e) Analog-CAM implementation of the decision tree in (d), where each root-to-leaf path corresponds to a row of the memory array. Reproduced with permission from Pedretti et al., Nat. Commun. 12(1), 5806 (2021). Copyright 2015 Author(s), licensed under a Creative Commons Attribution 4.0 License.


TCAM parallelism comes at the expense of relatively large area and power consumption, as every memory cell must be equipped with a dedicated comparison circuit. When implemented using SRAM, a single CAM cell may use up to 16 transistors,165 thus adding significant area, latency, and power overhead for the search operation and preventing large-scale integration. By replacing conventional SRAM with emerging memories, the leakage power can be reduced and the cell density can be improved. Figure 11(b) shows a differential RRAM-based CAM cell, where memory devices M1 and M2 are programmed to either state LRS/HRS or HRS/LRS to reproduce values “1” or “0,” respectively.167 State “X” is instead obtained by programming both RRAM devices to either HRS or LRS. Depending on the relative ratio of the two conductances (stored data) and the voltage at the wordline (WL) (input data), the match line ML is either pulled low or left high, thus realizing the CAM operation. RRAM-based TCAMs were shown to accelerate regular expression matching and genomic sequencing with up to 25× improvement in energy efficiency.167

The analog tunability of emerging memories allows for realizing analog CAMs capable of analog pattern matching with stored data. Figure 11(c) shows an analog CAM cell170 where value intervals, rather than binary values, can be stored and compared with analog input patterns. In this case, the match line is asserted when all values of the input pattern fall within the ranges stored in the corresponding row of the memory array. Analog memory-based CAMs are naturally suited to accelerate more-than-binary tree-based algorithms, which represent the foundation of many machine learning tasks. Figure 11(d) shows a proposed implementation171 of tree-based inference applied to the classification of the Iris dataset. By mapping each root-to-leaf path into a corresponding row of the memory array, input data can be instantly classified by coupling the analog CAM to a label array, as shown in Fig. 11(e), with a 10^3× throughput improvement with respect to digital implementations.171
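A behavioral sketch of this range-matching operation follows; the intervals and labels are illustrative placeholders, not the actual thresholds of the tree in Fig. 11(d).

```python
import numpy as np

# Each analog CAM row stores one root-to-leaf path as a box of [low, high]
# intervals, one per input feature; a row matches if all features fall inside.
lows  = np.array([[0.0, 0.0], [2.5, 0.0], [2.5, 1.8]])
highs = np.array([[2.5, 9.9], [9.9, 1.8], [9.9, 9.9]])
labels = ["setosa", "versicolor", "virginica"]

def acam_classify(x):
    # All match lines are evaluated in parallel in the physical array.
    match = np.all((x >= lows) & (x <= highs), axis=1)
    idx = np.flatnonzero(match)
    return labels[idx[0]] if idx.size else "no match"

print(acam_classify(np.array([4.8, 1.6])))   # -> versicolor (illustrative)
```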

While MVM can efficiently accelerate forward propagation for DNN inference, it only partially supports the execution of the training process. In fact, DNN training is by far the most energy- and time-consuming operation in a DNN.172 The most typical training methodology relies on a gradient-descent algorithm, such as the backpropagation approach, which requires multiple synaptic weight updates and extensive data transfer.138 DNN training requires several days/weeks of iterations in multicore supercomputers, such as the graphics processing unit (GPU) or the tensor processing unit (TPU), to update billions of synaptic parameters in the network. This is mainly because all data and synaptic parameters must be transferred between the memory and the processing unit, which results in a major memory bottleneck. Figure 12(a) shows a DNN with the typical training approach, including (i) forward propagation of data for generating an output neuron, (ii) calculation of the error δj between the current output of the jth neuron and the ideal output, also known as the label, and (iii) backpropagation of the error for the weight update according to
(7)
where Δwij is the weight update, η is the learning rate, and xi is the input of the pre-synaptic neuron. The operation in Eq. (7) is an outer product, where the input vectors x and δ generate a matrix of weight update ΔW to be applied to the whole synaptic layer. The vector–vector outer product ΔW = x ⊗ y can be accelerated within the crosspoint array, as shown in Fig. 12(b), where x is mapped as the pulse-width of the row voltage pulses, while y is mapped as the amplitude of the column voltage pulses.173 
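A minimal numerical sketch of the outer-product update of Eq. (7) follows, assuming the pulse-coding scheme of Fig. 12(b): each row pulse width encodes xi, each column pulse amplitude encodes yj = η · δj, and every crosspoint cell integrates the product in parallel. All values are illustrative.

```python
import numpy as np

# Sketch of the in-memory outer product of Eq. (7): each cell sees a charge
# proportional to (row pulse width) x (column pulse amplitude), so all
# dW_ij = x_i * eta * delta_j updates happen simultaneously.
eta = 0.1                              # learning rate (illustrative)
x = np.array([1.0, 0.5, 0.0])          # presynaptic inputs -> row pulse widths
delta = np.array([0.2, -0.1])          # output errors -> column pulse amplitudes

dW = np.outer(x, eta * delta)          # parallel update of the whole layer
print(dW)                              # 3x2 matrix of weight updates
```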
FIG. 12.

IMC training by an outer product. (a) Schematic representation of an artificial neural network, where backpropagation training relies on the weight update according to an outer product of the error δj and the signal xi. (b) Crosspoint implementation of the outer product. The weight wij is updated by a value Δwij = xi · yj. The multiplicative effect is obtained by encoding xi as the pulse width of the row voltage pulse and yj as the amplitude of the column voltage pulse. From Agarwal et al., 2016 International Joint Conference on Neural Networks (IJCNN). Copyright 2016 IEEE. Reproduced with permission from IEEE.

The key requirement for the in-memory outer product of Fig. 12(b) is the linearity of the conductance change with pulse voltage and time, or at least with one of the two. The conductance update is physically obtained by potentiation or depression of the memory conductance via suitable pulses applied to the devices. The linear update must be obtained in an open-loop operation, where the same conductance change is achieved at a given voltage and pulse-width, irrespective of the initial state. Unfortunately, potentiation and depression of emerging memories are generally non-linear with the applied voltage as a result of the exponential time–voltage relationship of ion migration, tunneling, and the other fundamental physical processes of set/reset.128 

To assess the linearity of potentiation/depression with time, Fig. 13 shows measured conductance update characteristics for emerging memory devices. The RRAM device in Fig. 13(a) displays a non-linear conductance increase with the number of pulses, or equivalently time, with an initially steep change followed by a saturation regime.15,174 Figure 13(b) shows the weight update characteristics for an ECRAM device, where an improved linearity can be seen thanks to the three-terminal structure separating the read and program paths.69 Figure 13(c) shows the potentiation characteristic for a MoS2 charge trap memory (CTM) under drain voltage pulses of equal amplitude.103,175 The conductance update characteristics can be described by the empirical formula

G(p) = Gmin + (Gmax − Gmin) · [1 − exp(−νp)]/[1 − exp(−ν)], (8)

where Gmax and Gmin are the maximum and minimum conductance values, respectively, p is the normalized number of pulses, and ν is a shape factor describing the linearity of the weight update.
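For reference, a small numerical sketch of the update model of Eq. (8) is given below; the closed form used here is an assumed standard parameterization in which ν → 0 recovers the ideal linear synapse, while large ν reproduces the steep-then-saturating behavior of Fig. 13(a). Device parameters are illustrative.

```python
import numpy as np

# Sketch of the empirical weight-update model of Eq. (8) (assumed form):
# small nu gives a nearly linear update (ideal synapse), large nu gives
# the abrupt, saturating update typical of filamentary RRAM.
def conductance(p, g_min=1e-6, g_max=1e-4, nu=5.0):
    p = np.asarray(p, dtype=float)        # normalized pulse number in [0, 1]
    if abs(nu) < 1e-9:                    # linear limit of the model
        return g_min + (g_max - g_min) * p
    return g_min + (g_max - g_min) * (1 - np.exp(-nu * p)) / (1 - np.exp(-nu))

p = np.linspace(0, 1, 5)
print(conductance(p, nu=5.0))     # non-linear, RRAM-like potentiation
print(conductance(p, nu=1e-12))   # nearly linear, ECRAM/CTM-like potentiation
```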
FIG. 13.

Experimental weight-update characteristics by pulses of equal amplitude and pulse-width for potentiation and depression. (a) Update characteristics of TaOx/TiO2 RRAM. Reprinted with permission from Yu et al., 2015 IEEE International Electron Devices Meeting (IEDM) (IEEE, 2015), pp. 17.3.1–17.3.4. Copyright 2015 IEEE. (b) Update characteristics of Li-based ECRAM. Reprinted with permission from Tang et al., 2018 IEEE International Electron Devices Meeting (IEDM) (IEEE, 2018), pp. 13.1.1–13.1.4. Copyright 2018 IEEE. (c) Update characteristics of MoS2-based CTM. Reprinted with permission from Farronato et al., 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS) (IEEE, 2022), pp. 1–4. Copyright 2022 IEEE. (d) Correlation plot of the non-linearity factor ν and normalized conductance window (Gmax − Gmin)/Gmin for various synaptic devices. Reprinted with permission from Farronato et al., 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS) (IEEE, 2022), pp. 1–4. Copyright 2022 IEEE.

Figure 13(d) summarizes the metrics for synaptic memory devices, reporting the normalized conductance window (Gmax − Gmin)/Gmin, which describes the full-scale range of the synaptic weight, as a function of the shape factor ν, which describes the linearity, for various synaptic devices.15,69,175–179 Among all the memory technologies, the CTM device combines excellent linearity of the weight update curve with a large conductance window. Note that the CTM device has a unidirectional characteristic, i.e., depression is spontaneous and generally non-linear. However, this limitation is mitigated by a differential synapse scheme, where two CTM devices are combined in the same synapse to map positive and negative weights.18 CTM also offers extremely low conductance thanks to the sub-threshold operation, which is useful to suppress the IR drop and enable the training of large synaptic arrays. MoS2 also displays excellent scaling properties thanks to the atomically thin 2D semiconductor and the capability of 3D integration, thus providing a promising avenue for high-density 3D crosspoint arrays for training accelerators.180 
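A one-line sketch may help clarify the differential synapse scheme: a signed weight w is split over two devices as w ∝ G+ − G−, so each unidirectional device only ever needs to be potentiated. The unit conductance below is an arbitrary placeholder.

```python
import numpy as np

# Hedged sketch of a differential synapse: the signed weight w is mapped onto
# two non-negative conductances, with w proportional to (G_plus - G_minus).
def to_differential(w, g_unit=1e-6):
    g_plus = np.maximum(w, 0.0) * g_unit     # stores the positive part
    g_minus = np.maximum(-w, 0.0) * g_unit   # stores the negative part
    return g_plus, g_minus

w = np.array([0.5, -0.25, 0.0])
g_plus, g_minus = to_differential(w)
print((g_plus - g_minus) / 1e-6)             # recovers [0.5, -0.25, 0.0]
```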

Neuromorphic engineering aims at developing computing systems using design principles based on those of biological nervous systems.105,181 By mimicking the human brain, the objective is to achieve high energy efficiency, large parallelism, and the capacity to solve cognitive tasks, such as object recognition, association, adaptation, and learning.18 Most importantly, the brain provides a blueprint for non-von Neumann computation, where information and memory are co-located in the same neurobiological network.182 The neuromorphic term and concept were originally introduced in the early 1990s181 and later revived in the early 2000s,183 when the fast growth of online generated data started to spur the investigation of alternative computing paradigms. Recently, neuromorphic engineering has seen a new wave of research interest in view of the added potential of embracing emerging memories as an enabling technology to implement brain-inspired processes.184–186 

Figure 14 shows a summary of the main neurobiological features that can be implemented in a neuromorphic system, including synapses and neurons, the latter composed of a soma, an axon, and several dendrites.187,188 Information is exchanged among neurons in the form of temporal spikes, which are weighted by synaptic connections and collected by the neuron soma. Synapses display synaptic plasticity, where the synaptic weight is changed upon spiking stimulation. Both long-term plasticity189,190 and short-term plasticity191 have been evidenced by experiments. Over the years, several plasticity rules have been proposed, including paired-pulse facilitation (PPF),192,193 spike-timing dependent plasticity (STDP),191,194–196 triplet-based plasticity,197,198 and spike-rate dependent plasticity (SRDP).199,200 The hardware implementation of each element in Fig. 14 in CMOS technology generally requires complicated transistor-based circuits and large-area capacitors to match the dynamic temporal evolution of the brain processes. From this standpoint, emerging memories offer a technology platform providing nonvolatile synaptic weights capable of short- and long-term plasticity, increasing the area density of synapses, and featuring unique dynamic properties with biologically plausible time constants arising from the physical device mechanisms.187,188 For instance, synaptic long-term plasticity by STDP has been demonstrated in both PCM201,202 and RRAM.177,203–205 Learning was shown to occur either by properly overlapping the pre- and post-synaptic spikes across the memory element205,206 or by the physical interaction between thermal and electrical stimulations in so-called second-order memristors.207 Figures 15(a) and 15(b) show the 1T1R synapse circuit with the typical pulses applied to the gate and TE. This circuit demonstrated both the synaptic weight update according to STDP and the communication between the PRE- and POST-neurons. Figure 15(c) shows instead the programming pulses and pre/post-spikes for STDP in a Ta2O5−x/TaOy second-order memristor. By applying the pre- and post-spikes at the TE and BE, the interaction between the applied electric field and the local temperature leads to a Δt-dependent conductance change. Multisynaptic circuits with 1T1R RRAM devices capable of STDP were shown to display unsupervised learning,101,208 which is extremely promising for the development of perceptron-like networks capable of autonomous learning and adaptation [Figs. 15(d) and 15(e)].
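As a point of reference for the plasticity rules listed above, the following sketch implements the classic pair-based STDP window: potentiation for pre-before-post spike pairs (Δt > 0) and depression otherwise, with exponentially decaying magnitude. Amplitudes and time constant are illustrative, not fitted to any of the cited devices.

```python
import numpy as np

# Minimal sketch of a pair-based STDP rule: the weight change decays
# exponentially with the pre/post spike time difference dt.
def stdp_dw(dt, a_plus=0.05, a_minus=0.04, tau=20e-3):
    if dt > 0:                              # pre before post -> potentiation
        return a_plus * np.exp(-dt / tau)
    return -a_minus * np.exp(dt / tau)      # post before pre -> depression

for dt in (5e-3, -5e-3):
    print(f"dt = {dt*1e3:+.0f} ms -> dw = {stdp_dw(dt):+.4f}")
```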

FIG. 14.

Schematic illustration of the main neuro-biological processes involved in neuromorphic brain-inspired computing, including neuron summation, integration and fire, dendritic filtering, and synaptic long- and short-term plasticity. Reproduced with permission from Ielmini et al., APL Mater. 9(5), 050702 (2021). Copyright 2021 AIP Publishing LLC.

FIG. 15.

Long-term plasticity in memory-based artificial synapses. (a) Structure of an STDP synapse based on RRAM with the 1T1R structure. Reproduced with permission from Ambrogio et al., IEEE Trans. Electron Devices 63(4), 1508–1515 (2016). Copyright 2016 Author(s), licensed under a Creative Commons Attribution 4.0 License. (b) Typical overlapping gate and TE voltages applied to the synapse for the case of synaptic potentiation with Δt > 0. Reproduced with permission from Ambrogio et al., IEEE Trans. Electron Devices 63(4), 1508–1515 (2016). Copyright 2016 Author(s), licensed under a Creative Commons Attribution 4.0 License. (c) Conceptual scheme of the STDP via non-overlapping spikes in a second-order memristor based on Ta2O5−x/TaOy. Reproduced with permission from Kim et al., Nano Lett. 15(3), 2203–2211 (2015). Copyright 2015 American Chemical Society. (d) Schematic illustration of a perceptron-like neuromorphic network capable of unsupervised learning via STDP in memory-based synapses. Reproduced with permission from Pedretti et al., IEEE J. Emerging Sel. Top. Circuits Syst. 8(1), 77–85 (2017). Copyright 2017 Author(s), licensed under a Creative Commons Attribution 4.0 License. (e) Measured synaptic weights 1/R as a function of spike number (epoch) for the perceptron in (d), indicating potentiation of stimulated synapses and depression of non-stimulated synapses. Reproduced with permission from Pedretti et al., Sci. Rep. 7(1), 5288 (2017). Copyright 2017 Author(s), licensed under a Creative Commons Attribution 4.0 License.

Volatile memory devices, while lacking a clear application in digital systems due to their insufficient retention, provide an ideal technology for reproducing short-term memory (STM) behavior in neuromorphic systems.193 Volatile switching is displayed by a class of filamentary RRAM devices where Ag or Cu is used as the TE material20,209 or dispersed in the switching layer.210 Figure 16(a) shows the typical I–V characteristics of a volatile RRAM device based on Ag nanodots.211 The volatile behavior is generally attributed to the filamentary switching and spontaneous rediffusion of Ag atoms to minimize the total energy of the filament.209 Volatile RRAMs were initially proposed as selector elements in crosspoint memory arrays thanks to their large on/off ratio and low leakage current.212–214 Later, these devices attracted interest from the neuromorphic community in view of their relatively long retention time, similar to the biological time constants for STM.193,215 For instance, Fig. 16(b) shows a typical pulsed characteristic of an Ag-based RRAM stimulated by a triangular pulse. After the pulse, the current persists for a retention time of about 150 µs, revealing the time decay of the filamentary path within the active material. Volatile switching of RRAM devices can be used to implement the fire function in an integrate-and-fire neuron circuit, thus avoiding the use of area-consuming amplifiers and pulse generators.216 Volatile RRAMs have also been used for replicating PPF induced by paired spikes, where the pulse-induced potentiation of the synaptic weight is enhanced by the application of two identical stimuli.217,218 Most importantly, the dynamic STM effect can be useful to mimic sensing, learning, and processing of spatiotemporal patterns, such as audio and video sequences.

FIG. 16.

Short-term memory in artificial synapses based on volatile memories. (a) Measured I–V characteristics of an RRAM device based on Ag nanodots, indicating the set transition to the on-state at Vth and spontaneous decay to the off-state at Vhold. Reproduced with permission from Li et al., Adv. Sci. 7(22), 2002251 (2020). Copyright 2020 Author(s), licensed under a Creative Commons Attribution 4.0 License. (b) Pulsed characteristic of a volatile RRAM device, indicating the spontaneous decay to the off-state after spiking stimulation with a retention time of about 150 µs. Reproduced with permission from Covi et al., IEEE Trans. Electron Devices 68(9), 4335–4341 (2021). Copyright 2021 Author(s), licensed under a Creative Commons Attribution 4.0 License. (c) Schematic circuit for spatiotemporal recognition, where the EPSC is obtained as the comparison of excitatory and inhibitory synaptic currents. Reproduced with permission from Wang et al., Adv. Intell. Syst. 3(4), 2000224 (2020). Copyright 2020 Author(s), licensed under a Creative Commons Attribution 4.0 License. (d) Measured EPSC for the case of preferred sequence A–B in (c), resulting in a positive current. Reproduced with permission from Wang et al., Adv. Intell. Syst. 3(4), 2000224 (2020). Copyright 2020 Author(s), licensed under a Creative Commons Attribution 4.0 License. (e) Measured EPSC for the case of non-preferred sequence B–A in (c), resulting in a negative current. Reproduced with permission from Wang et al., Adv. Intell. Syst. 3(4), 2000224 (2020). Copyright 2020 Author(s), licensed under a Creative Commons Attribution 4.0 License.

Figure 16(c) shows an example of spatiotemporal pattern recognition via volatile RRAM.219 Two volatile synapses, serving as excitatory and inhibitory synapses, respectively, are stimulated by spikes A and B. Each synapse consists of several Ag-based volatile RRAM devices, where the spike stimulation and the persistent current cause an overall exponentially decaying response of each synapse as a result of Kirchhoff’s law summation of the individual RRAM current contributions. The excitatory current Iexc and the inhibitory current Iinh are subtracted from each other to yield the excitatory postsynaptic current (EPSC), given by IEPSC = Iexc − Iinh. Figures 16(d) and 16(e) show the synaptic currents and the EPSC for the preferred sequence, namely, A–B, and the non-preferred sequence, namely, B–A. Due to the delay between the synaptic currents, the preferred sequence yields a positive EPSC, while the non-preferred sequence yields a negative EPSC. Comparing the EPSC with a threshold current, e.g., Ith = 2.5 μA in Figs. 16(d) and 16(e), allows the two patterns to be easily discriminated. This concept was applied to realize a retina-inspired artificial vision system capable of motion detection. In the biological retina, motion detection is achieved by direction-selective (DS) ganglion cells,220 where excitatory and inhibitory synapses occupy adjacent areas within the receptive field [Fig. 16(c)]. An image moving across the ganglion cell might stimulate the excitatory synapses followed by the inhibitory synapses, or vice versa, depending on the direction [Figs. 16(d) and 16(e)]. The EPSC of the ganglion cell thus allows the direction of the image to be recognized. The same concept can be extended to multiple directions by mimicking the starburst amacrine cell (SAC) structure of the retina, thus enabling fast, low-power direction sensitivity in the analog domain.219,221
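The direction-selectivity mechanism of Figs. 16(c)–16(e) reduces to a simple signal model, sketched below under the assumption of ideal exponentially decaying synaptic currents: the stimulation order of the excitatory and inhibitory synapses fixes the sign of the peak EPSC. The amplitude and the 150 µs decay constant are illustrative values.

```python
import numpy as np

# Behavioral sketch of the direction-selective scheme: the sign of the
# peak EPSC = I_exc - I_inh reveals the order of the two stimuli.
t = np.linspace(0, 1e-3, 1000)
tau = 150e-6                                   # ~retention time of the volatile RRAM

def synaptic_current(t, t_spike, i0=5e-6, tau=tau):
    """Exponentially decaying current starting at the spike time."""
    return np.where(t >= t_spike, i0 * np.exp(-(t - t_spike) / tau), 0.0)

def peak_epsc(t_exc, t_inh):
    epsc = synaptic_current(t, t_exc) - synaptic_current(t, t_inh)
    return epsc[np.argmax(np.abs(epsc))]

print(peak_epsc(0.0, 200e-6))   # preferred A-B order: positive peak
print(peak_epsc(200e-6, 0.0))   # non-preferred B-A order: negative peak
```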

Reservoir computing (RC) is a modern machine learning technique that is particularly suited to temporal/sequential information processing.222 Figure 17(a) schematically shows the RC concept, which was originally conceived as an alternative approach to recurrent neural network (RNN) design and training, such as liquid state machines223 and echo state networks.224 In general, an RC network transforms sequential input data into a high-dimensional dynamical state via a reservoir layer. The output of the reservoir network is then processed by a readout layer to provide recognition and classification. The reservoir layer generally features random weights and connections, thus limiting the need for training to the readout layer and overcoming the complexity of multi-layer gradient-descent training techniques. Hardware RC networks are attracting interest thanks to their potential for energy efficiency, high versatility, and fast learning.225–227 

FIG. 17.

Reservoir computing (RC) based on volatile memory devices. (a) Conceptual scheme of an RC system, composed of a random reservoir layer and a trained readout layer. Adapted from the work of Tanaka et al., Neural Networks 115, 100–123 (2019). Copyright 2019 Author(s), licensed under a Creative Commons Attribution 4.0 License. (b) RC system for handwritten digit recognition. The image is converted into a spatiotemporal pattern fed to the memory-based reservoir layer. The readout network processes the reservoir states for classification. Reproduced with permission from Du et al., Nat. Commun. 8(1), 2204 (2017). Copyright 2017 Author(s), licensed under a Creative Commons Attribution 4.0 License. (c) Illustration of the MoS2-based charge trap memory (CTM). Reproduced with permission from Farronato et al., Adv. Mater. (published online) (2022). Copyright 2022 Author(s), licensed under a Creative Commons Attribution 4.0 License. (d) Measured characteristics of a MoS2-based CTM device showing pulse-induced potentiation followed by spontaneous decay. Reproduced with permission from Farronato et al., Adv. Mater. (published online) (2022). Copyright 2022 Author(s), licensed under a Creative Commons Attribution 4.0 License. (e) Input patterns and corresponding reservoir states for a MoS2-based reservoir layer. Reproduced with permission from Farronato et al., Adv. Mater. (published online) (2022). Copyright 2022 Author(s), licensed under a Creative Commons Attribution 4.0 License. (f) Confusion matrix for the MoS2-based RC system, demonstrating the classification results for digit images. Reproduced with permission from Farronato et al., Adv. Mater. (published online) (2022). Copyright 2022 Author(s), licensed under a Creative Commons Attribution 4.0 License.

Figure 17(b) schematically shows an IMC-based RC network for image recognition.228 First, the input pattern, e.g., the image of a handwritten digit, is converted into a spatiotemporal pattern, where rows represent the sequential spikes and columns represent the N input channels. The resulting spatiotemporal pattern is fed to N volatile RRAM devices, whose STM response provides a physical reservoir layer. The dynamic reservoir layer yields a unique output response, e.g., the output transient current, for each input pattern, which can then be classified by the readout layer, consisting of a properly trained fully connected network.

RC was also demonstrated by using charge-trap memory (CTM) devices based on a MoS2 channel.103 Figure 17(c) shows the device structure with source/drain contacts deposited on a MoS2 channel, where inversion and depletion were controlled by a back gate. In this device, a positive or negative gate voltage results in the trapping of electrons or holes, respectively, at the interface between the MoS2 and the SiO2, the latter serving as the gate dielectric layer. Electron/hole trapping causes a shift of the threshold voltage, thus resulting in a change in the channel conductivity. This is shown in Fig. 17(d), where a train of negative gate pulses leads to an increase in conductance, which spontaneously decays at the end of the stimulation. The dynamic response in Fig. 17(d) was used as a physical reservoir process in an RC network for image recognition with five CTM devices as the reservoir layer.103 Figure 17(e) shows examples of the reservoir output, indicating potentiation and spontaneous decay as a result of the spatiotemporal stimulation. After training the readout layer by logistic regression,157 a good classification accuracy was achieved, as shown by the confusion matrix in Fig. 17(f). Compared to DNNs, RC networks employ fewer devices by leveraging the rich analog dynamic response of the CTM device, thus resulting in a significantly smaller classification network.229 In addition, power consumption can be minimized in the RC layer by operating the CTM device in the subthreshold regime.103 Similar spatiotemporal RC networks were used for solving second-order nonlinear equations,228 spoken-digit recognition,229 and autonomous chaotic time-series forecasting,229 thus supporting the wide application scenario for RC-based IMC circuits.
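The full RC pipeline of Figs. 17(b)–17(f) can be condensed into a toy sketch, assuming leaky reservoir nodes that stand in for the potentiation-and-decay dynamics of the CTM devices and a scikit-learn logistic-regression readout; the binary spatiotemporal patterns and the labeling rule are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def reservoir_states(pattern, decay=0.6):
    # pattern: (T timesteps, N channels); each node is potentiated by its
    # channel's spikes and decays spontaneously between timesteps.
    state = np.zeros(pattern.shape[1])
    for row in pattern:
        state = decay * state + row
    return state

# Synthetic task: the label is the last spike on channel 0, so the readout
# must exploit the short-term memory encoded in the final node states.
X = rng.integers(0, 2, size=(200, 8, 5))      # 200 patterns, 8 steps, 5 channels
y = X[:, -1, 0]
features = np.array([reservoir_states(p) for p in X])
readout = LogisticRegression().fit(features, y)   # only the readout is trained
print(f"training accuracy: {readout.score(features, y):.2f}")
```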

The principle of using device physics to achieve smart computing functions is further extended from devices to materials in so-called in-materia computing.230,231 In-materia computing relies on the ability of certain materials, such as nanoparticles, nanostructures, or even randomly doped semiconductors, to act as a distributed, random network of physical dynamical nodes for computation.232 In-materia computing systems include nanostructures based on carbon nanotubes (CNTs),233,234 nanowires (NWs),235–237 and metallic nanoparticles.238 Admittedly, programming, stimulating, and controlling the individual nodes in the computing material is a challenging task, since the materials can exhibit dynamic fluctuations.239,240 However, nanostructures are ideally suited to serve as the randomly connected reservoir layer of an RC network.225,236 Figure 18(a) shows a fully memristive RC system where the reservoir layer is made of a network of silver nanowires (NWs), shown in Fig. 18(b).236 The electrical stimulation of the NW network induces a change in the NW cross-point junctions,235 thus resulting in a dynamic potentiation of the local connection, hence of the local effective conductance. The output of the reservoir, i.e., the output current or the node potential of the NW network, is then processed by the readout layer, e.g., a fully connected network of RRAM devices. By properly training the readout network, tasks such as image recognition and spatiotemporal pattern prediction can be carried out.236 This approach to computation has distinct advantages in terms of scaling and easy manufacturing thanks to the bottom-up technology for developing the physical NW network. Figure 18(c) shows a neuromorphic device composed of a single-walled carbon nanotube (SWCNT) complexed with polyoxometalate (POM).234,241 When arranged in a network, SWCNTs can spontaneously generate spikes and noise thanks to multi-redox activities at the crossing points.242 Both periodic and aperiodic current spikes are generated under a constant-voltage bias, as shown in Fig. 18(d). The applied bias causes conductance switching between POMs and SWCNTs, thus mimicking the potentiation behavior of a neurobiological synapse. Chemical reaction phenomena, such as the aggregation and dissociation of counter-cations, play an additional role, thus leading to spike generation. Similar to the NW network of Fig. 18(b), the POM/SWCNT network can serve as a reservoir layer in an RC system thanks to its nonlinear dynamics.234 

FIG. 18.

In-materia neuromorphic computing. (a) Schematic of a general RC network with a random reservoir layer and a properly trained readout network for the recognition of spatiotemporal patterns. Reproduced with permission from Milano et al., Nat. Mater. 21(2), 195–202 (2022). Copyright 2021 Springer Nature Limited. (b) SEM image of a memristive nanowire network used as the reservoir layer. Scale bar is 2 μm. Reproduced with permission from Milano et al., Nat. Mater. 21(2), 195–202 (2022). Copyright 2021 Springer Nature Limited. (c) Atomic force microscopy (AFM) image of a single-walled carbon nanotube (SWCNT) within a SWCNT-based transistor. Reproduced with permission from Tanaka et al., Nat. Commun. 9(1), 2693 (2018). Copyright 2018 Author(s), licensed under a Creative Commons Attribution 4.0 License. (d) Measured response of a SWCNT transistor, including noisy and periodic dynamics. Reproduced with permission from Tanaka et al., Nat. Commun. 9(1), 2693 (2018). Copyright 2018 Author(s), licensed under a Creative Commons Attribution 4.0 License. (e) Schematic of a neurobiological model with two pre-synaptic spikes and integrate-and-fire neurons. Reproduced with permission from Shen et al., ACS Nano 7(7), 6117–6122 (2013). Copyright 2013 American Chemical Society. (f) Synaptic transistor based on an SWCNT network. Reproduced with permission from Shen et al., ACS Nano 7(7), 6117–6122 (2013). Copyright 2013 American Chemical Society. (g) AFM image of a random SWCNT network in the transistor channel. Reproduced with permission from Shen et al., ACS Nano 7(7), 6117–6122 (2013). Copyright 2013 American Chemical Society.

SWCNT networks were also used as analog synapses in the neuromorphic module of Fig. 18(e).233 The module consists of a single neuron connected with other neurons through synapses. The synapses are emulated by transistors based on a random CNT network, while the axon of the neuron is realized by Si-based transistors. Figure 18(f) shows the CNT-based synaptic transistor, with the random SWCNT network shown in Fig. 18(g). Electron trapping in the dielectric layer due to the application of gate pulses results in an increase of the current in the p-type SWCNT channel. Potentiation is followed by decay due to the tunneling of electrons out of the dielectric layer. The SWCNT-based synapse also shows inhibitory characteristics under negative gate voltage. Potentiation/depression allows for the emulation of biological STDP and PPF, which is promising for the development of in-materia neuromorphic computing systems.

The main enablers of IMC are emerging memory devices, whose distinct advantages, such as nonvolatile behavior, make them more appealing than SRAM243 or DRAM,244 although at the expense of increased programming energy and time.245,246 For tasks where computational parameters must be frequently updated, such as stateful Boolean logic circuits,96,97 the programming overhead may overshadow the advantages of IMC. Moreover, given the fundamentally different characteristics of emerging memories in terms of linearity, power consumption, conductance window, noise, and CMOS compatibility,245,247,248 it is difficult to identify a best-in-class technology with universal applicability across all IMC applications.100,134,170,249–253 As an example, combinatorial optimization tasks254–256 inherently require controllable, device-level randomness148 as an enabling feature for simulated annealing.106 On the other hand, scientific computing applications show extremely narrow tolerance to perturbation and noise,257 relying on high-precision data storage to provide high-quality results.249 The search for a universal memory, capable of satisfying the requirements of many applications at the same time, is thus still open. One of the main pathways for the implementation of in-memory computing is the reduction of the power consumption of memory devices to allow for the operation of extremely large arrays at an affordable cost. Another key challenge is the improvement of reliability, e.g., the realization of self-selecting, multilevel memory devices with large endurance and low variability. At present, these requirements can be partially met by proper programming approaches (program-and-verify algorithms) or device implementations (1T1R structures, etc.) at the cost of slower operation and decreased integration density.

Many of the advantages of IMC derive from the collective behavior of densely packed memory cells in an array configuration. Common parasitics, such as line resistance and capacitance,258 can limit the accuracy of both write and read operations, thus affecting the reliability of IMC.114,259 While selector devices alleviate the issue during the programming phase, they have limited impact during computation, as all cells are simultaneously selected. Schemes for parasitic compensation164,260,261 may help mitigate the issue at the expense of increased pre-processing overhead and reduced effectiveness for large array sizes. For error-tolerant or adaptive applications, optimization frameworks can be developed262,263 with negligible loss of accuracy. Another approach is to use three-terminal devices with ultra-low conductance, such as ECRAM and MoS2 CTM devices,175 to minimize both the IR drop and the line capacitances of the array. However, while large-scale crosspoint arrays of two-terminal devices have been extensively demonstrated in academia and industry,264–268 the same maturity level is currently lacking for arrays of three-terminal emerging memory devices.76,80
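A back-of-the-envelope model of the wire-resistance problem is sketched below: with one energized word line, the current through each wire segment is the sum of all downstream cell currents (evaluated here at their nominal values, a first-order assumption), so the voltage seen by a cell degrades with its distance from the driver. All parameters are illustrative.

```python
import numpy as np

# First-order IR-drop estimate along one crosspoint word line. Segment k
# carries the current of all cells downstream of it, so far cells see a
# reduced voltage; nominal (undegraded) cell currents are assumed.
def wordline_voltages(v_in=0.2, n_cells=256, g_cell=5e-6, r_seg=1.0):
    i_cell = v_in * g_cell                       # nominal current per cell
    v = np.empty(n_cells)
    v_node = v_in
    for k in range(n_cells):
        v_node -= r_seg * i_cell * (n_cells - k) # drop across segment k
        v[k] = v_node
    return v

v = wordline_voltages()
print(f"near-end cell: {v[0]:.4f} V, far-end cell: {v[-1]:.4f} V")
```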

Power consumption is another key consideration imposing constraints on the individual array size.122,269,270 Power can be handled by arranging the IMC system in a tiled architecture,7 where multiple replicas of a fundamental computing macro, or core, work in parallel on the execution of a computing task. Core architecture design is another open question in the field of IMC, where computational efficiency and robustness must be balanced against analog-to-digital and digital-to-analog conversion overheads.248 On the one hand, IMC-specific conversion front-ends271,272 should balance accuracy, latency, energy, and area consumption. On the other hand, various approaches to data encoding, such as amplitude modulation134,273 or pulse-width modulation,136 require conversion circuits to be flexible and reconfigurable. Finally, proper design of the inter-core communication is crucial to maintain the IMC advantage and allow for the solution of large-scale problems.274 Co-optimization of the device, architecture, and application seems to be the most promising approach to fully unleash the IMC potential in overcoming the von Neumann bottleneck.269,275

Finally, to allow for widespread IMC adoption, it is essential to bridge the gap between hardware and software by implementing an electronic design automation (EDA) toolchain. On the one hand, IMC-specific design tools276 are useful for system designers and engineers to develop large-scale, highly accurate IMC hardware and software systems. On the other hand, end users operating at a higher level of abstraction need a software stack capable of transparently compiling a given problem onto a target IMC architecture.277–279 This challenge should be tackled by the co-design and co-development of a full set of hardware and software tools to elevate the maturity of IMC for real-life applications.

This Perspective provides a review of the status and outlook of IMC with emerging memory devices. The candidate alternatives to the conventional von Neumann architecture are presented and compared in terms of their degree of integration between memory and computing units. Two-terminal and three-terminal emerging memory devices are reviewed. By distinguishing two general operating regimes of emerging devices, low-voltage static IMC and high-voltage dynamic IMC are identified as the main IMC macro-categories. Correspondingly, the most relevant computing primitives are explored in view of their real-world applications. For static IMC, MVM and IMVM accelerators, as well as TCAMs, are presented together with their applications in machine learning, hardware security, and data classification. Similarly, for dynamic IMC, outer-product accelerators for neural network training and brain-inspired systems for reservoir computing are discussed. Finally, challenges for the in silico implementation of an IMC architecture are outlined. Owing to the overarching nature of IMC, encompassing device, computing core, and the EDA toolchain, a strongly multidisciplinary approach is needed to co-optimize all components and fully unleash the IMC potential.

This work received funding from the Italian Ministry of University and Research (MUR) and the European Union (EU) under the PON/REACT program and the Horizon 2020 Research and Innovation program (Grant Agreement Nos. 824164 and 899559). This work also received funding from ECSEL Joint Undertaking (JU) under Grant Agreement No. 101007321. The JU receives support from the European Union’s Horizon 2020 Research and Innovation program and France, Belgium, Czech Republic, Germany, Italy, Sweden, Switzerland, and Turkey.

The authors have no conflicts to disclose.

P.M. and M.F. contributed equally to this work.

P. Mannocci: Writing – original draft (lead). M. Farronato: Writing – original draft (lead). N. Lepri: Writing – original draft (equal). L. Cattaneo: Writing – original draft (equal). A. Glukhov: Writing – original draft (equal). Z. Sun: Writing – original draft (equal). D. Ielmini: Conceptualization (lead); Funding acquisition (lead); Project administration (lead); Resources (lead); Supervision (lead); Writing – review & editing (lead).

The data that support the findings of this study are openly available at https://zenodo.org/record/7378087#.Y4Y8xHbMKCo.

1.
W. A.
Wulf
and
S. A.
McKee
, “
Hitting the memory wall: Implications of the obvious
,”
ACM SIGARCH Comput. Archit. News
23
(
1
),
20
24
(
1995
).
2.
M.
Horowitz
, “
1.1 Computing’s energy problem (and what we can do about it)
,” in
2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)
(
IEEE
,
San Francisco, CA
,
2014
), pp.
10
14
.
3.
H.
Jun
,
J.
Cho
et al, “
HBM (high bandwidth memory) DRAM technology and architecture
,” in
2017 IEEE International Memory Workshop (IMW)
(
IEEE
,
Monterey, CA
,
2017
), pp.
1
4
.
4.
J.
Jeddeloh
and
B.
Keeth
, “
Hybrid memory cube new DRAM architecture increases density and performance
,” in
2012 Symposium on VLSI Technology (VLSIT)
(
IEEE
,
Honolulu, HI
,
2012
), pp.
87
88
.
5.
D.
Patterson
,
T.
Anderson
et al, “
A case for intelligent RAM
,”
IEEE Micro
17
(
2
),
34
44
(
1997
).
6.
M. A.
Zidan
,
J. P.
Strachan
, and
W. D.
Lu
, “
The future of electronics based on memristive systems
,”
Nat. Electron.
1
(
1
),
22
29
(
2018
).
7.
D.
Ielmini
and
H.-S. P.
Wong
, “
In-memory computing with resistive switching devices
,”
Nat. Electron.
1
(
6
),
333
343
(
2018
).
8.
S.
Mittal
,
G.
Verma
et al, “
A survey of SRAM-based in-memory computing techniques and applications
,”
J. Syst. Archit.
119
,
102276
(
2021
).
9.
H.-S. P.
Wong
and
S.
Salahuddin
, “
Memory leads the way to better computing
,”
Nat. Nanotechnol.
10
(
3
),
191
194
(
2015
).
10.
J.
Wang
,
X.
Wang
et al, “
A 28-nm compute SRAM with bit-serial logic/arithmetic operations for programmable in-memory vector computing
,”
IEEE J. Solid-State Circuits
55
(
1
),
76
86
(
2020
).
11.
W.
Wang
,
W.
Song
et al, “
Integration and co-design of memristive devices and algorithms for artificial intelligence
,”
iScience
23
(
12
),
101809
(
2020
).
12.
C. M.
Compagnoni
,
A.
Goda
et al, “
Reviewing the evolution of the NAND flash technology
,”
Proc. IEEE
105
(
9
),
1609
1633
(
2017
).
13.
R.
Micheloni
,
L.
Crippa
et al,
Inside NAND Flash Memories
(
Springer Science & Business Media
,
2010
).
14.
B.
Govoreanu
,
G. S.
Kar
et al, “
10×10nm2 Hf/HfOx crossbar resistive RAM with excellent performance, reliability and low-energy operation
,” in
2011 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
Washington, DC
,
2011
), pp.
31.6.1
31.6.4
.
15.
C.-W.
Hsu
,
C.-C.
Wan
et al, “
3D vertical TaOx/TiO2 RRAM with over 103 self-rectifying ratio and sub-μA operating current
,” in
2013 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
Washington, DC
,
2013
), pp.
10.4.1
10.4.4
.
16.
Q.
Luo
,
X.
Xu
et al, “
8-layers 3D vertical RRAM with excellent scalability towards storage class memory applications
,” in
2017 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
San Francisco, CA
,
2017
), pp.
2.7.1
2.7.4
.
17.
J. A.
Kittl
,
K.
Opsomer
et al, “
High-k dielectrics for future generation memory devices
,”
Microelectron. Eng.
86
(
7–9
),
1789
1795
(
2009
).
18.
D.
Ielmini
and
S.
Ambrogio
, “
Emerging neuromorphic devices
,”
Nanotechnology
31
(
9
),
092001
(
2020
).
19.
S.
Brivio
,
S.
Spiga
, and
D.
Ielmini
, “
HfO2-based resistive switching memory devices for neuromorphic computing
,”
Neuromorph. Comput. Eng.
2
,
042001
(
2022
).
20.
A.
Bricalli
,
E.
Ambrosi
et al, “
Resistive switching device technology based on silicon oxide for improved ON–OFF ratio—Part II: Select devices
,”
IEEE Trans. Electron Devices
65
(
1
),
122
128
(
2018
).
21.
W.-G.
Kim
and
S.-W.
Rhee
, “
Effect of the top electrode material on the resistive switching of TiO2 thin film
,”
Microelectron. Eng.
87
(
2
),
98
103
(
2010
).
22.
H.
Akinaga
and
H.
Shima
, “
Resistive random access memory (ReRAM) based on metal oxides
,”
Proc. IEEE
98
(
12
),
2237
2251
(
2010
).
23.
Z.
Zhang
,
B.
Gao
et al, “
All-metal-nitride RRAM devices
,”
IEEE Electron Device Lett.
36
(
1
),
29
31
(
2015
).
24.
J.
Chen
,
C.-Y.
Lin
et al, “
LiSiOx-based analog memristive synapse for neuromorphic computing
,”
IEEE Electron Device Lett.
40
(
4
),
542
545
(
2019
).
25.
U.
Russo
,
D.
Kamalanathan
et al, “
Study of multilevel programming in programmable metallization cell (PMC) memory
,”
IEEE Trans. Electron Devices
56
(
5
),
1040
1047
(
2009
).
26.
C.
Pan
,
Y.
Ji
et al, “
Coexistence of grain-boundaries-assisted bipolar and threshold resistive switching in multilayer hexagonal boron nitride
,”
Adv. Funct. Mater.
27
(
10
),
1604811
(
2017
).
27.
Y.
Shen
,
W.
Zheng
et al, “
Variability and yield in h-BN-based memristive circuits: The role of each type of defect
,”
Adv. Mater.
33
(
41
),
2103656
(
2021
).
28.
S.
Goswami
,
A. J.
Matula
et al, “
Robust resistive memory devices using solution-processable metal-coordinated azo aromatics
,”
Nat. Mater.
16
(
12
),
1216
1224
(
2017
).
29.
S.
Goswami
,
S. P.
Rath
et al, “
Charge disproportionate molecular redox for discrete memristive and memcapacitive switching
,”
Nat. Nanotechnol.
15
(
5
),
380
389
(
2020
).
30.
S.
Goswami
,
R.
Pramanick
et al, “
Decision trees within a molecular memristor
,”
Nature
597
(
7874
),
51
56
(
2021
).
31.
D.
Ielmini
,
R.
Bruchhaus
, and
R.
Waser
, “
Thermochemical resistive switching: Materials, mechanisms, and scaling projections
,”
Phase Transitions
84
(
7
),
570
602
(
2011
).
32.
H. Y.
Lee
,
P. S.
Chen
et al, “
Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM
,” in
2008 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
San Francisco, CA
,
2008
), pp.
1
4
.
33.
A.
Sawa
, “
Resistive switching in transition metal oxides
,”
Mater. Today
11
(
6
),
28
36
(
2008
).
34.
S.
Raoux
,
W.
Wełnic
, and
D.
Ielmini
, “
Phase change materials and their application to nonvolatile memories
,”
Chem. Rev.
110
(
1
),
240
267
(
2010
).
35.
G. W.
Burr
,
M. J.
Breitwisch
et al, “
Phase change memory technology
,”
J. Vac. Sci. Technol. B
28
(
2
),
223
262
(
2010
).
36.
D.
Ielmini
and
A. L.
Lacaita
, “
Phase change materials in non-volatile storage
,”
Mater. Today
14
(
12
),
600
607
(
2011
).
37.
M.
Wuttig
and
N.
Yamada
, “
Phase-change materials for rewriteable data storage
,”
Nat. Mater.
6
(
11
),
824
832
(
2007
).
38.
D.
Ielmini
,
A. L.
Lacaita
et al, “
Analysis of phase distribution in phase-change nonvolatile memories
,”
IEEE Electron Device Lett.
25
(
7
),
507
509
(
2004
).
39.
U.
Russo
,
D.
Ielmini
, and
A. L.
Lacaita
, “
Analytical modeling of chalcogenide crystallization for PCM data-retention extrapolation
,”
IEEE Trans. Electron Devices
54
(
10
),
2769
2777
(
2007
).
40.
D.
Ielmini
,
A. L.
Lacaita
, and
D.
Mantegazza
, “
Recovery and drift dynamics of resistance and threshold voltages in phase-change memories
,”
IEEE Trans. Electron Devices
54
(
2
),
308
315
(
2007
).
41.
P.
Zuliani
,
E.
Varesi
et al, “
Overcoming temperature limitations in phase change memories with optimized GexSbyTez
,”
IEEE Trans. Electron Devices
60
(
12
),
4020
4026
(
2013
).
42.
S.
Kim
,
N.
Sosa
et al, “
A phase change memory cell with metallic surfactant layer as a resistance drift stabilizer
,” in
2013 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
Washington, DC
,
2013
), pp.
30.7.1
30.7.4
.
43.
F.
Arnaud
,
P.
Zuliani
et al, “
Truly innovative 28nm FDSOI technology for automotive micro-controller applications embedding 16MB phase change memory
,” in
2018 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
San Francisco, CA
,
2018
), pp.
18.4.1
18.4.4
.
44.
D.
Min
,
J.
Park
et al, “
18nm FDSOI technology platform embedding PCM & innovative continuous-active construct enhancing performance for leading-edge MCU applications
,” in
2021 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
San Francisco, CA
,
2021
), pp.
13.1.1
13.1.4
.
45.
P.
Zuliani
,
A.
Conte
, and
P.
Cappelletti
, “
The PCM way for embedded non volatile memories applications
,” in
2019 Symposium on VLSI Technology
(
IEEE
,
Kyoto, Japan
,
2019
), pp.
T192
T193
.
46.
T.
Mikolajick
,
C.
Dehm
et al, “
FeRAM technology for high density applications
,”
Microelectron. Reliab.
41
(
7
),
947
950
(
2001
).
47.
D. J.
Kim
,
J. Y.
Jo
et al, “
Polarization relaxation induced by a depolarization field in ultrathin ferroelectric BaTiO3 capacitors
,”
Phys. Rev. Lett.
95
(
23
),
237602
(
2005
).
48.
J. F.
Scott
, “
Applications of modern ferroelectrics
,”
Science
315
(
5814
),
954
959
(
2007
).
49.
T. S.
Böscke
,
J.
Müller
et al, “
Ferroelectricity in hafnium oxide thin films
,”
Appl. Phys. Lett.
99
(
10
),
102903
(
2011
).
50.
J.-M.
Koo
,
B.-S.
Seo
et al, “
Fabrication of 3D trench PZT capacitors for 256Mbit FRAM device application
,” in
2005 IEEE International Electron Devices Meeting (IEDM). IEDM Technical Digest
(
IEEE
,
2005
), pp.
340
343
.
51.
P.
Polakowski
,
S.
Riedel
et al, “
Ferroelectric deep trench capacitors based on Al:HfO2 for 3D nonvolatile memory applications
,” in
2014 IEEE 6th International Memory Workshop (IMW)
(
IEEE
,
2014
), pp.
1
4
.
52.
M.
Pešić
,
F. P. G.
Fengler
et al, “
Physical mechanisms behind the field-cycling behavior of HfO2-based ferroelectric capacitors
,”
Adv. Funct. Mater.
26
(
25
),
4601
4612
(
2016
).
53.
A.
Chanthbouala
,
A.
Crassous
et al, “
Solid-state memories based on ferroelectric tunnel junctions
,”
Nat. Nanotechnol.
7
(
2
),
101
104
(
2012
).
54.
D.
Wang
,
C.
Nordman
et al, “
70% TMR at room temperature for SDT sandwich junctions with CoFeB as free and reference layers
,”
IEEE Trans. Magn.
40
(
4
),
2269
2271
(
2004
).
55.
C.
Chappert
,
A.
Fert
, and
F. N.
Van Dau
, “
The emergence of spin electronics in data storage
,”
Nat. Mater.
6
(
11
),
813
823
(
2007
).
56.
J. C.
Sankey
,
Y.-T.
Cui
et al, “
Measurement of the spin-transfer-torque vector in magnetic tunnel junctions
,”
Nat. Phys.
4
(
1
),
67
71
(
2008
).
57.
S.
Ikeda
,
K.
Miura
et al, “
A perpendicular-anisotropy CoFeB–MgO magnetic tunnel junction
,”
Nat. Mater.
9
(
9
),
721
724
(
2010
).
58.
S.
Sakhare
,
M.
Perumkunnil
et al, “
Enablement of STT-MRAM as last level cache for the high performance computing domain at the 5nm node
,” in
2018 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
San Francisco, CA
,
2018
), pp.
18.3.1
18.3.4
.
59.
J.
Grollier
,
D.
Querlioz
, and
M. D.
Stiles
, “
Spintronic nanodevices for bioinspired computing
,”
Proc. IEEE
104
(
10
),
2024
2039
(
2016
).
60.
A. I.
Khan
,
A.
Keshavarzi
, and
S.
Datta
, “
The future of ferroelectric field-effect transistor technology
,”
Nat. Electron.
3
(
10
),
588
597
(
2020
).
61.
K.
Sugibuchi
,
Y.
Kurogi
, and
N.
Endo
, “
Ferroelectric field-effect memory device using Bi4Ti3O12 film
,”
J. Appl. Phys.
46
(
7
),
2877
2881
(
1975
).
62.
K.
Florent
,
M.
Pesic
et al, “
Vertical ferroelectric HfO2 FET based on 3-D NAND architecture: Towards dense low-power memory
,” in
2018 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
San Francisco, CA
,
2018
), pp.
2.5.1
2.5.4
.
63.
K. A.
Aabrar
,
J.
Gomez
et al, “
BEOL compatible superlattice FerroFET-based high precision analog weight cell with superior linearity and symmetry
,” in
2021 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
2021
), pp.
19.6.1
19.6.4
.
64.
I. M.
Miron
,
K.
Garello
et al, “
Perpendicular switching of a single ferromagnetic layer induced by in-plane current injection
,”
Nature
476
(
7359
),
189
193
(
2011
).
65.
K.
Garello
,
C. O.
Avci
et al, “
Ultrafast magnetization switching by spin-orbit torques
,”
Appl. Phys. Lett.
105
(
21
),
212402
(
2014
).
66.
T.
Endoh
,
H.
Honjo
et al, “
Recent progresses in STT-MRAM and SOT-MRAM for next generation MRAM
,” in
2020 IEEE Symposium on VLSI Technology
(
IEEE
,
Honolulu, HI
,
2020
), pp.
1
2
.
67.
H.
Wu
,
J.
Zhang
et al, “
Field-free approaches for deterministic spin–orbit torque switching of the perpendicular magnet
,”
Mater. Futures
1
(
2
),
022201
(
2022
).
68.
E. J.
Fuller
,
S. T.
Keene
et al, “
Parallel programming of an ionic floating-gate memory array for scalable neuromorphic computing
,”
Science
364
(
6440
),
570
574
(
2019
).
69.
J.
Tang
,
D.
Bishop
et al, “
ECRAM as scalable synaptic cell for high-speed, low-power neuromorphic computing
,” in
2018 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
San Francisco, CA
,
2018
), pp.
13.1.1
13.1.4
.
70.
S.
Kim
,
T.
Todorov
et al, “
Metal-oxide based, CMOS-compatible ECRAM for deep learning accelerator
,” in
2019 IEEE International Electron Devices Meeting (IEDM)
(
IEEE
,
San Francisco, CA
,
2019
), pp.
35.7.1
35.7.4
.
71.
Y.
Li
,
E. J.
Fuller
et al, “
Filament-free bulk resistive memory enables deterministic analogue switching
,”
Adv. Mater.
32
(
45
),
2003984
(
2020
).
72.
E. J.
Fuller
,
F. E.
Gabaly
et al, “
Li-ion synaptic transistor for low power analog computing
,”
Adv. Mater.
29
(
4
),
1604310
(
2017
).
73.
Y.
van de Burgt
,
E.
Lubberman
et al, “
A non-volatile organic electrochemical device as a low-voltage artificial synapse for neuromorphic computing
,”
Nat. Mater.
16
(
4
),
414
418
(
2017
).
74.
M.
Berggren
,
X.
Crispin
et al, “
Ion electron–coupled functionality in materials and devices based on conjugated polymers
,”
Adv. Mater.
31
(
22
),
1805813
(
2019
).
75.
P. C.
Harikesh
,
C.-Y.
Yang
et al, “
Organic electrochemical neurons and synapses with ion mediated spiking
,”
Nat. Commun.
13
(
1
),
901
(
2022
).
76.
H.
Lee
,
D. G.
Ryu
et al, “
Vertical metal-oxide electrochemical memory for high-density synaptic array based high-performance neuromorphic computing
,”
Adv. Electron. Mater.
8
(
8
),
2200378
(
2022
).
77.
V. K.
Sangwan
,
D.
Jariwala
et al, “
Gate-tunable memristive phenomena mediated by grain boundaries in single-layer MoS2
,”
Nat. Nanotechnol.
10
(
5
),
403
406
(
2015
).
78.
V. K.
Sangwan
,
H.-S.
Lee
et al, “
Multi-terminal memtransistors from polycrystalline monolayer molybdenum disulfide
,”
Nature
554
(
7693
),
500
504
(
2018
).
79.
M.
Farronato
,
M.
Melegari
et al, “
Memtransistor devices based on MoS2 multilayers with volatile switching due to Ag cation migration
,”
Adv. Electron. Mater.
8
(
8
),
2101161
(
2022
).
80.
H. S.
Lee
,
V. K.
Sangwan
et al, “
Dual-gated MoS2 memtransistor crossbar array
,”
Adv. Funct. Mater.
30
(
45
),
2003683
(
2020
).
81.
S.
Hao
,
X.
Ji
et al, “
A monolayer leaky integrate-and-fire neuron for 2D memristive neuromorphic networks
,”
Adv. Electron. Mater.
6
(
4
),
1901335
(
2020
).
82. R. A. John, F. Liu et al., "Synergistic gating of electro-iono-photoactive 2D chalcogenide neuristors: Coexistence of Hebbian and homeostatic synaptic metaplasticity," Adv. Mater. 30(25), 1800220 (2018).
83. R. A. John, J. Acharya et al., "Optogenetics inspired transition metal dichalcogenide neuristors for in-memory deep recurrent neural networks," Nat. Commun. 11(1), 3211 (2020).
84. E. Fortunato, P. Barquinha, and R. Martins, "Oxide semiconductor thin-film transistors: A review of recent advances," Adv. Mater. 24(22), 2945–2986 (2012).
85. R. A. John, N. Tiwari et al., "Ultralow power dual-gated subthreshold oxide neuristors: An enabler for higher order neuronal temporal correlations," ACS Nano 12(11), 11263–11273 (2018).
86. R. A. John et al., "Self healable neuromorphic memtransistor elements for decentralized sensory signal processing in robotics," Nat. Commun. 11(1), 4030 (2020).
87. M. R. Mahmoodi, D. B. Strukov, and O. Kavehei, "Experimental demonstrations of security primitives with nonvolatile memories," IEEE Trans. Electron Devices 66(12), 5050–5059 (2019).
88. M. Baldo, O. Melnic et al., "Modeling of virgin state and forming operation in embedded phase change memory (PCM)," in 2020 IEEE International Electron Devices Meeting (IEDM) (IEEE, San Francisco, CA, 2020), pp. 13.3.1–13.3.4.
89. G. Pedretti and D. Ielmini, "In-memory computing with resistive memory circuits: Status and outlook," Electronics 10(9), 1063 (2021).
90. S. Wang, Z. Sun et al., "Optimization schemes for in-memory linear regression circuit with memristor arrays," IEEE Trans. Circuits Syst., I 68(12), 4900–4909 (2021).
91. O. Krestinskaya, A. P. James, and L. O. Chua, "Neuromemristive circuits for edge computing: A review," IEEE Trans. Neural Networks Learn. Syst. 31(1), 4–23 (2020).
92. L. M. Vaquero and L. Rodero-Merino, "Finding your way in the fog: Towards a comprehensive definition of fog computing," ACM SIGCOMM Comput. Commun. Rev. 44(5), 27–32 (2014).
93. Z. Sun, G. Pedretti et al., "Time complexity of in-memory solution of linear systems," IEEE Trans. Electron Devices 67(7), 2945–2951 (2020).
94. Z. Sun and R. Huang, "Time complexity of in-memory matrix-vector multiplication," IEEE Trans. Circuits Syst., II 68(8), 2785–2789 (2021).
95. G. Pedretti, P. Mannocci et al., "A spiking recurrent neural network with phase-change memory neurons and synapses for the accelerated solution of constraint satisfaction problems," IEEE J. Explor. Solid-State Comput. Devices Circuits 6(1), 89–97 (2020).
96. J. Borghetti, Z. Li et al., "A hybrid nanomemristor/transistor logic circuit capable of self-programming," Proc. Natl. Acad. Sci. U. S. A. 106(6), 1699–1703 (2009).
97. Z. Sun, E. Ambrosi et al., "Logic computing with stateful neural networks of resistive switches," Adv. Mater. 30(38), 1802554 (2018).
98. G. W. Burr, R. M. Shelby et al., "Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element," IEEE Trans. Electron Devices 62(11), 3498–3507 (2015).
99. S. Ambrogio, P. Narayanan et al., "Equivalent-accuracy accelerated neural-network training using analogue memory," Nature 558(7708), 60–67 (2018).
100. M. Prezioso, F. Merrikh-Bayat et al., "Training and operation of an integrated neuromorphic network based on metal-oxide memristors," Nature 521(7550), 61–64 (2015).
101. G. Pedretti, V. Milo et al., "Memristive neural network for on-line learning and tracking with brain-inspired spike timing dependent plasticity," Sci. Rep. 7(1), 5288 (2017).
102. V. Milo, G. Pedretti et al., "Demonstration of hybrid CMOS/RRAM neural networks with spike time/rate-dependent plasticity," in 2016 IEEE International Electron Devices Meeting (IEDM) (IEEE, 2016), pp. 16.8.1–16.8.4.
103. M. Farronato, P. Mannocci et al., "Reservoir computing with charge-trap memory based on a MoS2 channel for neuromorphic engineering," Adv. Mater. (published online) (2022).
104. R. Carboni and D. Ielmini, "Stochastic memory devices for security and computing," Adv. Electron. Mater. 5(9), 1900198 (2019).
105. G. Indiveri, B. Linares-Barranco et al., "Neuromorphic silicon neuron circuits," Front. Neurosci. 5, 73 (2011).
106. F. Cai, S. Kumar et al., "Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks," Nat. Electron. 3(7), 409–418 (2020).
107. S. N. Truong and K.-S. Min, "New memristor-based crossbar array architecture with 50-% area reduction and 48-% power saving for matrix-vector multiplication of analog neuromorphic computing," J. Semicond. Technol. Sci. 14(3), 356–363 (2014).
108. J. J. Yang, D. B. Strukov, and D. R. Stewart, "Memristive devices for computing," Nat. Nanotechnol. 8(1), 13–24 (2013).
109. D. Kau, S. Tang et al., "A stackable cross point phase change memory," in 2009 IEEE International Electron Devices Meeting (IEDM) (IEEE, 2009), pp. 1–4.
110. M.-C. Hsieh, Y.-C. Liao et al., "Ultra high density 3D via RRAM in pure 28nm CMOS process," in 2013 IEEE International Electron Devices Meeting (IEDM) (IEEE, Washington, DC, 2013), pp. 10.3.1–10.3.4.
111. S. Balatti, S. Ambrogio et al., "Set variability and failure induced by complementary switching in bipolar RRAM," IEEE Electron Device Lett. 34(7), 861–863 (2013).
112. S. Ambrogio, S. Balatti et al., "Statistical fluctuations in HfOx resistive-switching memory: Part II—random telegraph noise," IEEE Trans. Electron Devices 61(8), 2920–2927 (2014).
113. S. Ambrogio et al., "Statistical fluctuations in HfOx resistive-switching memory: Part I—set/reset variability," IEEE Trans. Electron Devices 61(8), 2912–2919 (2014).
114. N. Lepri, M. Baldo et al., "Modeling and compensation of IR drop in crosspoint accelerators of neural networks," IEEE Trans. Electron Devices 69(3), 1575–1581 (2022).
115. B. Govoreanu, A. Redolfi et al., "Vacancy-modulated conductive oxide resistive RAM (VMCO-RRAM): An area-scalable switching current, self-compliant, highly nonlinear and wide on/off-window resistive switching cell," in 2013 IEEE International Electron Devices Meeting (IEDM) (IEEE, 2013), pp. 10.2.1–10.2.4.
116. F. Zhang and M. Hu, "Mitigate parasitic resistance in resistive crossbar-based convolutional neural networks," ACM J. Emerging Technol. Comput. Syst. 16(3), 1–20 (2020).
117. C. Mackin, M. J. Rasch et al., "Optimised weight programming for analogue memory-based deep neural networks," Nat. Commun. 13(1), 3765 (2022).
118. D. Joksas, E. Wang et al., "Nonideality-aware training for accurate and robust low-power memristive neural networks," Adv. Sci. 9(17), 2105784 (2022).
119. D. Joksas, P. Freitas et al., "Committee machines—a universal method to deal with non-idealities in memristor-based neural networks," Nat. Commun. 11(1), 4273 (2020).
120. M. Le Gallo, S. R. Nandakumar et al., "Precision of bit slicing with in-memory computing based on analog phase-change memory crossbars," Neuromorphic Comput. Eng. 2(1), 014009 (2022).
121. F. L. Aguirre, N. M. Gomez et al., "Minimization of the line resistance impact on memdiode-based simulations of multilayer perceptron arrays applied to pattern recognition," J. Low Power Electron. Appl. 11(1), 9 (2021).
122. N. Lepri, A. Glukhov, and D. Ielmini, "Mitigating read-program variation and IR drop by circuit architecture in RRAM-based neural network accelerators," in 2022 IEEE International Reliability Physics Symposium (IRPS) (IEEE, Dallas, TX, 2022), pp. 3C.2-1–3C.2-6.
123. A. Flocke and T. G. Noll, "Fundamental analysis of resistive nano-crossbars for the use in hybrid nano/CMOS-memory," in ESSCIRC 2007 - 33rd European Solid-State Circuits Conference (IEEE, Muenchen, Germany, 2007), pp. 328–331.
124. F. Li, X. Yang et al., "Evaluation of SiO2 antifuse in a 3D-OTP memory," IEEE Trans. Device Mater. Reliab. 4(3), 416–421 (2004).
125. G. W. Burr, R. S. Shenoy et al., "Access devices for 3D crosspoint memory," J. Vac. Sci. Technol. B 32(4), 040802 (2014).
126. M.-J. Lee, D. Lee et al., "A plasma-treated chalcogenide switch device for stackable scalable 3D nanoscale memory," Nat. Commun. 4(1), 2629 (2013).
127. M. Hu, J. P. Strachan et al., "Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication," in Proceedings of the 53rd Annual Design Automation Conference (ACM, Austin, TX, 2016), pp. 1–6.
128. D. Ielmini, F. Nardi, and C. Cagli, "Physical models of size-dependent nanofilament formation and rupture in NiO resistive switching memories," Nanotechnology 22(25), 254022 (2011).
129. V. Milo, C. Zambelli et al., "Multilevel HfO2-based RRAM devices for low-power neuromorphic networks," APL Mater. 7(8), 081120 (2019).
130. K. Gopalakrishnan, R. S. Shenoy et al., "Highly-scalable novel access device based on mixed ionic electronic conduction (MIEC) materials for high density phase change memory (PCM) arrays," in 2010 Symposium on VLSI Technology (IEEE, Honolulu, 2010), pp. 205–206.
131. M. Son, J. Lee et al., "Excellent selector characteristics of nanoscale VO2 for high-density bipolar ReRAM applications," IEEE Electron Device Lett. 32(11), 1579–1581 (2011).
132. V. Milo, A. Glukhov et al., "Accurate program/verify schemes of resistive switching memory (RRAM) for in-memory neural network circuits," IEEE Trans. Electron Devices 68(8), 3832–3837 (2021).
133. Y.-C. Luo, A. Lu et al., "Design of non-volatile capacitive crossbar array for in-memory computing," in 2021 IEEE International Memory Workshop (IMW) (IEEE, 2021), pp. 1–4.
134. C. Li, M. Hu et al., "Analogue signal and image processing with large memristor crossbars," Nat. Electron. 1(1), 52–59 (2018).
135. M. Hu, C. E. Graves et al., "Memristor-based analog computation and neural network classification with a dot product engine," Adv. Mater. 30, 1705914 (2018).
136. P. Narayanan, S. Ambrogio et al., "Fully on-chip MAC at 14 nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration-format," IEEE Trans. Electron Devices 68(12), 6629–6636 (2021).
137. S. N. Truong, S. Shin et al., "New twin crossbar architecture of binary memristors for low-power image recognition with discrete cosine transform," IEEE Trans. Nanotechnol. 14(6), 1104–1111 (2015).
138. Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature 521(7553), 436–444 (2015).
139. B. Fleischer, S. Shukla et al., "A scalable multi-TeraOPS deep learning processor core for AI training and inference," in 2018 IEEE Symposium on VLSI Circuits (IEEE, 2018), pp. 35–36.
140. H. Cai, Y. Guo et al., "Proposal of analog in-memory computing with magnified tunnel magnetoresistance ratio and universal STT-MRAM cell," IEEE Trans. Circuits Syst., I 69(4), 1519–1531 (2022).
141. J. Chen, S. Wen et al., "Highly parallelized memristive binary neural network," Neural Networks 144, 565–572 (2021).
142. X. Sun, S. Yin et al., "XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks," in 2018 Design, Automation and Test in Europe Conference and Exhibition (DATE) (IEEE, Dresden, Germany, 2018), pp. 1423–1428.
143. W. Wan, R. Kubendran et al., "A compute-in-memory chip based on resistive random-access memory," Nature 608(7923), 504–512 (2022).
144. C. Li, D. Belkin et al., "Efficient and self-adaptive in-situ learning in multilayer memristor neural networks," Nat. Commun. 9(1), 2385 (2018).
145. M. R. Mahmoodi, H. Kim et al., "An analog neuro-optimizer with adaptable annealing based on 64×64 0T1R crossbar circuit," in 2019 IEEE International Electron Devices Meeting (IEDM) (IEEE, 2019), pp. 14.7.1–14.7.4.
146. L. Deng, L. Liang et al., "SemiMap: A semi-folded convolution mapping for speed-overhead balance on crossbars," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 39(1), 117–130 (2020).
147. J.-S. Seo, J. Saikia et al., "Digital versus analog artificial intelligence accelerators: Advances, trends, and emerging designs," IEEE Solid-State Circuits Mag. 14(3), 65–79 (2022).
148. W. Maass, "Noise as a resource for computation and learning in networks of spiking neurons," Proc. IEEE 102(5), 860–880 (2014).
149. J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. U. S. A. 79(8), 2554–2558 (1982).
150. C. D. Wright, P. Hosseini, and J. A. V. Diosdado, "Beyond von-Neumann computing with nanoscale phase-change memory devices," Adv. Funct. Mater. 23(18), 2248–2254 (2013).
151. M. R. Mahmoodi, M. Prezioso, and D. B. Strukov, "Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization," Nat. Commun. 10(1), 5113 (2019).
152. M. N. Bojnordi and E. Ipek, "Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning," in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA) (IEEE, Barcelona, Spain, 2016), pp. 1–13.
153. T. Dalgaty, E. Esmanhotto et al., "Ex situ transfer of Bayesian neural networks to resistive memory-based inference hardware," Adv. Intell. Syst. 3(8), 2000103 (2021).
154. T. Dalgaty, N. Castellani et al., "In situ learning using intrinsic memristor variability via Markov chain Monte Carlo sampling," Nat. Electron. 4(2), 151–161 (2021).
155. Z. Sun, G. Pedretti et al., "Solving matrix equations in one step with cross-point resistive arrays," Proc. Natl. Acad. Sci. U. S. A. 116(10), 4123–4128 (2019).
156. Z. Sun, E. Ambrosi et al., "In-memory PageRank accelerator with a cross-point array of resistive memories," IEEE Trans. Electron Devices 67(4), 1466–1470 (2020).
157. Z. Sun, G. Pedretti et al., "One-step regression and classification with cross-point resistive memory arrays," Sci. Adv. 6(5), eaay2378 (2020).
158. P. Mannocci, G. Pedretti et al., "A universal, analog, in-memory computing primitive for linear algebra using memristors," IEEE Trans. Circuits Syst., I 68, 4889 (2021).
159. G. Zoppo, A. Korkmaz et al., "Analog solutions of discrete Markov chains via memristor crossbars," IEEE Trans. Circuits Syst., I 68(12), 4910–4923 (2021).
160. Z. Sun, G. Pedretti et al., "In-memory eigenvector computation in time O(1)," Adv. Intell. Syst. 2(8), 2000042 (2020).
161. P. Gupta, A. Goel et al., "WTF: The who to follow service at Twitter," in Proceedings of the 22nd International Conference on World Wide Web (WWW '13) (ACM Press, Rio de Janeiro, Brazil, 2013), pp. 505–514.
162. G. Pedretti, P. Mannocci et al., "Redundancy and analog slicing for precise in-memory machine learning—Part II: Applications and benchmark," IEEE Trans. Electron Devices 68(9), 4379–4383 (2021).
163. P. Mannocci, E. Melacarne, and D. Ielmini, "An analogue in-memory ridge regression circuit with application to massive MIMO acceleration," IEEE J. Emerging Sel. Top. Circuits Syst. 12, 952 (2022).
164. B. Feinberg, R. Wong et al., "An analog preconditioner for solving linear systems," in 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (IEEE, Seoul, South Korea, 2021), pp. 761–774.
165. K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits 41(3), 712–727 (2006).
166. I. Sourdis and D. Pnevmatikatos, "Pre-decoded CAMs for efficient and high-speed NIDS pattern matching," in 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (IEEE, Napa, CA, 2004), pp. 258–267.
167. C. E. Graves, C. Li et al., "In-memory computing with memristor content addressable memories for pattern matching," Adv. Mater. 32(37), 2003437 (2020).
168. R. Karam, R. Puri et al., "Emerging trends in design and applications of memory-based computing and content-addressable memories," Proc. IEEE 103(8), 1311–1330 (2015).
169. A. J. McAuley and P. Francis, "Fast routing table lookup using CAMs," in Proceedings of IEEE INFOCOM '93: The Conference on Computer Communications (IEEE Computer Society Press, San Francisco, CA, 1993), pp. 1382–1391.
170. C. Li, C. E. Graves et al., "Analog content-addressable memories with memristors," Nat. Commun. 11(1), 1638 (2020).
171. G. Pedretti, C. E. Graves et al., "Tree-based machine learning performed in-memory with memristive analog CAM," Nat. Commun. 12(1), 5806 (2021).
172. H. Tsai, S. Ambrogio et al., "Recent progress in analog memory-based accelerators for deep learning," J. Phys. D: Appl. Phys. 51(28), 283001 (2018).
173. S. Agarwal, S. J. Plimpton et al., "Resistive memory device requirements for a neural algorithm accelerator," in 2016 International Joint Conference on Neural Networks (IJCNN) (IEEE, Vancouver, BC, 2016), pp. 929–938.
174. S. Yu, P.-Y. Chen et al., "Scaling-up resistive synaptic arrays for neuro-inspired architecture: Challenges and prospect," in 2015 IEEE International Electron Devices Meeting (IEDM) (IEEE, Washington, DC, 2015), pp. 17.3.1–17.3.4.
175. M. Farronato, M. Melegari et al., "Low-current, highly linear synaptic memory device based on MoS2 transistors for online training and inference," in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS) (IEEE, Incheon, Republic of Korea, 2022), pp. 1–4.
176. J. Woo, K. Moon et al., "Improved synaptic behavior under identical pulses using AlOx/HfO2 bilayer RRAM array for neuromorphic systems," IEEE Electron Device Lett. 37(8), 994–997 (2016).
177. S. H. Jo, T. Chang et al., "Nanoscale memristor device as synapse in neuromorphic systems," Nano Lett. 10(4), 1297–1301 (2010).
178. J.-W. Jang, S. Park et al., "Optimization of conductance change in Pr1−xCaxMnO3-based synaptic devices for neuromorphic systems," IEEE Electron Device Lett. 36(5), 457–459 (2015).
179. S. Hao, X. Ji et al., "Monolayer MoS2/WO3 heterostructures with sulfur anion reservoirs as electronic synapses for neuromorphic computing," ACS Appl. Nano Mater. 4(2), 1766–1775 (2021).
180. C. J. McClellan, A. C. Yu et al., "Vertical sidewall MoS2 growth and transistors," in 2019 Device Research Conference (DRC) (IEEE, Ann Arbor, MI, 2019), pp. 65–66.
181. C. Mead, "Neuromorphic electronic systems," Proc. IEEE 78(10), 1629–1636 (1990).
182. G. Indiveri, F. Corradi, and N. Qiao, "Neuromorphic architectures for spiking deep neural networks," in 2015 IEEE International Electron Devices Meeting (IEDM) (IEEE, Washington, DC, 2015), pp. 4.2.1–4.2.4.
183. Y. Taigman,