I. INTRODUCTION TO THE ROADMAP
Adnan Mehonic, Daniele Ielmini, and Kaushik Roy
A. Taxonomy and motivation
The growing adoption of data-driven applications, such as artificial intelligence (AI), is transforming the way we interact with technology. Currently, the deployment of AI and machine learning tools in previously uncharted domains generates considerable enthusiasm for further research, development, and utilization. These innovative applications often provide effective solutions to complex challenges that have remained unresolved for years. By expanding the reach of AI and machine learning, we unlock new possibilities and facilitate advancements in various sectors. These include, but are not limited to, scientific research, education, transportation, smart city planning, eHealth, and the metaverse.
However, our predominant focus on performance can sometimes lead to critical oversights. For instance, our constant dependence on immediate access to information might cause us to ignore the energy consumption and environmental consequences associated with the computing systems that enable such access. Balancing performance with sustainability is crucial for the technology’s continued growth.
From this standpoint, the environmental impact of AI is a cause for growing concern. In addition, applications such as the Internet of Things (IoT) and autonomous robotic agents may not always rely on resource-intensive deep learning algorithms but still need to minimize energy consumption. Realizing the vision of IoT is contingent upon reducing the energy requirements of numerous connected devices. The demand for computing power is growing at a rate that far exceeds improvements achieved through Moore’s law scaling. Figure 1(a) shows the computing power demands, quantified in peta-floating-point operations (petaflops, one peta = 10^15) per day, as a function of time, indicating an increase by a factor of 2 every two months in recent years.1 In addition to Moore’s law, significant advancements have been made through the combination of intelligent architecture and hardware–software co-design. For instance, the performance of NVIDIA graphics processing units (GPUs) improved by a factor of 317 from 2012 to 2021, surpassing expectations based on Moore’s law alone. Research and development efforts have demonstrated further impressive performance improvements,2–4 suggesting that more can be achieved. However, conventional computing solutions alone are unlikely to meet the demand in the long term, particularly when considering the high costs of training associated with the most complex deep learning models [Fig. 1(b)]. It is essential to explore alternative approaches to tackle these challenges and ensure the long-term sustainability of AI’s rapid advancements.

While global energy consumption is a crucial concern, there is another issue that is perhaps just as significant: the ability of low-power systems to execute complex AI algorithms without relying on cloud-based computing. These two challenges, global AI power consumption and the implementation of complex AI on low-power systems, are somewhat separate and may need to be addressed with somewhat different strategies (e.g., the power consumption in data centers for the most complex, largest AI models, such as large language models, might be addressed differently than implementing mid-sized AI models, such as voice recognition, on low-power, self-contained systems that might need to run at a few milliwatts of power). The strategies developed for low-power systems might not be scalable to the largest models, and the optimizations developed for the largest models might not be applicable to simpler models running on much lower power budgets. However, for both, we undeniably need to improve the overall energy efficiency of the computing systems designed to execute AI workloads.
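To make the mismatch concrete, a back-of-the-envelope comparison of the growth rates quoted above can be sketched as follows: compute demand doubling every two months vs transistor-driven gains doubling roughly every two years, the cadence commonly associated with Moore’s law. The numbers below are purely illustrative.

```python
# Toy comparison of the growth rates quoted in the text (illustrative only):
# AI compute demand doubling every 2 months vs Moore's-law-style doubling
# roughly every 24 months.
months = 24
demand_growth = 2 ** (months / 2)    # demand: one doubling per 2 months
moore_growth = 2 ** (months / 24)    # transistor budget: one doubling per ~24 months
print(f"After {months} months: demand grows {demand_growth:.0f}x, "
      f"Moore's law delivers {moore_growth:.0f}x")
# After 24 months: demand grows 4096x, Moore's law delivers 2x
```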
The energy efficiency and performance of computing can largely benefit from new paradigms that aim at replicating or being inspired by specific characteristics of the brain’s biological mechanisms. It is important to note that biological systems might be highly specialized and heterogeneous, and therefore, different tasks are addressed by different computational schemes. However, we can still aim to take inspiration from general features when they are advantageous for specific applications. It is unlikely that a single architecture or broader approach will be best suited to all targeted applications.
Adopting an interdisciplinary methodology, experts in materials science, device and circuit engineering, system design, and algorithm and software development are brought together to collectively contribute to the progressive field of neuromorphic engineering and computing. This collaborative approach is instrumental in fueling innovation and promoting advancements in a domain that seeks to bridge the gap between biological systems and artificial intelligence. Coined by Mead in the late 1980s,5 the term “neuromorphic” originally referred to systems and devices replicating certain aspects of biological neural systems, but its usage now varies across different research communities. While the term’s meaning continues to evolve, it generally refers to a system embodying brain-inspired properties, such as in-memory computing, hardware learning, spike-based processing, fine-grained parallelism, and reduced precision computing. One can also draw analogies and identify more complex phenomenological similarities between biological units (e.g., neurons) and electronic components (e.g., memristors). For example, phenomenological similarities between models of the redox-based nanoionic resistive memory cell and common neuronal models, such as the Hodgkin–Huxley conductance model and the leaky integrate-and-fire model, have been demonstrated.6 Even more complex biological functionalities have been demonstrated using a single third-order nanocircuit element.7 It should be noted that many paradigms related to the neuromorphic approach have also been independently investigated. For instance, in-memory computing,8 while being a cornerstone of the neuromorphic paradigm, is also examined separately. It represents one of the most promising avenues to enhance the energy efficiency of AI hardware or more general computing, offering a break from the traditional von Neumann architecture paradigm.
Neuromorphic research can be divided into three areas. First, “neuromorphic engineering” employs either complementary metal–oxide–semiconductor (CMOS) technology (e.g., transistors working in a sub-threshold regime) or cutting-edge post-CMOS technologies to reproduce the brain’s computational units and mechanisms. Second, “neuromorphic computing” explores new data processing methods, frequently drawing inspiration from biological systems and considering alternative algorithms, such as spike-based computing. Finally, the development of “neuromorphic devices” constitutes the third area. This area takes advantage of advancements in electronic and photonic technologies to develop innovative nano-devices that frequently emulate biological components, such as neurons and synapses, or efficiently implement desired properties, such as in-memory computing.
Furthermore, various approaches to neuromorphic research can be identified based on their primary objectives. Some systems focus on delivering efficient hardware platforms to enhance our understanding of biological nervous systems, while others employ brain-inspired principles to create innovative, efficient computing applications. This roadmap primarily focuses on the latter. While there are already outstanding roadmaps,9 reviews,10–12 and special issues13 that offer comprehensive overviews of neuromorphic technologies, encompassing the integration of hardware and software solutions as well as the exploration of new learning paradigms, this particular roadmap emphasizes the significance of materials engineering in advancing cutting-edge CMOS and post-CMOS technologies. Simultaneously, it offers a holistic perspective on the general challenges of computing systems, the reasoning behind adopting the neuromorphic approach, and concise summaries of current technologies to better contextualize the role of materials engineering within the broader neuromorphic landscape. Of course, there are other critical aspects in the development of neuromorphic technologies that need to be taken into account; thermal management materials, devices, and networks, the subject of an excellent recent review,14 are one such example.
This roadmap is organized into several thematic sections, outlining current computing challenges, discussing the neuromorphic computing approach, analyzing mature and currently utilized technologies, providing an overview of emerging technologies, addressing material challenges, exploring novel computing concepts, and finally examining the maturity level of emerging technologies while determining the next essential steps for their advancement.
This roadmap starts with a concise introduction to the current digital computing landscape, primarily characterized by Moore’s law scaling and the von Neumann architecture. It then explores the challenges in sustaining Moore’s law and examines the significance and potential advantages of post-CMOS technologies and architectures aiming to integrate computing and memory. Following this, this roadmap presents a historical perspective on the neuromorphic approach, emphasizing its potential benefits and applications. It provides a thorough review of cutting-edge developments in various emerging technologies, comparing them critically. The discussion addresses how these technologies can be utilized to develop computational building blocks for future computing systems. The roles of two mature technologies, static random access memory (SRAM) and flash, are also explored. The overview of emerging technologies includes resistive switching and memristors, phase change materials, ferroelectric materials, magnetic materials, spintronic materials, optoelectronic and photonic materials, and 2D devices and systems. Material challenges are discussed in detail, covering types of challenges, possible solutions, and experimental techniques to study these. Novel computing concepts are examined, focusing on embracing device and system variability, spiking-based computing systems, analog computing for linear algebra, and the use of analog content addressable memory (CAM) for in-memory computing and optimization solvers. Section VIII discusses technological maturity and potential future directions.
II. COMPUTING CHALLENGES
Onur Mutlu and Shahar Kvatinsky
A. Digital computing
1. Status
Digital computing has a long and complex history that stretches back over a century. The earliest electronic computers were developed in the 1930s and 1940s, and they were large, expensive, and difficult to use. However, these early computers laid the foundation for the development of the modern computers that we use today and their principles are still in widespread use.
One of the key figures in the early history of digital computing was John von Neumann, a mathematician and computer scientist known for his contributions to the field of computer science. Von Neumann advocated the stored program concept and sequential instruction processing, two vital features of the von Neumann architecture15 that are still used in most computers today. Another key feature of the von Neumann architecture is the separation of the central processing unit (CPU) and the main memory. This separation allows the CPU to access the instructions and data it needs from the main memory while executing a program, and assigns the computation and control responsibilities specifically to the CPU.
Throughout the years, the rapid scaling of semiconductor logic technology, known as Moore’s law,16 has led to tremendous improvements in computer performance and energy efficiency. With the exponential increase in the number of transistors placed on a single chip provided by technology scaling, engineers have explored many ways to increase the speed and performance of computers. One way they did this was by exploiting parallelism, which is the ability of a computer to perform multiple tasks simultaneously. There are several different types of parallelism, including SISD (single instruction, single data), SIMD (single instruction, multiple data), MIMD (multiple instruction, multiple data), and MISD (multiple instruction, single data),17 which are exploited in modern computing systems ranging from general-purpose single-core and multi-core processors to GPUs and specialized accelerators.
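As a small illustration of data parallelism in the SIMD spirit, the sketch below contrasts an element-at-a-time (SISD-style) loop with a vectorized operation that applies the same instruction across many data elements at once; NumPy dispatches such whole-array operations to optimized kernels that typically use SIMD instructions under the hood. Array sizes and values are arbitrary.

```python
import numpy as np

# SISD-style scalar loop vs a vectorized operation in the SIMD spirit:
# the same multiply-add is applied to every element of the arrays.
a = np.arange(1_000_000, dtype=np.float64)
b = np.ones_like(a)

c_vector = a * b + 2.0          # one data-parallel operation over all elements

c_scalar = np.empty_like(a)     # equivalent element-at-a-time computation
for i in range(a.size):
    c_scalar[i] = a[i] * b[i] + 2.0

assert np.allclose(c_vector, c_scalar)
```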
Technology scaling has also allowed for the development of more processing units, starting from duplicating the processing cores and, more recently, adding accelerators. These accelerators can off-load specific tasks, e.g., video processing, compression/decompression, vision processing, graphics, and machine learning, from the central processor, further improving performance and energy efficiency (the energy required to perform a given task) by specializing the computation units to the task at hand. As such, modern systems are heterogeneous, with many different types of logic-based computation units integrated into the same processor die.
2. Challenges
While the performance and energy of logic-based computation units have scaled very well via technology scaling, those of interconnect and memory systems have not scaled as well. As a result, communication (e.g., data movement) between computation units and memory units has emerged as a major bottleneck, partly due to this disparity in scaling and partly due to the separation and disparity between processing and memory in the von Neumann architecture; both factors have limited the ability of computers to take full advantage of the improvements in logic technology. This bottleneck is broadly referred to as the “von Neumann bottleneck” or the “memory wall,” as it can greatly limit the speed and energy efficiency with which the computer can execute instructions.
For decades, the transistor size has scaled down, while the power density has remained constant. This phenomenon, first observed in the 1970s by Dennard,18 means that as transistors become smaller and more densely packed onto a chip, the overall performance and capabilities of the chip improve. However, since the early 2000s, it has become increasingly challenging to maintain Dennard scaling as voltage (and thus frequency) scaling has greatly slowed down. The end of Dennard scaling has increased the importance of energy efficiency of different processing units and led to phenomena such as “dark silicon,”19 where large parts of the chip are powered off. The rapid move toward more specialized processing units, powered on for specific tasks, exemplifies the influence of the end of Dennard scaling.
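For a quantitative statement, the constant-field (Dennard) scaling argument can be sketched with the standard dynamic-power expression; the symbols follow the usual convention and are not taken from this roadmap. Shrinking dimensions and voltage by 1/κ while raising frequency by κ leaves the power density unchanged, and the argument breaks down once voltage can no longer scale.

```latex
% Dynamic switching power and constant-field (Dennard) scaling:
\[
P = \alpha C V^{2} f, \qquad
C \to \frac{C}{\kappa}, \quad V \to \frac{V}{\kappa}, \quad f \to \kappa f, \quad A \to \frac{A}{\kappa^{2}}
\;\Longrightarrow\;
\frac{P'}{A'} = \frac{\alpha (C/\kappa)(V/\kappa)^{2}(\kappa f)}{A/\kappa^{2}} = \frac{\alpha C V^{2} f}{A}.
\]
% Power density stays constant only as long as the supply voltage keeps scaling;
% once voltage (and thus frequency) scaling slows, power density rises, leading
% to the "dark silicon" effect described in the text.
```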
Furthermore, in recent years, it has become increasingly challenging to maintain the pace of Moore’s law due to the physical limitations of transistors and the challenges of manufacturing smaller and more densely packed chips. As a result, the looming end of Moore’s law has been a topic of discussion in the tech industry, as this could potentially limit the future performance improvements of computer chips. New semiconductor technologies and novel architectural solutions are required to continue improving computing systems’ performance and energy efficiency at a pace similar to that of the past.
3. Potential solutions and conclusion
In recent years, different semiconductor and manufacturing technologies have emerged to overcome the slowdown of Moore’s law. These devices include new transistor structures and materials, advanced packaging techniques, and new (e.g., non-volatile) memory devices. Some of these technologies have functionality similar to that of standard CMOS technology but with improved properties. Other technologies offer radically new properties, different from those of CMOS. For example, memristive technologies, such as resistive RAM,20 have varying resistance and provide analog data storage that also supports computation. Such novel technologies with their unique properties may serve as enablers for new architectures and computing paradigms, which could be different from and complementary to the von Neumann architecture.
The combination of the Moore’s law slowdown and the von Neumann bottleneck requires fresh thinking on computing paradigms. Data movement between the memory and the processing units is the primary impediment to high performance and high energy efficiency in modern computing systems.21–24 In addition, this impediment only worsens with improved processing abilities and the increased need for data. All modern computers employ a variety of methods to mitigate the memory bottleneck, all of which increase the complexity and power requirements of the system with limited (and sometimes little) success in mitigating the bottleneck. For example, modern computers have several levels of cache memories to reduce the latency and power of memory accesses by exploiting data locality. Cache memories, however, have limited capacity and are effective only when significant spatial and temporal locality exists in the program. Cache memories are not always (completely) effective due to low locality in many modern workloads, which can worsen the performance and energy efficiency of computers.25,26 Similarly, modern computers employ prefetching techniques across the memory hierarchy to anticipate future memory accesses and load data into caches before they are needed by the processor. While partially effective for relatively simple memory access patterns, prefetching is not effective for complicated memory access patterns, and it increases system complexity and memory bandwidth consumption.27 Thus, the memory bottleneck remains a tough challenge, and hundreds of research papers and patents are written every year to mitigate it.28
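To illustrate why caches help only when a workload exhibits locality, the toy direct-mapped cache model below (hypothetical cache parameters, not drawn from any particular processor) counts misses for a sequential access pattern with high spatial locality and for a strided pattern that touches each cache line only once.

```python
# Minimal direct-mapped cache model (illustrative parameters): caches pay off
# only when accesses reuse recently fetched cache lines.
def miss_rate(addresses, num_lines=64, line_size=16):
    cache = [None] * num_lines            # tag store: which block each line holds
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if cache[index] != block:         # miss: fetch the block from main memory
            cache[index] = block
            misses += 1
    return misses / len(addresses)

sequential = list(range(0, 100_000))               # high spatial locality
strided = list(range(0, 1_600_000, 16))            # one access per cache line
print(f"sequential: {miss_rate(sequential):.3f}")  # ~0.063 (1 miss per 16 accesses)
print(f"strided:    {miss_rate(strided):.3f}")     # ~1.000 (every access misses)
```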
Overcoming the performance and energy costs of off-chip memory accesses is an increasingly difficult task as the disparity between the efficiency of computation and the efficiency of memory access continues to grow. There is therefore a need to examine more disruptive technologies and architectures that much more tightly integrate logic and memory at a large scale, avoiding the large costs of data movement across system components.
Many efforts to move computation closer to and inside the memory units have been made,29 including adding processing units in the same package as DRAM chips,30,31 performing digital processing using memory cells,32,33 and using analog computation capabilities of both DRAM and non-volatile memory (NVM) devices.34–37 One exciting novel computing paradigm to eliminate the von Neumann bottleneck is to reconsider the way computation and memory tasks are performed by taking inspiration from the brain, where, unlike in the von Neumann architecture, processing and storage are not separated. Many recent studies demonstrate orders-of-magnitude performance and energy improvements using various kinds of processing-in-memory architectures.29 Processing-in-memory and, more broadly, neuromorphic (or brain-inspired) computing thus offer a promising way to overcome the major performance and energy bottleneck in modern memory systems. However, they also introduce significant challenges for adoption, as they are disruptive technologies that affect all levels of the system stack, from hardware devices to software algorithms.
III. NEUROMORPHIC COMPUTING BASICS AND ITS EVOLUTION
Teresa Serrano-Gotarredona and Bernabe Linares Barranco
A. What is neuromorphic computing/engineering
Neuromorphic computing can be defined as the underlying computations performed by neuromorphic physical systems. Neuromorphic physical systems carry out robust and efficient neural computation using hardware implementations that operate in physical time. Typically, they are event- or data-driven, they employ low-power, massively parallel analog, digital, or mixed-signal VLSI circuits, and they operate using a physics of computation similar to that used by the nervous system.
Spiking neural networks (SNNs) are one very good example of a neuromorphic computing system. Computation is performed whenever a spike is transmitted and received by destination neurons. Computation can be performed at the dendritic tree, while spikes travel to their destinations, as well as at the destination neurons where they are collected to update the internal states of the neurons. Neurons collect pre-weighted and pre-filtered spikes coming from different source neurons or sensors, perform some basic computation on them, and generate an output spike whenever their internal state reaches some threshold. A neuron firing typically means that the “feature” this neuron represents has been identified in place and time. The collective computation of populations of neurons can give rise to powerful system-level behaviors, such as pattern recognition, decision making, sensory fusion, and knowledge abstraction. In addition, neuromorphic computing systems can also be enabled to acquire new knowledge through both supervised and unsupervised learning, either offline or while they perform, which is typically known as online learning and which can be life-long. Neuromorphic computing typically spans sensing, processing, and learning.
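As a minimal sketch of the neuron behavior just described, the leaky integrate-and-fire model below accumulates weighted input spikes on a decaying membrane state and emits a spike when a threshold is crossed; the parameters, weights, and random input are purely illustrative.

```python
import numpy as np

# Minimal leaky integrate-and-fire neuron: weighted presynaptic spikes are
# integrated on a leaky membrane; an output spike is emitted when the state
# crosses the threshold, after which the state is reset.
def lif(input_spikes, weights, leak=0.9, threshold=1.0):
    v, out = 0.0, []
    for spikes_t in input_spikes:               # spikes_t: 0/1 vector of presynaptic spikes
        v = leak * v + float(np.dot(weights, spikes_t))
        if v >= threshold:
            out.append(1)
            v = 0.0                             # reset after firing
        else:
            out.append(0)
    return out

rng = np.random.default_rng(1)
pre = rng.integers(0, 2, size=(50, 4))          # 50 time steps, 4 presynaptic neurons
print(lif(pre, weights=np.array([0.3, 0.2, 0.1, 0.4])))
```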
1. Neuromorphic sensing
Probably the most clarifying example of what neuromorphic computation is about is the paradigm of neuromorphic visual computation. Neuromorphic visual computation exploits the data encoding provided by neuromorphic visual sensors. At present, the most widespread neuromorphic vision sensor is the Dynamic Vision Sensor (DVS).38 In a DVS, each pixel sends out its (x, y) coordinate whenever its photodiode perceives a relative change of light beyond some preset thresholds, i.e., whenever I_{n+1}/I_n > θ+ or I_{n+1}/I_n < θ−, with θ+ slightly greater than 1 and θ− slightly less than 1. This is typically referred to as an “address event.” If I_{n+1} > I_n, then light has increased. If I_{n+1} < I_n, then light has decreased. To differentiate the two situations, the address event can also be a signed event, by adding a sign bit “s,” (x, y, s). If events are recorded using some event-recording hardware, then a timestamp t_n is added to each event (x_n, y_n, s_n, t_n). The full recording then consists of a list of timestamped address events. Figure 2 illustrates this. In Fig. 2(a), a DVS camera is observing a 7 kHz spiral on a classic phosphor oscilloscope (without any extra illumination source). Figure 2(b) plots in {x, y, t} space the recorded events. The camera was a 128 × 128 pixel high-contrast-sensitivity DVS camera.39 Therefore, the x–y coordinates in Fig. 2(b) span from 0 to 127. The vertical axis is time, which spans about 400 μs, slightly less than 100 μs per spiral turn. Each dot in Fig. 2(b) is an address event, and we can count several hundreds of them within the 400 μs. This DVS camera is capable of generating over 10 × 10^6 events per second (about one every 100 ns). This produces a very fine timing resolution when sensing dynamic scenes.
The information (events) produced by this type of sensor can be sent directly to event-driven neuromorphic computing hardware, which would process this quasi-instantaneous dynamic visual information event by event.
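The event-generation rule described above can be sketched in a few lines; the toy model below (illustrative contrast thresholds and a synthetic image sequence, not the behavior of any specific DVS chip) emits signed address events whenever the relative intensity change at a pixel exceeds the thresholds, updating the per-pixel reference level after each event.

```python
import numpy as np

# Toy DVS pixel model: an event (x, y, sign, t) is emitted whenever the relative
# intensity change at a pixel exceeds a contrast threshold since its last event.
def dvs_events(frames, theta_plus=1.2, theta_minus=1 / 1.2):
    ref = frames[0].astype(float)              # per-pixel reference intensity
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        ratio = frame / np.maximum(ref, 1e-6)
        on = ratio > theta_plus                # brightening beyond threshold
        off = ratio < theta_minus              # darkening beyond threshold
        for (y, x) in zip(*np.nonzero(on)):
            events.append((int(x), int(y), +1, t))
        for (y, x) in zip(*np.nonzero(off)):
            events.append((int(x), int(y), -1, t))
        ref[on | off] = frame[on | off]        # reset reference where events fired
    return events

frames = np.ones((5, 8, 8)) * 100.0
frames[2, 3, 3] = 150.0                        # a single pixel brightens at t = 2
print(dvs_events(frames))                      # [(3, 3, 1, 2), (3, 3, -1, 3)]
```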
2. Neuromorphic processing
Neuromorphic signal information encoding in the form of sequences of events reduces the information so that only meaningful data, such as changes, are transmitted and processed. This follows the underlying principle in biological nervous systems, where information transmission (in the form of nerve spikes) and its subsequent processing consume energy. Thus, biological systems tend to minimize the number of spikes (events) to be transmitted and processed for a given computational task. This principle is what neuromorphic computing intends to pursue. Figure 3 shows an illustrative example of this efficient frame-free event-driven information encoding.46 In Fig. 3(a), we see a poker card deck being browsed at natural speed, recorded with a DVS, and played back at real-time speed with a reconstructed frame time of about 20 ms. In Fig. 3(b), the same recorded list of events is played back at a 77 μs frame time. In Fig. 3(c), we show the tracked symbol input fed to a spiking convolutional neural network for object recognition, displaying the recognized output symbol. In Fig. 3(d), we show the 4-layer spiking ConvNet structure, and in Fig. 3(e), we show the {x, y, time} representation of 20 ms of input and output events occurring during a change of card, so that the recognition switches from one symbol to the next in less than 2 ms. Note that here the system is composed of both the sensor and the network executing the recognition; working together, they need less than 2 ms. This contrasts dramatically with conventional artificial systems, in which the sensor first needs to acquire two consecutive images (typically 25 ms per image) and then process both to capture the change.
Figure 3 illustrates a simple version of a neuromorphic sensing and processing system. By now, much larger neuromorphic systems, inspired by the same information encoding scheme, have been developed and demonstrated. The following are some powerful example systems:
The SpiNNaker platform47 was partly developed under the Human Brain Project.48 It features an 18-core Advanced RISC Machine (ARM) SpiNNaker chip. Each node on the platform comprises a printed circuit board (PCB) that holds 48 of these chips. In total, about 1200 of these boards are assembled into furniture-like sets that include racks. Collectively, these setups host approximately 1 million ARM cores. This system is capable of emulating 1 × 10^9 neurons in real time. An updated SpiNNaker chip has already been developed, improving efficiency, neuron emulation capability, and event-traffic handling by about 10× while keeping similar power consumption.
The BrainScaleS platform,49 also developed during the Human Brain Project,47 implements physical silicon neurons fabricated on full 8 in. silicon wafers, interconnecting 20 of these wafers in a cabinet together with 48 FPGA-based communication modules. It implements computation accelerated with respect to real time (by about 10 000×), with spike-timing-dependent plastic synapses. Each wafer can host about 200k neurons and 44 × 10^6 synapses.
The IBM TrueNorth chip50 could host 1 × 10^6 very simple neurons or be reconfigured to achieve a trade-off between the number of neurons and neuron model complexity. The neurons were structured into 4096 identical cores, with the full chip consuming about 63 mW.
Intel’s Loihi is probably the most advanced neuromorphic chip to date. In its first version,51 fabricated in 14 nm, it contains 128 cores, each capable of implementing 1k spiking neuronal units (compartments), and includes plastic synapses. More recently, the Loihi 2 chip was introduced, with up to 1 × 10^6 neurons per chip, manufactured in Intel 4 technology (7 nm). Up to 768 Loihi chips have been assembled into the Pohoiki Springs system, which operates at less than 500 W.52
3. Challenges and conclusion
Neuromorphic computing algorithms should be optimal when run on neuromorphic hardware, where events travel and are processed in a fully parallel manner. One of the main challenges in present-day neuromorphic computing is to train and execute powerful computing systems directly on neuromorphic hardware. Traditionally, neuromorphic computing problems were mapped to more conventional deep neural networks to obtain their parameters through backpropagation-based training,53 and the result would then be mapped to its neuromorphic/spiking counterpart.46 However, these transformations always resulted in a loss of performance. By now, there are many proposals for training directly in the spiking domain, combining variants of spike-timing-dependent plasticity rules with surrogate-gradient techniques that adapt backpropagation to spiking systems; these are tested on either fully connected or convolution-based deep spiking neural networks. For an updated review, readers are referred to Ref. 54.
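The key trick behind such surrogate techniques can be illustrated with a minimal sketch: the non-differentiable spike function is used in the forward pass, while a smooth stand-in replaces its derivative in the backward pass. The fast-sigmoid surrogate below is one common choice among several, and the steepness parameter is illustrative.

```python
import numpy as np

def spike(v_minus_th):
    """Forward pass: the non-differentiable Heaviside step; a spike is emitted
    when the membrane potential exceeds the threshold."""
    return (v_minus_th > 0).astype(float)

def surrogate_grad(v_minus_th, beta=10.0):
    """Backward pass: a smooth 'fast sigmoid' stand-in for the step's derivative,
    one common choice in surrogate-gradient training (beta sets the steepness)."""
    return 1.0 / (1.0 + beta * np.abs(v_minus_th)) ** 2

v = np.linspace(-1.0, 1.0, 5)      # membrane potential minus threshold
print(spike(v))                    # [0. 0. 0. 1. 1.]
print(surrogate_grad(v))           # peaks at the threshold crossing, decays away from it
```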
On the other hand, it remains to be seen whether novel nanomaterial devices, such as memristors, can provide truly giga-scale compact chips with billions of neurons and self-learning algorithms. Some initial demonstrations of single-core55 or multi-core56 systems exploiting a nano-scale memristor combined with a selector transistor as a synaptic element have been reported, with highly promising outlooks once synapse elements can be provided as purely nanoscale devices and multiple layers of synapse fabric can be stacked together with other nano-scale neurons.57 In the end, the success of neuromorphic computing will rely on the optimum combined progress of neuromorphic hardware, most probably massively exploiting emerging nano-scale devices, of event- and data-driven, energy-efficient information processing methodologies, and finally of efficient, resilient, and quick learning methodologies for mapping real-world applications onto the available hardware and computational neuromorphic substrates.
B. Different neuromorphic technologies and state of the art
Sabina Spiga
1. Status
The research field of neuromorphic computing has been growing significantly over the past three decades, following the pioneering research at Caltech (USA) by Mead,5 and it is currently attracting the interest of a wide and interdisciplinary community spanning devices, circuits, and systems as well as neuroscience, biology, computer science, materials, and physics. Within this framework, the developed neuromorphic hardware technologies span from fully CMOS-based systems58,59 to solutions exploiting charge-based or resistive non-volatile memory technologies60–62 and to emerging memristive device concepts and novel materials.63–66 Figure 4 reports a schematic (and non-exhaustive) evolution of the main technologies of interest. A common feature of these approaches is to take inspiration from brain computation, by co-locating memory and processing [the in-memory computing (IMC) approach], to overcome the von Neumann bottleneck. Hardware artificial neural networks (ANNs) can implement IMC and provide an efficient physical substrate for machine learning algorithms and artificial intelligence (AI). On the other hand, spiking neural networks (SNNs), encoding and processing information using spikes, hold great promise for applications requiring always-on real-time processing of sensory signals, for example, in edge computing, personalized medicine, and the Internet of Things.
In terms of the maturity of neuromorphic technologies, we can identify three main blocks, discussed in turn below.
Current large-scale hardware neuromorphic computing systems are fully CMOS-based and exploit digital or analog/mixed-signal technologies. Examples of fabricated chips are the IBM TrueNorth, Intel Loihi, Tianjic, ODIN, and others as discussed in previous review papers.58,59 In these systems, the neuron and synapse functionalities are emulated by using circuit blocks based on CMOS transistors, capacitors, and volatile SRAM memory. The scientific community is now exploiting these chips to implement novel algorithms for AI applications.
Non-volatile memory technologies. In the past decade, resistive non-volatile memory (NVM) technologies, such as Resistive Random Access Memory (RRAM), phase change memory (PCM), ferroelectric memory (FeRAM) and ferroelectric transistors (FeFETs), and magnetoresistive random access memory (MRAM), have been proposed as possible compact, low-power, and dynamical elements to implement synaptic nodes in hardware, replacing SRAMs, or as key elements of neuronal blocks.60,61,67 While these NVMs have been developed over the past twenty years mainly for data storage applications, and have been introduced in the market, they can be considered emerging technologies in the field of neuromorphic computing, and their great potential is still not fully exploited. Over the past 10 years, novel concepts for computing, based on hybrid CMOS/non-volatile resistive memory circuits and chips,56 have been proposed in the literature. In parallel, more conventional charge-based non-volatile memories, such as flash and NRAM, are also being investigated for IMC since they are mature technologies. Finally, it is worth mentioning the emerging memory technologies that are attracting increasing interest in the field of IMC and neuromorphic computing, namely, the ferroelectric tunnel junction (FTJ)68 and the 3-terminal electrochemical random access memory (ECRAM).69
Advanced memristive materials, devices, and novel computation concepts that are currently investigated include 2D materials, organic materials, perovskites, nanotubes, self-assembled nano-objects and nanowire networks, advanced device concepts in the field of spintronics (domain walls, racetrack memory, and skyrmions), devices based on the metal–insulator transition (for instance, VO2-based devices), and volatile memristors.65,66,70–72 These technologies are currently at the proof-of-concept stage, demonstrated at the single-device level or in circuit blocks connecting a small number of devices. The computing system is sometimes demonstrated with a mixed hardware/software approach, where the measured device characteristics are used to simulate large systems. Finally, it is worth mentioning the increasing interest in architectures that can exploit photonic components for computing, toward building neuromorphic photonic processors that take advantage of silicon photonic platforms and co-integration with novel optical memory devices and advanced materials, such as phase-change materials.73,74
Figure 5 schematically shows examples of the material systems currently most investigated in various approaches and technologies for neuromorphic computing.
2. Challenges
The current and future challenges can be considered at various levels.
For large-scale neuromorphic processors, the progress of CMOS-based technologies and their scaling still provide room to advance the research field. The main challenges are at the architecture and algorithm level. On the other hand, most NVM memories (RRAM, PCM, FeRAM, FeFET, and MRAM) have already been integrated with CMOS at scaled technological nodes and large integration density and hold interesting properties (depending on the specific technology), such as small size, scalability, relatively easy integration in 3D array stacking, low programming energy, and multilevel programming capability. Therefore, it is expected that NVM technologies will play an increasing role in future IMC chips or neuromorphic processors, by enabling energy-efficient computation. Prototype IMC chips have been reported in the literature,56,75 as well as innovative circuits for SNNs implementing advanced learning rules to compute with dynamics.76,77
On the other hand, it is worth mentioning that NVM technologies exhibit several device-level non-idealities, as discussed in more detail in Sec. V of this roadmap. As a non-exhaustive list of examples, we can mention nonlinearity and stochasticity in the conductance update vs the number of pulses at a fixed voltage (PCM, RRAM, and FeRAM), asymmetry in the bidirectional tuning of conductance (RRAM), conductance drift (PCM) or broadening of the resistance distribution (RRAM) after programming, device-to-device and cycle-to-cycle variability of the programmed states, low resolution due to the limited number of programmable levels (up to 8 or 16 have been demonstrated for RRAM and PCM at the array level), a restricted memory window (MRAM) or limited endurance (a general issue except for MRAM), and relatively high conduction even in the OFF state. All these aspects can impact neural network accuracy and reliability, although proper algorithms/architectures can take advantage of the stochasticity or asymmetry of conductance tuning.78 Therefore, a careful co-design of hardware and algorithms is required, together with improvements in circuit design and/or device programming strategies, to fully exploit NVMs in combination with CMOS and in large systems. The specific challenges and possible applications of the listed technologies will be discussed further in Secs. V A–V D of this roadmap, while a deeper view of application scenarios is given in Sec. VII.
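As an illustration of how such non-idealities propagate to computation, the sketch below perturbs an analog matrix–vector product with two of the listed effects, a limited number of conductance levels and device-to-device variability; all sizes and spreads are made up for illustration and do not describe any specific technology.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))          # target weights to be programmed
x = rng.standard_normal(64)                # input vector

# Limited resolution: round each weight to a few discrete conductance steps.
steps = 8                                   # roughly 16 levels over the +/- range
w_max = np.abs(W).max()
W_quant = np.round(W / w_max * steps) / steps * w_max

# Device-to-device variability: multiplicative spread around each programmed value.
W_prog = W_quant * rng.normal(1.0, 0.05, W.shape)

ideal = W @ x
analog = W_prog @ x
rel_err = np.linalg.norm(analog - ideal) / np.linalg.norm(ideal)
print(f"relative MVM error: {rel_err:.3f}")
```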
Regarding the plethora of emerging materials/devices and novel concepts proposed for neuromorphic computing (beyond the ones discussed in the previous point, see some examples in Secs. V D–V F of this roadmap), the main challenge is that they are mostly demonstrated at the single-device level or in early-stage proofs of concept in small arrays or with large device sizes, and they are implemented in ANNs or SNNs only at the simulation level. To bring these concepts to a higher technology readiness level (TRL), it is necessary to prove that the device characteristics are reproducible and scalable and that the working principle is well understood, to provide more advanced characterization of several down-scaled devices, and finally to close the current gap between laboratory exploration of single materials/devices and integration in arrays or circuits. Another challenge is to address in more detail how to exploit nanodevices’ peculiarities, such as dynamic or stochastic behavior, to implement more complex bio-inspired functionalities in hardware or even to perform radically new computation paradigms. Indeed, while the more standard technologies (CMOS, flash, and SRAM) can also be used in hardware neural networks to implement complex functions, this is possible only at the high cost of increased circuit complexity. To give an example, the dynamics required to reproduce synaptic or neuronal functionality in SNNs are implemented at the circuit level and/or using large-area capacitors, which do not scale easily to large systems. One possible approach is to exploit the emerging memristive technologies and their properties (variability, stochasticity, and non-idealities) to implement complex functions with more compact and low-power devices. One example is the use of resistivity drift in PCM (usually an unwanted characteristic for IMC or storage applications) to implement advanced learning rules in hardware SNNs.79 Another example (discussed in Sec. VII A) is to use the inherent variability and stochasticity of some nanodevices to build efficient random number generators (for data security applications) and stochastic computing models. Overall, this scenario points to long-term development research, likely of ten years or more, needed to close the gap between these novel concepts and real industrial applications.
3. Potential solutions
To pursue advances in the development of neuromorphic hardware chips, it is necessary to develop a common framework to compare and benchmark different approaches against common metrics, such as computing density, energy efficiency, computing accuracy, learning algorithms, and theoretical frameworks, and to target possible killer applications that might significantly benefit from neuro-inspired chips. Within this framework, materials strategies can still be relevant to address some of the outlined challenges for NVMs, but materials need to be co-developed together with a demonstration of the device at the scaled node and array level. An important strategy for the future is also the possibility of substituting current mainstream materials with green materials or of identifying fabrication processes that are more sustainable in terms of cost and environmental impact, without compromising the hardware functionality. Moreover, other important aspects include the development of hardware architectures that can lead to the integration of many devices while exploiting a large connectivity among them; the implementation of efficient algorithms supporting online learning, also on different time scales as in biological systems; and addressing low-power analysis of large amounts of data for the Internet of Things and edge devices. Overall, a holistic view that includes materials/devices/architectures/algorithms co-design is necessary to develop a large-scale neuromorphic chip.
4. Conclusion
The development of advanced neuromorphic hardware that can efficiently support AI applications is becoming more and more important. Despite the several prototypes and results presented in the literature, neuro-inspired chips are still at an early stage of development, and there is plenty of room for further progress. Many mature NVM devices are definitely candidates to become a future mainstream technology for large-scale neuromorphic processors that can outperform the current platforms based only on CMOS circuits. In the long term, it is also necessary to close the gap between emerging materials and concepts, currently demonstrated only by proofs of concept, and their possible integration in functional systems. Materials research and an understanding of the physical principles enabling novel functionalities are important parts of this scenario.
C. Possible future computational primitives for neuromorphic computing
Sergey Savel’ev and Alexander Balanov
The core idea of neuromorphic computing, to develop and design computational systems mimicking the electrochemical activity of the brain cortex, is currently booming, embracing areas of deep physical neural networks,80 classical and quantum reservoir computing,81,82 oscillator-based computing,83 and spiking networks,84 among many other concepts.85 These computational paradigms imply ways of processing and storing information that differ from conventional computing and, therefore, require an elementary device base and computational primitives that often involve unusual, novel physical principles.86
At present, memristors (electronic switches with memory) and their circuits demonstrate great potential for application in the primitives of future neuromorphic computing systems. In particular, different types of volatile and non-volatile memristors can serve as artificial neurons and synapses, respectively, which facilitate the transfer, storage, and processing of information.87 For example, volatile Mott memristors88 can work as electric oscillators with either regular or chaotic dynamics,7 while filament-forming memristors89 demonstrate tunable stochasticity90,91 and allow the design of neuromorphic circuits with different degrees of plasticity, chaoticity, and stochasticity to address diverse computational aims by mimicking the dynamics of different neuron populations. Furthermore, a crossbar of non-volatile memristors (serving to memorize training) attached to volatile memristors (working as readouts) enables the design of AI hardware with unsupervised learning capability.92 Thus, combining memristive circuits with different functionalities paves the way for building a wide range of in-memory computational blocks for a broad spectrum of artificial neural networks (ANNs), ranging from deep learning accelerators to spiking neural networks.93
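The basic in-memory primitive underlying such crossbar blocks is an analog matrix–vector multiplication carried out by physics: with weights stored as conductances and inputs applied as row voltages, each column current sums the products via Ohm’s and Kirchhoff’s laws. A conceptual sketch with arbitrary values:

```python
import numpy as np

# Conceptual memristor crossbar: weights stored as conductances G, inputs applied
# as row voltages V; each column current I_j = sum_i G[i, j] * V[i] is a dot product
# obtained directly from Ohm's and Kirchhoff's laws.
G = np.array([[1.0, 0.2],
              [0.5, 0.8],
              [0.1, 0.9]]) * 1e-6          # conductances in siemens (arbitrary values)
V = np.array([0.2, 0.1, 0.3])              # read voltages in volts

I_columns = V @ G                          # column currents = analog MVM result
print(I_columns)                           # [2.8e-07 3.9e-07] A
```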
A rapidly developing class of volatile memristive elements94 has been shown to demonstrate a rich spectrum of versatile dynamical patterns,7,95,96 which makes them suitable for the realization of a range of neuroscience-motivated AI concepts.97–99 For instance, ANNs based on volatile memristors can go well beyond the usual oscillator-based computing83 or spiking neural networks.84 They rely on manipulating information by utilizing complexity in dynamical regimes, offering a novel computational framework97,98 with cognitive abilities closer to those of biological brains. There is a specific emphasis on using the dynamical behaviors of memristors, instead of only their static behaviors.100
Remarkably, memristive elements can be realized not only in electronic devices but also within spintronic or photonic frameworks, which have their own advantages compared to electronics. Therefore, hybrid designs promise great benefits in the further development of neuromorphic primitives. For example, a combination of memristive chipsets with spintronic and/or photonic components can potentially create AI hardware with enhanced parallelism offered by optical devices operating simultaneously at many frequencies (e.g., optical cavity eigenfrequencies),101 energy-efficient magnetic non-volatile memories, and flexible memristive spiking network architectures. An important step in the realization of this approach is the development of interface technologies for bringing electronic, photonic, and spintronic technologies together. A possible example is the spintronic memristor,65,102,103 where the transformation of the magnetic structure influences the resistance of the system.104 An interface between neuromorphic optical and electronic subsystems of a hybrid device could be realized using optically controlled electronic memristive systems,105 thus paving a path toward neuromorphic optoelectronic systems.106
Conventional ANNs with a large number of connections require extensive training, which becomes inefficient for “moving target” problems that call for frequent retraining, for example, the recognition of characteristics that change in time. A potential solution for such tasks is to filter or pre-process the data with a “reservoir,”81 usually consisting of neuron units connected by fixed weights. The reservoir is assisted by a small readout ANN, which requires much less data for training, thus removing a significant retraining burden. Recently, an important evolution has taken place in the development of reservoir computing systems, where the function of the reservoir is realized by photon, phonon, and/or magnon mode mixing in spintronic107,108 and photonic109 devices. Substituting wave processes for the interaction of many artificial neurons resembles neural wave computation in the visual cortex98 and promotes miniaturization, robustness, and energy efficiency of the reservoirs (neuromorphic accelerators), which in the future could become an additional class of primitive, especially in neuromorphic computational systems dealing with temporal or sequential data processing.110 In AI training, it has also been shown that memristive matrix multiplication hardware can enable noisy local learning algorithms, which perform training at the edge with significant energy-efficiency gains compared to graphics processing units.111
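The division of labor just described, a fixed high-dimensional reservoir plus a small trained readout, can be sketched in a few lines of an echo-state-style model; the sizes, spectral radius, and toy prediction task below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 500
W_in = rng.uniform(-0.5, 0.5, (N, 1))                      # fixed input weights
W_res = rng.standard_normal((N, N))                        # fixed recurrent reservoir weights
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))    # scale spectral radius below 1

u = np.sin(np.linspace(0, 20 * np.pi, T))[:, None]         # input signal
y_target = np.roll(u, -5, axis=0)                          # toy task: predict 5 steps ahead

states = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):                                         # run the fixed reservoir
    x = np.tanh(W_res @ x + (W_in @ u[t]))
    states[t] = x

ridge = 1e-6                                               # only the linear readout is trained
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(N), states.T @ y_target)
print("training MSE:", float(np.mean((states @ W_out - y_target) ** 2)))
```

Only W_out is learned; W_in and W_res stay fixed, which is what removes most of the retraining burden mentioned above.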
Finally, we briefly outline another exciting perspective constituted by the combination of quantum and neuromorphic technologies.112 Currently, quantum AI113 attracts significant attention by increasingly competing with more traditional quantum computing. One of the most promising quantum AI paradigms is quantum reservoir computing,114 which offers not only a much larger state space than classical reservoir computing but also essentially nonclassical quantum feedback on the reservoir via measurements. A quantum reservoir built from quantum memristors115,116 could significantly improve quantum AI efficiency as it can readily be integrated with existing quantum and classical AI devices and could also lead to an “exponential growth”117 in the performance of “reservoirs,” with the possibility of relaxing requirements on decoherence compared to traditional quantum computing.
The above trends and directions in the development of the primitives for neuromorphic computing are obviously only a slice of exciting future AI hardware technology. Even though we recognize that our choice is subjective, we hope that the outlined systems should provide a flavor of future computational hardware, which should be based on reconfigurable life-mimicking devices utilizing different physical principles in combination with novel mathematical cognitive paradigms.118–121
IV. MATURE TECHNOLOGIES (COMPUTING APPROACHES)
A. SRAM
Nitin Chawla and Giuseppe Desoli
1. Status
SRAM-based computing in memory (CIM), or in-memory computing, is seen as a mature and widely available technology for accelerating matrix and vector calculations in deep learning applications, yet many technology-driven optimizations are still possible. To make SRAM more amenable to CIM, researchers have been exploring ways to improve the design of the bitcell (Fig. 6), which is the basic unit of memory. This has led to the development of high-end SRAM chips with large capacities, such as 107, 128, and 256 Mb SRAM chips at 10, 7, 5, and 4 nm.122–125 These large SRAM capacities help reduce the need for off-chip DRAM access. However, in more cost-sensitive applications, such as embedded systems and consumer products, modifying the bitcell design can be too costly and may limit the ability to easily transfer the technology to different manufacturing nodes.
A key difference exists between analog and digital SRAM CIM. Analog CIM has been heavily studied using capacitive or resistive sharing techniques to maximize row parallelism,126 but this comes at the cost of inaccuracies and loss of resolution due to variations in devices across process, voltage, and temperature (PVT) and the limitations of signal-to-noise ratio (SNR) and dynamic range in analog-to-digital converter (ADC)/readout circuits. The impacts of device variations for different kinds of devices are listed below.
Resistive devices, such as PCM or RRAM, exhibit variation in their resistance values around the nominal behavior, which can vary with the process. For a ±10%–20% change in resistance value, there is a corresponding change in the current values fed to the readout circuits; this affects the quantization step of the readout circuits and hence the SNR, which then requires a higher dynamic range to compensate. The temperature behavior of the resistors also needs to be accounted for in the noise margin.
MOS devices: These devices can vary in their performance (threshold voltage) due to the following:
Global lot positioning (slow, typical, or fast process corners) can vary by around ±20%, more or less depending on the technology and operating voltage. This is a deterministic shift.
Local variation: within the same lot, there are device-to-device variations, which are random in nature and require statistical analysis, based on the capacity in use, to assess their impact. These variations affect the SNR and quantization, as in the case of resistive devices, and require a higher dynamic range to compensate for the loss in accuracy.
Analog SRAM CIM solutions often use large logic bitcells and an aggressive reduction in ADC/readout bit width, resulting in low memory density and computing inaccuracies, which makes them difficult to use in situations where functional safety, low-cost testing, and system scalability are required. On the other hand, digital CIM offers a fast path for the next generation of neural processing systems due to its deterministic behavior and compatibility with technology scaling rules.
Researchers have improved SRAM-based CIM performance by modifying the SRAM bitcell structure and developing auxiliary peripheral circuits. They have proposed read–write isolation cells to prevent storage damage and transposable cells to overcome storage arrangement limitations. Peripheral circuits, such as digital-to-analog converters (DACs), redundant reference columns, and multiplexed ADCs, have been proposed to convert between analog and digital signals. The memory cell takes up most of the SRAM area in the core module of a standard SRAM cut. However, the complexity of the additional operations performed in the memory unit poses additional problems in utilizing the memory cells to their full potential. Researchers have explored various trade-offs to implement the necessary computational functionality while preserving density and power and, last but not least, minimizing the additional cost associated with bitcell modifications, which require requalification when deployed in standard design flows. Most system-on-chips (SoCs) use standard 6T structures due to their high robustness and access speed and to minimize area overhead. The 6T storage cell is made up of two P-channel metal–oxide–semiconductor (PMOS) transistors and four N-channel metal–oxide–semiconductor (NMOS) transistors to store data stably. To perform CIM using the conventional 6T SRAM cell, operands are represented by the word line (WL) voltage and the storage node data, and processing results are reflected by the voltage difference between the bit line (BL) and the bit line bar (BLB).
Figure 6 shows the conventional 6T and 8T bitcells that form the basic building block of the SRAM design. The 8T bitcell is made out of a conventional 6T bitcell and a read port that allows read and write in parallel. These bitcells were never designed for parallel access across rows, and this poses one of the main challenges for enabling analog SRAM CIM.
Dual-split 6T cells with double separation have been proposed,127,128 allowing for more sophisticated functions due to the separated WL and GND, which can use different voltages to represent various types of information. Dong et al.129 proposed a 4 + 2T SRAM cell to decouple data from the read path. The read is akin to that of the standard 6T SRAM; writes, instead, use the N-well as the write word line (WWL) and the two PMOS sources as the write bit line (WBL) and write bit line bar (WBLB). In computational mode, different voltages on the WL and storage node encode the operands.
In general, CIM adopting the 6T bitcell structure is unable to efficiently perform computing operations and may not fully meet the requirements of future CIM architectures. Hence, many studies on CIM have modified the 6T structure because using the 6T standard cell directly poses a reliability challenge: the contents of the bitcells get effectively shorted if accessed in parallel on the same bit line. This means that special handling of the word line voltage is required, which adds a lot of complexity and limits the dynamic range. Furthermore, the variability and linearity of devices become very difficult to control when limiting the device operation to reduced voltage levels due to these reliability constraints, impacting the overall energy efficiency of the solution.130,131
For practical applications, and specifically for AI ones, it is important to evaluate the end-to-end algorithmic accuracy against the key hardware metrics. To this end, recent research132–136 has suggested various analytical models to examine the balance between the costs (accuracy) and benefits (primarily, energy efficiency and performance) of digital vs analog SRAM CIM. This is based on the idea that many machine learning and deep learning algorithms can tolerate some degree of computational error and that there are methods, such as retraining and fine-tuning as well as hardware-aware training, to address these errors.
The implementation of neural processing units incorporating CIM components for large-scale deep neural networks (DNNs) presents significant difficulties: CIM macros can incur substantial column current magnitudes, which can result in power delivery difficulties and sensing malfunctions. Furthermore, the utilization of analog domain operations necessitates the incorporation of ADCs and DACs, which consume a significant amount of area and energy. Further to this, the pitch matching of ADCs with SRAM bitcells also poses a big challenge for arrays and ADC interfaces. It is clear that realizing the full potential of SRAM-CIM necessitates the development of innovative and sophisticated techniques.
2. Challenges and potential solutions
In deep learning, convolutional kernels and other types of kernels rely heavily on matrix/vector and matrix/matrix multiplication (MVM). These operations are computationally expensive and involve dot product operations between activation and kernel values. In-memory multiplication in CIM macro devices can be classified into three primary categories: current-based and charge-sharing-based for analog computation, and all-digital. All-digital CIM exhibits the same level of precision as purely digital application-specific integrated circuit (ASIC) implementations. Various implementation topologies, ranging from bit-serial to all-parallel arithmetic, have been proposed for digital CIM solutions. Digital CIM, as in a previous study,137 uses a modified logic bitcell to support element-wise multiplication followed by a digital accumulation tree sandwiched within the SRAM array. The solution improves energy efficiency by reducing data movement, alongside the efficiency benefits of a custom-built multiply-accumulate (MAC) pipeline with improved levels of parallelism over traditional digital neural processing units (NPUs), for example, as in Fig. 7.138 Digital CIM implementations also have a wide voltage and frequency dynamic range, allowing runtime reconfigurability between the competing tera operations per second per watt (TOPS/W) and TOPS/mm2 performance criteria. The operating range and mission profiles of these architectures can also be extended by leveraging read-and-write assist schemes, as is commonly done for ultra-low-voltage SRAM design. A digital CIM solution’s energy efficiency depends on the operand precision and, because the computation is bit-true and deterministic, it begins to decline as the operand precision increases.
Current-based CIM, as represented in one of the early research works,139 implements a WL DAC driving a multi-level feature input with multiple rows active in parallel. The results of the element-wise multiplications of all the parallel rows are accumulated as current on the bit lines of the CIM macro, which terminates in a current-based readout/ADC. The current accumulation on the bit line essentially implements a reduction operation limited by the SNR of the readout circuit. Current-based CIM, as presented in this work, suffers from significant degradation in accuracy due to bitcell variability and the nonlinearities of the WL DAC, while the throughput is limited by the readout circuits. Kang et al.140 implemented a variant using a 6T-derived bitcell with pulse-width-modulated (PWM) WL drive and focused on storing and computing multi-bit weights per column. The modulation scheme uses binary-weighted pulse durations based on the index of the bitcell in the column, effectively encoding the multi-bit weight of the column in its contribution to the global bit line. The multiplication is effectively done in the periphery of this CIM using a switched-capacitor circuit. As in the previous case, bitcell variations and nonlinearities significantly limit the accuracy of this implementation, restricting the industrialization potential of these current-based CIM solutions. The work in Ref. 141 tries to address the limitations of the above analog CIM techniques and implements charge-based CIM by using a modified SRAM logic bitcell that performs element-wise binary multiplication (XNOR) and transfers the result to a small capacitor. Operating multiple rows in parallel is key to the energy efficiency of these CIM topologies. In this work, the element-wise multiplication result is transferred as charge to the global bit line, followed by a voltage-based readout. The implementation inherently benefits from the fact that capacitors suffer less from process variability and present fairly linear transfer characteristics. Like other analog CIM, however, this architecture is impacted by dynamic range compression due to the limited SNR of the readout at the end of the column. Jia et al.142 extended this approach to support multi-bit implementations using a bit-sliced architecture. The multi-bit weights are mapped to different columns, while the feature data are transferred as 1-bit serial data on parallel word lines, and each column performs a binary multiplication followed by accumulation on the respective bit line. The near-memory all-digital recombination unit in this approach performs shift and scale operations based on the column index to recreate the result of the multi-bit MAC operation. The approach is flexible enough to support asymmetric feature and weight precisions and can be made reconfigurable to support different feature and weight precisions on the fly. It still suffers, however, from the same SNR constraints, as each column operation compresses the dynamic range and is limited by the peak dynamic range of the readout ADC. In most of these schemes, the ADC is shared across multiple columns, making it a critical design component in determining the throughput of such CIM architectures. The specific bit-sliced approach has impressive TOPS/W numbers in the lower weight and activation precision regime but starts to taper off due to the quadratic increase in computation energy with increasing weight and activation precision.
The work in Ref. 143 instantiates multiple such CIM macros to demonstrate a system-level approach, connecting the CIM macros with a flexible interconnect and adding digital SIMD and scalar arithmetic units to support real-world neural network execution. Owing to the limited readout speed of the CIM macros and the overhead of the other digital units, this specific work achieves only a moderate TOPS/mm2 for the full solution but presents impressive TOPS/W numbers, especially in the lower precision regime. The work in Ref. 144 represents another effort at a system-level solution: a hybrid NPU comprising analog CIM units and traditional digital accelerator blocks. The work leverages a low-precision (2-bit) analog CIM macro coupled with a traditional 8-bit digital MAC accelerator. The two orders of magnitude difference in energy efficiency between the 8-bit digital MAC engine and the 2-bit analog CIM macro can be exploited by mapping different layers to the appropriate computation engine, but this requires careful articulation of the mapping algorithms with the precision constraints of the analog CIM while keeping the overhead of the write refresh and other digital vector/scalar operators low. This is, to some extent, a trade-off between a very specialized use case and a general-purpose NN accelerator.
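For illustration, a minimal behavioral model of the bit-sliced charge-based scheme is sketched below; the array size, bit widths, and ADC resolution are assumptions chosen for clarity and are not taken from Refs. 142 or 143. Each weight bit plane occupies its own column, the activation bits are applied serially, every column accumulation passes through a dynamic-range-limited ADC, and a digital recombination applies the column- and cycle-dependent shifts.

    import numpy as np

    def adc(value, full_scale, bits):
        """Quantize an analog column accumulation to an ADC code and back,
        modeling the SNR/dynamic-range limit of the shared readout."""
        levels = 2 ** bits - 1
        code = np.clip(np.round(value / full_scale * levels), 0, levels)
        return code / levels * full_scale

    def bit_sliced_mac(weights, acts, w_bits=4, a_bits=4, adc_bits=6):
        """Behavioral model of a bit-sliced analog CIM dot product with unsigned
        operands: weight bit planes sit in separate columns, activation bits are
        streamed serially, and the recombination unit shifts and sums the codes."""
        rows = len(weights)
        result = 0.0
        for a in range(a_bits):                          # serial 1-bit activations on the WLs
            x_plane = (acts >> a) & 1
            for w in range(w_bits):                      # weight bit slices in separate columns
                w_plane = (weights >> w) & 1
                col = float(np.dot(x_plane, w_plane))    # charge accumulated on the bit line
                col = adc(col, full_scale=rows, bits=adc_bits)
                result += col * 2 ** (a + w)             # digital shift-and-scale recombination
        return result

    rng = np.random.default_rng(1)
    w = rng.integers(0, 16, 256)
    x = rng.integers(0, 16, 256)
    print(bit_sliced_mac(w, x), int(np.dot(w, x)))       # quantized vs exact dot product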
3. Conclusions
Analog CIM solutions based on charge-based CIM display a lower degree of variability than current-based CIM, because the capacitors employed in these technologies vary less than the threshold-voltage-dependent cell currents. In addition, charge-based CIM solutions are able to activate a greater number of word lines per cycle and thus achieve higher row parallelism. However, both current-based and charge-based CIM are limited in terms of accuracy and the equivalent bit precision of the accumulation dynamic range. Selecting an appropriate ADC bit precision and MVM parallelism is a challenging task that requires balancing accuracy and power consumption. Measurements and empirical evidence suggest that larger accumulation values are correlated with a higher degree of variability. However, such high values are relatively rare in practical neural network models, as shown by the statistical distribution of activation data and the resultant accumulation outcomes. This characteristic, along with noise-aware training, can be leveraged to relax the precision and throughput demands on the analog-to-digital converter, thereby improving the figure of merit (FOM) of these analog CIM techniques. The research on noise-aware training in the state of the art is limited to academic works on relatively small neural networks and datasets. For industrial deployment, it still needs to mature and demonstrate scalability to larger models and datasets.
All-digital CIM provides a deterministic and scalable path to intercept the implementation of NPUs by bringing an order of magnitude or more of efficiency gain over traditional all-digital NPUs. Digital CIM solutions provide excellent area and energy efficiency scaling as we move toward more advanced CMOS nodes, with wide operating voltage and frequency tunability, while still maintaining a general-purpose, application-agnostic view of embedded neural network acceleration at the edge.
On the other hand, for applications that can tolerate approximate computing, analog SRAM CIM-based solutions provide a much higher level of computational parallelism and energy efficiency while operating in an SNR-limited regime. The impacts of dynamic range compression and readout throughput are key algorithmic and design trade-offs when designing an analog CIM solution, which operates in a much more restricted voltage and frequency regime than a digital CIM solution. The fact that the target applications are more vertically defined, rather than general purpose, is also a deciding factor when choosing an analog SRAM CIM-based solution over a digital one. In conclusion, given the rapid industrialization potential of SRAM-based CIM solutions and the opportunity to exploit the duality of these CIM instances to serve as SRAM capacity supporting the system in other operating modes, there are ample reasons to remain invested in SRAM CIM. The scope to improve both digital and analog SRAM CIM remains very high, at both the design and the technology level, to extract the best gains from these two solutions, which in the future could also be combined into a hybrid solution serving multiple modalities of neural network execution at the edge.
B. Flash memories
Gerardo Malavena and Christian Monzio Compagnoni
1. Status
Thanks to a relentless expansion in all the application fields of electronics since their conception in the 1980s, flash memories became ubiquitous non-volatile storage media in everyday life and a source of market revenues exceeding $60B in 2021. The origin of this success can be traced back to their capability to resolve the trade-off among cost, performance, and reliability in data storage much better than any other technology. Multiple solutions to that trade-off, in addition, were devised through different design strategies that, in the end, allowed flash memories to target a great variety of applications in the best possible way. Among these design strategies, the two leading to the so-called NOR flash memories145 and NAND flash memories146 became by far the most important.
As in all flash memory designs, NOR and NAND flash memories store information in memory transistors arranged in an array whose operation relies on an initialization, or erase, step performed in a flash on a large number of devices simultaneously. In particular, the erase step moves the threshold voltage (VT) of all the memory transistors in a block/sector of the array to a low value. From that initial condition, data are stored through program steps performed in parallel on a much smaller subset of memory transistors, raising their VT to one or more predefined levels. This working scheme of the array minimizes the number of service elements needed for information storage and, in the end, underlies the high integration density, high performance, and high reliability of flash memories. Starting from it, the structure of the memory transistors, the architectural connections among them to form the memory array, the array segmentation in the memory chip, the physical processes exploited for the erase and program steps, and many other aspects are markedly different in NOR and NAND flash memories.
NOR flash memories follow a design strategy targeting the minimization of the random access time to the stored data, reaching latencies as short as a few tens of nanoseconds. A strong array segmentation is then adopted to reduce the delay time of the word-lines (WLs) and bit-lines (BLs) driving the memory transistors. As depicted in Fig. 8(a), moreover, the memory transistors are independently connected to the WLs, BLs, and source lines (SLs) of the array to simplify and speed up the sequence of steps needed to randomly access the stored data and to allow device operation at relatively high currents (currents in the microampere range are typical to sense the data stored in the memory transistors). Fast random access is also achieved through a very robust raw array reliability, with no or limited adoption of error correction codes (ECCs). This design strategy, on the other hand, does not make NOR flash memories the most convenient solution from the standpoint of the area and, hence, the cost of the memory chip and limits the chip storage capacity to low or medium sizes (up to a few Gbits).
NAND flash memories rely on a design strategy pointing to the minimization of the data storage cost. Therefore, limited array segmentation is adopted and the memory transistors are connected in series along strings to reduce the area occupancy of the memory chip. Figures 8(b-1) and 8(b-2) schematically show the arrangement of the memory transistors in a planar and in a vertical (or 3D) NAND flash array, respectively. At present, 3D arrays represent the mainstream solution for NAND flash memories, capable of pushing their bit storage density up to 15 Gbits/mm2,147 a level unreachable by any other storage technology. Such an achievement was made possible also by the use of multi-bit storage per memory transistor and resulted in memory chips with capacities as high as 1 Tbit.147 The NAND flash memory design strategy, on the other hand, makes the random access time to the stored data relatively long (typically, a few tens of microseconds). That is the outcome of time delays of the long WLs and BLs in the microsecond timescale, low sensing currents (tens of nanoamperes) during data retrieval due to series resistance limitations in the strings, and the need for multi-bit detection per memory transistor. In addition, array reliability relies on powerful ECCs.
Given the successful achievements of flash memories as non-volatile storage media for digital data, exploiting them in the emerging neuromorphic-computing landscape appears as a natural expansion of their application fields and is attracting widespread interest. In this landscape, flash memories may work not only as storage elements for the parameters of artificial neural networks (ANNs) but also as active computing elements to overcome the von Neumann bottleneck of conventional computing platforms. The latter may represent, of course, the most innovative and disruptive application of flash memories in the years to come. At the same time, the use of flash memories as active computing elements may boost the performance, enhance the power efficiency, and reduce the cost of ANNs, making their bright future even brighter. In this context, relevant research has focused on employing flash memory arrays as artificial synaptic arrays in hardware ANNs and as hardware accelerators for the vector-by-matrix multiplication (VMM), the most common operation in ANNs. Quite promising results have already been reported in the field, through either NOR148,149 or NAND150–155 flash memories. In these proofs of concept, different encoding schemes for the inputs (e.g., voltage amplitude or pulse width modulation, with signals on the BLs or WLs of the memory array) and different working regimes of the memory transistors have been successfully explored. Interested readers may go through the references provided in this section for a detailed description of the most relevant schemes proposed so far to operate a flash array as a computing element.
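As a toy illustration of one such input-encoding scheme, the sketch below models a VMM in which each input is pulse-width encoded on a WL and the charge integrated on each BL is proportional to the dot product of the input durations and the per-cell read currents; the names, time constants, and current values are illustrative assumptions and do not correspond to any specific reference cited above.

    import numpy as np

    def flash_vmm_pulse_width(inputs, cell_currents, t_max=1e-6, dt=1e-8):
        """Toy model of a flash-array VMM with pulse-width-encoded inputs: each
        input in [0, 1] sets how long its WL is asserted, every cell on that WL
        sources its read current for that duration, and the charge collected on
        each BL approximates the dot product of inputs and cell currents."""
        on_steps = np.round(np.clip(inputs, 0, 1) * t_max / dt).astype(int)
        charge = np.zeros(cell_currents.shape[1])
        for row, n_on in enumerate(on_steps):
            charge += cell_currents[row] * n_on * dt   # BL integrates the cell current
        return charge

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=8)                      # normalized inputs
    I_cell = rng.uniform(1e-9, 5e-8, size=(8, 4))      # programmed read currents (A)
    print(flash_vmm_pulse_width(x, I_cell))            # proportional to (x * t_max) @ I_cell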
2. Challenges and potential solutions
Despite the encouraging proofs of concept already reported, the path leading to flash memory-based ANNs still appears long and full of challenges. The latter can be classified into the following categories:
a. Challenges arising from changes in the design strategy of the array.
As previously mentioned, the success of flash memories as non-volatile storage media for digital data arises from precise design strategies. Modifying those strategies to meet the requirements of ANNs may deeply impact the figures of merit of the technology and should be done carefully. For instance, ANN topologies requiring a decrease in the segmentation of NOR flash arrays may worsen their working speed. Increasing the segmentation of NAND flash arrays to meet possible ANN topology constraints or to enhance their working speed may significantly worsen their cost per memory transistor.
The cost per memory transistor of flash memories, in addition, is strictly related to the array capacity. Modifying the latter or not exploiting it fully through the ANN topology may reduce the cost-effectiveness of the technology. In this regard, the very different capacities of NOR and NAND flash arrays make the former suitable for small/medium-size ANNs (fewer than about one billion parameters) and the latter suitable for large-size ANNs (more than about one billion parameters). The organization of the memory transistors into strings in NAND flash arrays represents an additional degree of complexity for the exploitation of their full capacity in ANNs. In fact, the number of memory transistors per string is the outcome of technology limitations and cost minimization and, therefore, cannot be freely modified. Exploiting all the memory transistors per string, then, necessarily sets some constraints on the ANN topology (the number of hidden layers, the number of neurons, etc.), which, of course, should be compatible with the required ANN performance.
Another important aspect to consider is that the accurate calibration of the VT of the memory transistors needed by high-performance and reliable ANNs may not be compatible with the block/sector erase scheme representing a cornerstone of all the design strategies of flash memories. Solutions to carry out the erase step on single memory transistors are then to be devised. These solutions may require a change of the array design as in Refs. 148 and 149 or new physical processes and biasing schemes of the array lines to accomplish the erase step as in Refs. 156–158. All of these approaches, however, necessarily impact relevant aspects of the technology, affecting its cost, performance, or reliability, and should be carefully evaluated.
The change in the typical working current of the memory transistors when exploiting flash memories for ANN applications is another critical point to address. In fact, reducing the working current of the memory transistors may make them more susceptible to noise and time instabilities. Increasing it too much, on the other hand, may raise issues related to the parasitic resistances of the BLs and SLs and, in the case of NAND flash arrays, of the unselected cells in the strings.
b. Challenges arising from array reliability.
Flash memories are highly reliable non-volatile storage media for digital data. That, however, does not assure that they can satisfactorily meet the reliability requirements needed to operate as computing elements for ANN applications. Especially in the case of NAND flash memories, in fact, array reliability in digital applications is achieved through massive use of ECCs and a variety of smart system-level stratagems to keep under control issues such as electrostatic interference between neighboring memory transistors, lateral migration of the stored charge along the charge-trap storage layer of the strings, and degradation of the memory transistors after program/erase cycles. All of that can hardly be exploited to assure the reliable operation of flash arrays as computing elements. In addition, the requirements on the accuracy of the placement and the stability over time of the VT of the memory transistors when using flash arrays as computing elements may be more severe than in the case of digital data storage. The possibility of satisfying those requirements in the presence of the well-known constraints to the reliability of all flash memory designs146,159,160 is yet to be fully demonstrated. In this context, periodic recalibration of the VT of the memory transistors and on-chip learning155 may mitigate the array reliability issues.
c. Challenges arising from the peripheral circuitry of the array.
As in the case of flash memory chips for non-volatile storage of digital data, the peripheral circuitry of flash memory arrays used as computing elements for ANNs should not introduce severe burdens on the chip area, cost, power efficiency, and reliability. In the latter case, this aspect is particularly critical due to the need to integrate on the chip not only the circuitry to address the memory transistors in the array and to carry out operations on them, but also, for instance, the circuitry to switch between the digital and the analog domain in VMM accelerators or to implement artificial neurons in hardware ANNs. Along with effective design solutions at the circuit level,150 process solutions, such as CMOS-under-array integration147 or heterogeneous integration schemes,152 should be exploited for successful technology development.
3. Conclusion
Flash memories may play a key role in the neuromorphic-computing landscape. Expanding their fields of application, they can be the elective storage media for ANN parameters. However, they can also be active computing media for high-performance, power-efficient, and cost-effective ANNs. To achieve this intriguing goal, relevant challenges must be faced from the standpoint of the array design, reliability, and peripheral circuitry. Overcoming those challenges will be a matter of engineering and scientific breakthroughs and will pave the way for years of unprecedented prosperity for both flash memories and ANNs.
V. EMERGING TECHNOLOGIES (COMPUTING APPROACHES)
Zhongrui Wang and J. Joshua Yang
A. Resistive switching and memristor
1. Status
Resistive switches (often called memristors when device nonlinear dynamics are emphasized) are electrically tunable resistors, of a simple metal–insulator–metal structure. Typically, their resistance changes as a result of redox reactions and ion migrations, driven by electric fields, chemical potentials, and temperature.64 There are two types of resistive switches according to the mobile ion species. In many dielectrics, especially transition metal oxides and perovskites, anions such as oxygen ions (or equivalently oxygen vacancies) are relatively mobile and can form a conduction percolation path, leading to the so-called valence change switching. For example, a conical pillar-shaped nanocrystalline filament of the Ti4O7 Magnéli phase was visualized using a transmission electron microscope (TEM) in a Pt/TiO2/Pt resistive switch.161 On the other hand, the conduction channels can also be created by the redox reaction and migration of cations, which involves the oxidation of an electrochemically active metal, such as Ag and Cu, followed by the drift of mobile cations in the solid electrolyte and the nucleation of cations to establish a conducting channel upon reduction. The dynamic switching process of a planar Au/SiOx:Ag/Au diffusive resistive memory cell was captured by in situ TEM.89
Resistive switches provide a hardware solution to address both the von Neumann bottleneck and the slowdown of Moore’s law faced by conventional digital computers. When these resistive switches are grouped into a crossbar array, they can naturally perform vector–matrix multiplication, one of the most expensive and frequent operations in machine learning. The matrix is stored as the conductance of the resistive memory array, where Ohm’s law and Kirchhoff’s current law physically govern the multiplication and summation, respectively.64 As a result, the data are both stored and processed in the same location. This in-memory computing concept can largely obviate the energy and time overheads incurred by expensive off-chip memory access on conventional digital hardware. In addition, the resistive memory cells are of simple capacitor-like structures, equipping them with excellent scalability and 3D stackability. So far, resistive in-memory computing has been used for hardware implementation of deep learning models to handle both unstructured (e.g., general graphs, images, audios, and texts) and structured data, as discussed in the following.
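The principle can be captured in a few lines. In the sketch below (a minimal model with assumed conductance bounds, not a description of any specific chip), a signed weight matrix is mapped onto a differential pair of conductance arrays, per-cell currents follow Ohm's law, and Kirchhoff's current law sums them along each column.

    import numpy as np

    def weights_to_conductance(W, g_min=1e-6, g_max=1e-4):
        """Map a signed weight matrix onto a differential pair of conductance
        arrays (G_plus, G_minus), a common scheme for representing negative
        weights with non-negative conductances (the bounds are assumptions)."""
        scale = (g_max - g_min) / np.abs(W).max()
        G_plus = g_min + np.maximum(W, 0) * scale
        G_minus = g_min + np.maximum(-W, 0) * scale
        return G_plus, G_minus, scale

    def crossbar_mvm(v_in, G_plus, G_minus, scale):
        """Ohm's law gives per-cell currents I = G * V; Kirchhoff's current law
        sums them along each column. The differential column currents recover
        the signed result."""
        i_plus = v_in @ G_plus       # column currents of the positive array
        i_minus = v_in @ G_minus     # column currents of the negative array
        return (i_plus - i_minus) / scale

    W = np.array([[0.5, -1.0], [2.0, 0.25]])
    v = np.array([0.1, 0.2])
    Gp, Gm, s = weights_to_conductance(W)
    print(crossbar_mvm(v, Gp, Gm, s), v @ W)   # analog estimate vs exact product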
General graph: Graph-type data consist of a set of nodes together with a set of edges. The theoretical formulation has been made for graph learning using resistive memory on datasets such as WikiVote.162,163 Experimentally, a resistive memory-based echo state graph neural network has been used to classify graphs in MUTAG and COLLAB datasets as well as nodes in the CORA dataset,164 including few-shot learning of the latter.165
Images: Images are special graph-type data. Both supervised and unsupervised learning of ordinary images have been experimentally implemented on resistive memory. For supervised learning, offline trained resistive memory, where optimal conductance of memory cells is calculated by digital computers and transferred to resistive memory, is used to classify simple patterns,166,167 MNIST handwritten digits,168–171 CIFAR-10/100 datasets,172–174 ImageNet,175 and Omniglot one-shot learning dataset.176 In addition to offline training, online training adjusts the conductance of resistive memory in the course of learning, which is more resilient to hardware nonidealities in classifying simple patterns,37,166 Yale Face and MNIST datasets,177,178 CIFAR-10 dataset,179 and meta-learning of Omniglot dataset.180 Besides supervised learning, unsupervised offline learning with resistive memory is used for sparse coding of images181 and MNIST image restoration.182
Audios and texts: The learning of sequence data, such as audio and text, has been implemented on resistive memory. Supervised online learning using recurrent nodes has been done on the Johann Sebastian Bach chorales dataset.183 In addition, delayed-feedback systems based on the dynamic switching of resistive memory are used for temporal sequence learning, such as spoken number recognition and chaotic series prediction.66,184,185 For offline learning, resistive memory is used for modeling the Penn Treebank dataset;186 the Wortschatz Corpora language dataset and the Reuters-21578 news dataset;187 and the Bonn epilepsy electroencephalogram dataset and the NIST TI-46 spoken digit dataset.188,189
Structured data: Besides unstructured data, structured data, such as those in a tabular format, have been tackled by resistive memory, including supervised classification of the Boston housing dataset on an extreme learning machine;190 K-means clustering of the IRIS dataset and principal component analysis of the breast cancer Wisconsin (diagnostic) dataset;191,192 and correlation detection on a quality-controlled local climatological database.193
2. Challenges
Major challenges can be categorized at different levels.
Device level: The ionic nature of resistive switching, although it benefits data retention, imposes challenges on programming precision, energy, and speed. The programming precision limits the representation capability of the resistive switch, or equivalently how many bits a device can encode, while the programming energy and speed impact online learning performance. Moreover, the degradation of the representation capability is further intensified by the read noise, manifested by current fluctuations under a constant voltage bias.
Circuit level: Analog resistive memory arrays are mostly interfaced with up- and downstream digital modules in a computing pipeline. As such, there is an inevitable signal acquisition and conversion cost, which raises the question of how to trade off signal acquisition rate, precision, and power consumption. In addition, parasitic resistance and capacitance, such as the non-zero wire resistance, incur the so-called IR drop in the resistive memory crossbar array.
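A first-order behavioral model of this IR drop is sketched below, under idealized assumptions (perfect word lines, a single readout at the bottom of each bit line, a uniform segment resistance r_wire, and illustrative conductance and voltage values). Because each cell current depends on the local bit-line potential, which in turn depends on all downstream currents, the node voltages are obtained by a short fixed-point iteration.

    import numpy as np

    def mvm_with_ir_drop(v_in, G, r_wire=0.1, n_iter=25):
        """Estimate column currents of a crossbar MVM including bit-line IR drop.
        v_in: word-line voltages (rows,); G: cell conductances (rows, cols);
        r_wire: resistance of one bit-line segment between adjacent rows."""
        v_bl = np.zeros_like(G)                                   # bit-line node potentials
        for _ in range(n_iter):
            i_cell = G * (v_in[:, None] - v_bl)                   # Ohm's law per cell
            i_seg = np.cumsum(i_cell, axis=0)                     # current in each BL segment
            v_bl = r_wire * np.cumsum(i_seg[::-1], axis=0)[::-1]  # drop from node to readout
        return i_seg[-1]                                          # current reaching the readout

    rng = np.random.default_rng(0)
    G = rng.uniform(1e-6, 1e-4, size=(256, 64))                   # conductances (S)
    v = rng.uniform(0.0, 0.2, size=256)                           # read voltages (V)
    ideal = v @ G
    print(np.max(np.abs(mvm_with_ir_drop(v, G) - ideal) / ideal)) # worst-case relative error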
Algorithm level: So far, many applications of resistive memory suffer from significant performance loss in the presence of resistive memory nonidealities (e.g., noise), thus defeating their efficiency advantage over alternative digital hardware.
3. Potential solutions
Device level: Various approaches are used to address the programming stochasticity, such as the local confinement of the conducting filament.194 A denoising protocol using sub-threshold voltages has recently been developed to suppress the fluctuation of the device state and achieve up to 2048 conductance levels.195 In addition, homogeneous switching may suppress stochasticity at the cost of larger programming energy and time overheads.64 In terms of programming energy, small redox barriers and large ion mobilities may reduce the switching energy and accelerate the switching speed, though at the expense of retention and thermal stability.
Circuit level: Typically, resistive in-memory computing relies on Ohm's law and Kirchhoff's current law, resulting in current summation. However, there is a recent surge of interest in replacing current summation with voltage summation, which lowers the static power consumption by eliminating the Joule heating incurred by current summation. In addition, fully analog neural networks have been proposed to get rid of the frequent analog-to-digital and digital-to-analog conversions.196 To combat the parasitic wire resistance, a simple solution is to increase the device resistance in both the ON and OFF states, as demonstrated in a 256 × 256 in-memory computing macro.195
Algorithm level: A recent trend is hardware–software co-design to leverage resistive memory nonlinearities and turn them into advantages. For example, the programming stochasticity can be exploited by neural networks of random features (e.g., echo state networks164,165 and extreme learning machines190) and by Bayesian inference using Markov Chain Monte Carlo (MCMC), such as the Metropolis–Hastings algorithm.197 In addition, such programming noise is a natural regularization to suppress overfitting in online learning.198 Moreover, hyperdimensional computing187 and mixed-precision designs, such as a high-precision iterative refinement algorithm paired with a low-precision conjugate gradient solver,199 can withstand resistive memory programming noise. The read noise can also be exploited for solving combinatorial optimization problems using simulated annealing, serving as a natural noise source that prevents the system from getting stuck in local minima.200,201
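A compact sketch of the last idea is given below; it is a plain software emulation in which Gaussian noise stands in for the device read noise, and the coupling matrix, schedule, and noise amplitude are arbitrary illustrative choices. The noisy in-memory MVM supplies the local fields, and ramping the effective noise down implements the annealing schedule.

    import numpy as np

    def annealed_ising(J, steps=5000, noise0=1.0, seed=0):
        """Simulated annealing on an Ising/max-cut instance where the randomness
        comes from a noisy MVM readout (emulated here with Gaussian noise whose
        amplitude is ramped down). J is a symmetric coupling matrix; s holds +/-1 spins."""
        rng = np.random.default_rng(seed)
        n = J.shape[0]
        s = rng.choice([-1.0, 1.0], size=n)
        for t in range(steps):
            noise = noise0 * (1.0 - t / steps)            # annealing schedule
            field = J @ s + rng.normal(0.0, noise, n)     # noisy in-memory MVM readout
            i = rng.integers(n)                           # asynchronous single-spin update
            s[i] = 1.0 if field[i] >= 0.0 else -1.0
        return s, -0.5 * s @ J @ s                        # spin configuration and Ising energy

    rng = np.random.default_rng(1)
    J = rng.normal(size=(32, 32))
    J = (J + J.T) / 2
    np.fill_diagonal(J, 0.0)
    spins, energy = annealed_ising(J)
    print(energy)                                         # lower is better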
4. Conclusion
The advent of resistive switch-based in-memory computing in the past decade has demonstrated a wide spectrum of applications in machine learning and neuromorphic computing, reflected by its handling of different types of data.
However, there is still plenty of room, at device, circuit, and algorithm levels, to improve, which will help fully unleash the power of in-memory computing with resistive switches and potentially yield a transformative impact on future computing.
B. Phase change materials
Abu Sebastian and Ghazi Sarwat Syed
1. Introduction
Phase-change memory (PCM) is arguably the most advanced memristive technology. Similar to conventional metal-oxide based memristive devices, information is stored in terms of changes in atomic configurations in a nanometric volume of material and the resulting change in resistance of the device.202 However, unlike the vast majority of memristive devices, PCM exhibits volumetric switching as opposed to filamentary switching. The volumetric switching is facilitated by certain material compositions along the GeTe–Sb2Te3 pseudo-binary tie line, such as Ge2Sb2Te5, that can be switched reversibly between amorphous and crystalline phases of different electrical resistivities.203 Both transitions are Joule-heating assisted. The crystalline to amorphous phase transition relies on a melt-quench process, whereas the reverse transition relies mostly on crystal growth (Fig. 9).
There are essentially two key properties that make PCM devices particularly well suited for neuromorphic computing204 (see Fig. 10). Interestingly, this was pointed out by Stanford Ovshinsky, a pioneer of PCM technology, as far back as 2003, when PCM was being considered only for memory applications.205 The first property is that PCM devices can store a range of conductance values by modulating the size of the amorphous region, typically achieved by partial RESET pulses that melt and quench a certain volume of the PCM material. This analog storage capability, combined with a crossbar topology, allows matrix–vector multiplication (MVM) operations to be carried out in O(1) time complexity by leveraging Kirchhoff's circuit laws. This makes it possible to realize an artificial neural network on crossbar arrays of PCM devices, with each synaptic layer of the DNN mapped to one or more of the crossbar arrays.67,206 The second property, referred to as the accumulative property, results from the progressive crystallization of the PCM material upon application of an increasing number of partial SET pulses. It is used for implementing DNN training;207 temporal correlation detection;193 continual learning;208 local learning rules, such as spike-timing-dependent plasticity;209,210 and neuronal dynamics.211
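The accumulative property can be caricatured in a few lines of code; the saturating update fraction, conductance window, and pulse-to-pulse noise below are illustrative assumptions, not measured device parameters.

    import numpy as np

    def accumulative_set(n_pulses, g_min=0.1, g_max=25.0, alpha=0.15, sigma=0.05, seed=0):
        """Toy model of progressive crystallization: every partial SET pulse moves
        the conductance (arbitrary units) a fraction alpha of the remaining
        window toward g_max, with multiplicative pulse-to-pulse stochasticity."""
        rng = np.random.default_rng(seed)
        g, trace = g_min, [g_min]
        for _ in range(n_pulses):
            dg = alpha * (g_max - g)                       # saturating, nonlinear update
            g = min(g_max, g + dg * (1.0 + rng.normal(0.0, sigma)))
            trace.append(g)
        return np.array(trace)

    print(accumulative_set(20)[-1])   # conductance after 20 partial SET pulses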
2. Challenges
PCM devices offer write operations on the tens-of-nanoseconds timescale, which is sufficient for most neuromorphic applications, in particular those targeting deep learning inference. The cycling endurance can also exceed a billion cycles (depending on the device geometry), which is several orders of magnitude higher than commercial flash memory.214 This is sufficient for deep learning inference applications. The cycling endurance for partial SET pulses is much higher than that for full SET–RESET cycling and hence is widely considered sufficient for other neuromorphic applications as well. The read endurance is almost infinite for PCM when a sufficiently low read bias is applied. Another key attribute is retention, which is typically tuned through material choice.215 However, the use of analog conductance states in neuromorphic computing makes the retention time of intermediate phase configurations even more important, and it can be substantially lower than that of fully RESET states.
One of the primary challenges for PCM is integration density. For example, for DNN inference, it is desirable to have an on-chip weight capacity of at least 10–100 million. The crossbar array for neuromorphic computing comprises metal lines intersected by synaptic elements, which are composed of one or more PCM devices and selector devices. Access devices, such as bipolar junction transistors or metal–oxide–semiconductor field effect transistors, are preferred for accurate programming, while two-terminal polysilicon diodes offer scalability. To achieve high memory density, stacking multiple crossbar layers vertically is beneficial. Back-end-of-line (BEOL) selectors, such as ovonic threshold switches, show promise but face challenges in achieving precise current control. Edge effects and thermal crosstalk between neighboring cells become significant at smaller feature sizes.247–249
Compute precision is a crucial aspect especially for DNN inference applications. The key challenges are 1/f read noise and conductance drift216 (see Fig. 11). Drift is attributed to the structural relaxation of the melt-quenched amorphous phase and exhibits a log time dependence. Conductance variations arising from temperature variations could also impact the compute precision.217 Another potential source of imprecision is voltage polarity dependence.218 The intrinsic stochasticity associated with the accumulative behavior could also be a challenge for applications such as online learning.219
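For reference, the empirical drift law and one common compensation strategy can be written in a few lines; the drift exponent and time constants below are typical illustrative values rather than material-specific data.

    def drifted_conductance(g_t0, t, t0=1e-3, nu=0.05):
        """Empirical conductance drift law G(t) = G(t0) * (t / t0)**(-nu), with nu
        the drift exponent of the melt-quenched amorphous phase (illustrative value)."""
        return g_t0 * (t / t0) ** (-nu)

    def compensate_drift(column_currents, t, t0=1e-3, nu=0.05):
        """A common mitigation: rescale the MVM output by the average drift factor,
        which restores the overall conductance scale even though cell-to-cell
        deviations around the mean remain."""
        return column_currents * (t / t0) ** nu

    print(drifted_conductance(10.0, t=3600.0))   # conductance (a.u.) one hour after programming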
For applications that exploit the accumulative behavior, there is a significant incentive to minimize the programming current. In fact, reducing the programming currents could also help achieve better integration density by relaxing the requirements on the access devices. The primary way to achieve a lower programming current is to scale down the volume of switching material. The programming energy of PCM devices has decreased by a factor of 1000 since the first memory chip was reported. Some device structures now exhibit programming energies in the tens of femtojoules (i.e., on par with the most efficient charge-based memories) via extreme volume scaling.220–223 However, the analog capability is typically compromised, and extreme scaling also leads to fabrication challenges.
3. Potential solutions
Two main approaches have been taken to improve PCM devices: material engineering and device engineering. Material engineering involves the exploration of new phase-change material compositions as well as the alloying of phase-change materials with elements such as germanium, silicon, and carbon.215 Yet another approach is the use of superlattice heterostructures.224,225 They utilize alternating layers of two different phase-change materials that are only a few atoms thick, which creates an electro-thermal confinement effect that enhances write efficiency.226,227 This approach also improves the write endurance and reduces the resistance drift and noise.228 However, additional research is necessary to fully comprehend the mechanisms and to examine the impact of device geometries and the randomness that accompanies crystal growth and amorphization.229–232
Device engineering involves creating devices such as projected phase-change memory, which have a noninsulating projection segment that is placed in parallel to the phase-change material segment.233,234 Another fascinating approach is that of relying on nanoscale confinement of simple materials such as antimony to design better PCM devices.235 Besides improving the PCM devices themselves, one could also conceive innovative synaptic units with more than one PCM device to enhance the conductance window and to improve the compute precision.236 There is also potential to enhance the compute precision by programming approaches such as gradient-descent programming that relies on minimizing the MVM error as opposed to minimizing the programming error per device.237
Phase-change materials have functional properties in the optical domain that enable neuromorphic computing on photonic integrated circuits using photonic phase-change memory devices.238 By integrating these materials onto silicon waveguides,239,240 analog multiplication of incoming optical signals becomes feasible. Additionally, spike aggregation and convolution operations can be conducted in a single time step using wavelength division multiplexing.241,242 The accumulative behavior of phase-change materials also allows for more intricate operations such as correlation detection with high efficiency.243 This opens opportunities for the development of novel phase-change materials engineered specifically for photonic applications.
There are also reports of PCM device non-idealities being exploited for computational purposes. For example, the stochasticity associated with the accumulative behavior can create biorealistic randomly spiking neurons,211 and structural relaxation can be used to implement eligibility traces for reinforcement learning.244 The conductance fluctuations in PCM have also been exploited in an in-memory factorizer to disentangle visual attributes.245 Finally, the ability to induce field effect modulation in PCM devices combined with the analog storage capability can be exploited to realize mixed synaptic plasticities for solving optimization and sequential learning problems.246
4. Conclusion
With well-understood device physics models, established manufacturability, and proven integration capability with state-of-the-art CMOS logic platforms using BEOL processing, PCM is arguably the most advanced memristive technology. More recently, PCM has been extensively researched for neuromorphic computing by exploiting its analog storage capability and accumulative behavior. However, commercialization of such technology requires improvements in achievable compute precision and integration density, all within the purview of BEOL-compatible materials and processing. Moreover, as with the commercialization of any emerging technology, a key deciding factor will be the manufacturing cost. The expectation is that the manufacturing cost barrier when PCM is used for computing applications is not as limiting as for storage-class memory applications. Most likely, neural processing units for DNN inference based on embedded PCM for analog in-memory computing will be commercialized first. Depending on the level of commercial acceptance of this technology, full-fledged PCM-based accelerators could be developed to serve high-end edge applications or even cloud-based applications.
C. Ferroelectric materials
Thomas Mikolajick, Stefan Slesazeck, and Beatriz Noheda
1. Status
Ferroelectric materials are, in theory, ideally suited for information storage tasks since their switching is purely field-driven, holding the promise of extremely low write energy, while being non-volatile at the same time. Moreover, unlike competing concepts, such as resistive switching or magnetic switching, ferroelectric materials offer three different readout possibilities, giving a lot of flexibility in device design.250 In detail, the following read schemes can be applied (see also the middle part of Fig. 12):
Direct sensing of the switched charge during polarization reversal, as used in the ferroelectric RAM (FeRAM) concept, results in a cell design similar to a dynamic random-access memory (DRAM).251
Coupling of the ferroelectric to the gate of a field effect transistor and readout of the resulting drain current, as used in the ferroelectric field effect transistor (FeFET). This results in a cell that is similar to a classical transistor-based charge storage (floating gate or charge trapping) memory cell, most prominently used in flash memories.252
Modulation of the tunneling barrier in a ferroelectric tunneling junction (FTJ). As a result, we can realize a two-terminal device, which is essentially a special version of a resistive switching memory cell (see Sec. V A).253
Each of the mentioned readout schemes has advantages and disadvantages, and therefore, the flexibility to use one of the three is a plus, especially in applications that go beyond pure memories, such as neuromorphic computing.
However, traditionally, ferroelectricity was only observed in chemically complex materials, such as lead zirconate titanate (PZT), strontium bismuth tantalate (SBT), or bismuth ferrite (BFO), which are all very difficult to incorporate into the processing flow for integrated electronic circuits due to their limited stability in reducing environments. Another pervasive issue for the integration of ferroelectrics is their tendency to depolarize upon downscaling, an issue that is accentuated by their high permittivity. Organic ferroelectrics, the most prominent example being polyvinylidene difluoride (PVDF), can mitigate this problem, as their low permittivities reduce the depolarization fields, while a rather high coercive field increases the stability of the polarization state. Such materials are ideally suited for lab-scale demonstrations of new device concepts, due to their simple fabrication using a solution-based process, and are highly preferred for flexible and biocompatible electronics.254 However, their limited thermal stability has taken them out of the game for devices in integrated circuits. Therefore, although the technology in the form of FeRAM255 has been on the market for more than 25 years, it has lacked the ability to scale in a similar manner as conventional memory elements and, therefore, is still limited to niche applications that require a high rewrite frequency together with non-volatility, as in data logging applications.
2. Challenges
With the discovery of ferroelectricity in hafnia (HfO2) and zirconia (ZrO2), the biggest obstacle of the limited compatibility with integrated circuit fabrication could be solved.256 HfO2 and ZrO2 are stable both in reducing ambient and in contact with silicon, and their fabrication using established atomic layer deposition processes is standard in modern semiconductor process lines. However, new difficulties, especially with respect to reliability,257 need to be solved. Challenges in this direction are aggravated by the metastable nature of the ferroelectric phase, which appears mostly at the nanoscale, making a full understanding of the polar phase quite demanding. While their high coercive field makes them very stable with respect to classical retention, the ferroelectric phase typically exists together with other non-polar phases, which prevents them from reaching the predicted polarization values (of the order of 50 μC/cm2).258 Moreover, the most serious problem of any non-volatile ferroelectric device, the imprint, becomes very complex to manage in hafnia/zirconia-based ferroelectrics. The imprint is a shift of the hysteresis loops due to an internal bias. While this effect leads to a classical retention of the stored state that may look perfect, after switching, retention will be degraded and fixing the so-called opposite-state retention loss needs to be carefully done by material and interface engineering. Moreover, the high coercive field in this material class becomes a problem as HfO2 and ZrO2 often show a pronounced wake-up and fatigue behavior and the field-cycling endurance is in many cases limited by the dielectric breakdown of the material.
While the issues mentioned so far are valid for any non-volatile device application, in neuromorphic systems, additional challenges arise, including the linearity of the switching behavior and tuning of the retention to achieve both short-term and long-term plasticity, as well as specific effects to mimic neurons, such as accumulative switching,250,259 which need to be explored using material and device design measures. Finally, large-scale neuromorphic systems will require a high integration density that demands three-dimensional integration schemes, realized either by the punch-and-plug technology well-known from NAND flash or by integrating devices into the back-end of the line.
3. Potential solutions
Since the original report on ferroelectricity in hafnium oxide,256 the boundary conditions for stabilizing the ferroelectric phase have been much better understood, although there are still a number of open questions. The goal is to achieve a high fraction of the ferroelectric phase without dead layers of non-ferroelectric phases at the interface to the electrodes or in the bulk of the film. This needs to be done under the boundary conditions of a realistic fabrication process, which means that sophisticated methods to control the crystal structure based on epitaxial growth are not possible. Epitaxial growth can help clarify scientific questions, but the achieved results need to be transferred to chemical vapor deposition (CVD), including most prominently atomic layer deposition (ALD), or physical vapor deposition (PVD) processes using electrodes such as TiN or TaN that can be integrated into electronic processes.
In the past years, it became obvious that oxygen vacancies are, on the one hand, required to stabilize the ferroelectric phase260 and, on the other hand, detrimental to both the imprint and the field cycling behavior.261 Therefore, many proposals to integrate the ferroelectric layer with additional thin layers in the film stack have been made, and currently, a lot of work is going in that direction. Moreover, it is clear that the interface to the electrodes needs careful consideration. In this direction,262 facilitating the transport of oxygen, not only in the ferroelectric layer but also across the electrode interfaces, by minimizing the strain effects, may be key to improving device performance.263 When it comes to structures that are in direct contact with silicon, a recent observation of a quasi-epitaxial growth of extremely thin hafnium-zirconium oxide films on silicon could be an interesting direction.264 For concrete neuromorphic applications, the rich switching dynamics can be very helpful (see Fig. 12).265 While in large devices, a continuous switching between different polarization states is possible, devices scaled in the 10 nm regime show abrupt and accumulative switching.259 The former can be used for mimicking synaptic functions, while the latter is helpful to mimic neurons. In classical non-volatile memories, the depolarization fields created by non-ferroelectric layers or portions of the layer in series to the ferroelectric are a concern for the retention of the device. However, when creating short and long-term plasticity in synaptic devices, this can be turned into an advantage such that the device retention can be tailored.
D. Spintronic materials for neuromorphic computing
Bernard Dieny and Tuo-Hung (Alex) Hou
1. Status
Spintronics is a merger of magnetism and electronics in which the spin of electrons is used to reveal new phenomena, which are implemented in devices with improved performance and/or new functionalities. Spintronics has already found many applications in magnetic field sensors, in particular in hard disk drives, and, more recently, as non-volatile memory (MRAM) replacing e-FLASH and last-level cache memory. Spintronics can also bring very valuable solutions to the field of neuromorphic computing, both as artificial synapses and as neurons.
Artificial synapses are devices meant to store the weight of the synaptic connection linking two neurons. Various types of spintronic synapses have been proposed and demonstrated.65 They are magnetoresistive non-volatile memory cells working either as binary memory, as multilevel memory, or even in an analog fashion. Their resistance depends on the history of the current that has flowed through the device (memristor). Most of these devices are based on magnetic tunnel junctions (MTJs), which basically consist of two magnetic layers separated by a tunnel barrier. One of the magnetic layers has a fixed magnetization (the reference layer), whereas the magnetization of the other (the storage layer) can be changed by either a pulse of magnetic field or current using phenomena such as spin transfer torque (STT) or spin–orbit torque (SOT).266 The resistance of the device depends on the amplitude and orientation of the magnetic moment of the storage layer relative to that of the reference layer (tunnel magnetoresistance effect, TMR). For a binary memory, as in STT-MRAM or SOT-MRAM, only the parallel and antiparallel magnetic configurations are used.267 For multilevel or analog memory, several options are possible, as illustrated in Fig. 13. One consists of varying the proportion of the storage layer area that is in parallel or antiparallel magnetic alignment with the reference layer magnetization. This can be achieved by step-by-step propagation of a domain wall within the storage layer using the STT produced by successive current pulses [Fig. 13(a)],268 by gradually switching the magnetization of the storage layer exchange coupled to an antiferromagnet using the SOT produced by the pulsed current flow in the antiferromagnet [Fig. 13(c)],269 by gradually switching the grains of a granular storage medium similar to the ones used in hard disk drives [Fig. 13(d)],270 or by nucleating a controlled number of magnetic spin nanotextures, such as skyrmions, in the storage layer [Fig. 13(e)].271 Alternatively, the memristor resistance can also be varied by changing the relative angle between the magnetization of the reference and storage layers, using all intermediate angles between 0° and 180° instead of only the parallel and antiparallel configurations [Fig. 13(b)].103 Chains of binary magnetic tunnel junctions can also be used to realize spintronic memristors, but at the expense of a larger footprint.272
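As a minimal illustration of the first option, the fraction of the storage layer switched by the domain wall can be mapped to a resistance by treating the parallel and antiparallel portions as two conduction paths in parallel; the R_P and R_AP values below are arbitrary examples corresponding to a 300% TMR, not data from the cited works.

    def dw_mtj_resistance(x_parallel, r_p=2e3, r_ap=8e3):
        """Toy model of a domain-wall MTJ memristor: the fraction x_parallel of the
        storage layer aligned with the reference layer conducts with R_P, the rest
        with R_AP, and the two conduction paths add as parallel conductances."""
        g = x_parallel / r_p + (1.0 - x_parallel) / r_ap
        return 1.0 / g

    # The resistance sweeps continuously between R_AP and R_P as the wall moves.
    print([round(dw_mtj_resistance(x)) for x in (0.0, 0.25, 0.5, 0.75, 1.0)])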
Concerning artificial neurons, the conventional CMOS neuron circuit is limited by its large area because a large number of transistors and a large-area membrane capacitor are required for implementing Integrate-and-Fire (I&F) functions.273 Recently, several spintronic neuron devices have been reported to generate spike signals by leveraging nonlinear and stochastic magnetic dynamics without the need for additional capacitors and complex peripheral circuitry. Spintronic neurons potentially show a great advantage for compact neuron implementation.274
An assembly of interacting spin-torque nano-oscillators (STNOs), based on the structure of magnetic tunnel junctions (MTJs), was proposed to achieve neuron functionality. A conductance oscillation that mimics spike generation is induced at frequencies from hundreds of MHz to several tens of GHz by passing a current through the device. The frequency and amplitude of the oscillation vary with the applied current and magnetic field. Torrejon et al. demonstrated spoken-digit or vowel recognition using such an array of nanoscale oscillators.275
Superparamagnetic tunnel junctions can also be used to mimic stochastic neurons. They have much lower thermal stability than the MTJs used for memory, so they stochastically switch between the antiparallel (AP) and parallel (P) states due to thermal fluctuations, which is referred to as telegraphic switching.276 This switching mode can be used to generate Poisson spike trains in spiking neural networks (SNNs) as well as for probabilistic computing.277 An MTJ device with high thermal stability, which can implement not only synapses but also neurons in an all-spin neural network, was proposed by Wu et al.278 The reduction in the thermal stability factor is induced by self-heating at a high bias voltage for neuron operation.279 At a low bias, the device stably stores weight information as a synapse.
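A sketch of how telegraphic switching yields Poisson-like spike trains is given below, using the Néel–Arrhenius escape rate f0·exp(−Δ); the attempt frequency, thermal stability factor, and observation window are illustrative assumptions rather than values from the cited devices.

    import numpy as np

    def sp_mtj_spike_train(delta=10.0, t_total=1e-3, f0=1e9, seed=0):
        """Superparamagnetic MTJ as a stochastic spiking element: the escape rate
        f0 * exp(-delta) gives exponentially distributed dwell times, so the
        switching events approximate a Poisson spike train. delta is the
        (dimensionless) thermal stability factor."""
        rng = np.random.default_rng(seed)
        rate = f0 * np.exp(-delta)
        t, spikes = 0.0, []
        while True:
            t += rng.exponential(1.0 / rate)      # dwell time before the next switch
            if t >= t_total:
                break
            spikes.append(t)
        return np.array(spikes)

    print(len(sp_mtj_spike_train()))              # expected count ~ f0 * exp(-delta) * t_total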
Many other new spintronic materials and mechanisms were also investigated for the feasibility of neuron devices, in particular based on magnetoelectric effects. For instance, by playing with magneto-ionic effects influencing the anisotropy at magnetic metal/oxide interfaces, the density of skyrmions280 and even their chirality could be controlled electrically.281 Jaiswal et al. designed a magnetoelectric neuron device for SNNs.282 Zahedinejad et al. demonstrated that electrically manipulated spintronic memristors can be used to control the synchronization of spin Hall nano-oscillators for neuromorphic computing.283
2. Challenges
Building useful, fully functional neuronal circuits requires large-scale integration of layers of artificial neurons interconnected with spintronic synapses. Crossbar architectures can perform multiply-and-accumulate functions very efficiently in an analog manner. An advantage of magnetic tunnel junctions over other technologies based on materials such as resistive oxides or phase change is their write endurance, associated with the fact that their resistance change does not involve ionic migration. However, they exhibit a lower ROFF/RON ratio (∼4 for MRAM vs 10–100 for RRAM or PCM), along with a narrower cell-to-cell distribution of resistance in the ROFF and RON states. In crossbar architectures, MTJs should have high resistance to minimize power consumption. Therefore, efforts should be pursued to further increase the TMR amplitude of MgO-based MTJs and bring it closer to the theoretically expected values of several thousand percent.284 In high-resistance MTJs, other approaches, such as SOT or voltage control of magnetic anisotropy (VCMA), could be used to change the MTJ resistance. In all cases, the control of the resistance change induced by current or voltage pulses must be improved. The operating temperature also often has a significant impact on the magnetic properties, which imposes challenges on system design.
Concerning artificial neurons, the DC power required to trigger the magnetization dynamics of STNO neurons is still relatively high (mW range).285 Ways must be found to reduce it by using different materials or new designs. The switching speed and endurance of superparamagnetic tunnel junctions and self-heating-assisted MTJ neurons could be further enhanced to improve processing speed and system reliability.286,287 How to keep reducing the variability across millions of synapses and thousands of neurons to ensure high accuracy in future neuromorphic systems remains an active research topic. Interconnecting all these devices is also a challenge, and innovative approaches beyond classical interconnects must be found, notably by taking advantage of 3D integration.
3. Potential solutions
STT-MRAM entered volume production in 2019 at major microelectronics companies.288 This marked the adoption of this hybrid CMOS/magnetic technology by the microelectronics industry. Thanks to the combined efforts of the chip industry, equipment suppliers, and academic laboratories, spintronics is progressing very fast. Materials research is very important to increase the magnetoresistance amplitude, the STT and SOT efficiencies, and the VCMA efficiency; reduce the switching currents and the dependence on operating temperature; reduce the current needed to trigger oscillations in STNOs; and reduce disturbance due to parasitic fields. Investigations are in progress involving antiferromagnetic materials for reduced sensitivity to the field and access to THz frequency operation;289 half-metallic materials, such as Heusler alloys, for enhanced TMR amplitude and reduced write current;104 and topological insulators for very efficient spin/charge current interconversion, possibly combined with ferroelectric materials.290
Concerning interconnects, magnetic materials are fortunately grown in back-end technology and can be stacked, but at the expense of complexity and cost. Long-range information transmission can be carried out via spin currents or magnons or by propagating magnetic textures, such as domain walls291 or skyrmions.292 Light could also be used to transmit information, in conjunction with recent developments related to all-optical switching of magnetization.293 In addition, a great advantage of spintronic stacks is that they can be grown on almost any kind of substrate, provided that the roughness of the substrate is low enough compared to the thickness of the layers comprising the stack. This enables the use of the third dimension by stacking several spintronic structures, thereby gaining in interconnectivity.294
4. Concluding remarks
Spintronics can offer valuable solutions for neuromorphic computing. Considering that STT-MRAM is already in commercial production, it is very likely that the first generation of spintronic neuromorphic circuits will integrate this technology. Next, crossbar arrays implementing analog MTJs may be developed, as well as neuronal circuits based on the dynamic properties of interacting STNOs for learning and inference. Still, many challenges remain on the path toward practical applications, including speed, reliability, scalability, and variation tolerance, which need to be addressed in future research.
E. Optoelectronic and photonic implementations
Akhil Varri, Frank Brückerhoff-Plückelmann, and Wolfram Pernice
1. Status
Computing with light offers significant advantages for highly parallel operation by exploiting concepts such as wavelength and time multiplexing. Moreover, optical data transfer enables low power consumption, better interconnectivity, and ultra-low latency. The first prototypes were developed as early as the 1980s; however, the bulky tabletop experiments could not keep pace with the flourishing CMOS industry. At present, novel fabrication processes and materials enable the (mass) production of photonic integrated circuits, allowing photonic systems to compete with their electronic counterparts. Especially in the area of data-heavy neuromorphic computing, the key advantages of photonic computing can be exploited.
Scientific efforts in neuromorphic photonic computing can be separated into two major directions: (i) building hardware accelerators that excel at specific tasks, e.g., computing matrix–vector multiplications, by partially mimicking the working principles of the human brain, and (ii) creating designs that aim to emulate the functionality of biological neural networks. Such devices are able to replicate the behavior of a neuron, a synapse, and learning mechanisms and to ultimately implement a spiking neural network.
There has been considerable progress in direction (i) since 2017, when Shen et al.295 demonstrated vowel recognition in which every node of the artificial neural network is physically represented in hardware using a cascaded array of interferometers. This scheme has also been scaled to implement a three-layer deep neural network with in situ training capability.296 In addition, Feldmann et al.297 have demonstrated neurosynaptic networks on-chip and used them to perform image recognition. The photonic circuit deploys a non-volatile phase change material (PCM) to emulate the synapses and exploits the switching dynamics as a nonlinear activation function. As highlighted in Sec. V B, the integration of PCMs also leads to in-memory computing functionality owing to their non-volatile nature.
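The interferometer-mesh approach rests on a linear-algebra fact: any weight matrix can be factorized by singular value decomposition into two unitary operators, realizable as meshes of Mach–Zehnder interferometers, and a diagonal scaling stage. The Python sketch below only illustrates that principle (the matrix, vector, and sizes are arbitrary assumptions), verifying that the three stages applied in sequence reproduce the matrix–vector product.

import numpy as np

# Illustrative sketch of the principle behind interferometer-mesh matrix multipliers:
# a weight matrix M factorizes as M = U @ diag(s) @ Vh (SVD). The unitaries U and Vh
# can be realized by interferometer meshes and the diagonal by attenuators/amplifiers,
# so the optical field experiences M as it propagates through the three stages.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))          # target weight matrix of one network layer (assumed)
x = rng.normal(size=4)               # input vector encoded in optical amplitudes (assumed)

U, s, Vh = np.linalg.svd(M)
y_photonic = U @ (s * (Vh @ x))      # three physical stages applied in sequence
assert np.allclose(y_photonic, M @ x)
print(y_photonic)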
For direction (ii), significant work has been done at the device level to mimic individual components of the brain. Excitable lasers combining different material platforms, such as III–V compounds and graphene, have been shown to exhibit the leaky integrate-and-fire characteristics of a neuron.73 In addition, neurons based on optoelectronic modulators have been reported in the literature. For synapses, photonic devices combined with PCMs, amorphous oxide semiconductors, and 2D materials have been used to demonstrate synaptic behaviors, such as spike-timing-dependent plasticity and memory.73,298,299 Furthermore, key synaptic functions have also been demonstrated using optically controlled reversible tuning in amorphous oxide memristors.300,301 Optical control of the conductance levels in memristors, such as those described in Sec. V A, enables the low-power switching dynamics important to neuromorphic computing efforts.
In the following, we break down the challenge of building neuromorphic photonic hardware to various subtopics, ranging from increasing the fabrication tolerance of the photonic circuit to co-packaging the optics and electronics. Then, we review the current advances in those areas and provide an outlook on the future development of neuromorphic photonic hardware.
2. Challenges
A major challenge is combining the various building blocks shown in Fig. 14. Silicon on insulator is the platform of choice for building large circuits owing to the matured CMOS process flow and the high refractive index contrast between the silicon waveguide and the oxide cladding. However, silicon has no second-order nonlinearity, as it is centrosymmetric. Furthermore, silicon, being an indirect bandgap material, cannot emit light. This strongly limits the options for implementing the nonlinear functions and spiking dynamics crucial for an all-optical neural network. Therefore, most of the research on mimicking neurons is focused on novel material platforms that support gain. A key challenge is integrating those different material platforms. For example, a circuit may deploy neurons based on III–V semiconductor heterojunctions and synapses built with PCMs on silicon. Therefore, compact and fabrication-error-tolerant optical interconnects are crucial for the performance of the whole system.
Apart from packaging various optical components, the electro-optic interface imposes an additional challenge. Typically, the input data are provided as digital electrical signals, whereas optical data processing is analog. This requires analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) for digital systems to interface with the chip, as shown in Fig. 14. For large circuits, co-packaging electronics and photonics increases the footprint and cost significantly, which negatively affects the throughput.
In addition to the above, fabrication imperfections also impact the performance of a photonic circuit. Components such as ring resonators, cavities, and interferometers employed in many photonic circuits are designed to operate at a specific wavelength. However, due to factors such as etching rate, sidewall angle, and surface roughness, the actual operating wavelength often does not match the design. Hence, in many cases, active methods, such as thermo-optic phase shifters, are employed to trim the wavelength after fabrication. This adds electronic circuitry, adversely affecting the scalability of the system.
Finally, a challenge that may become critical in the future is the efficiency of electro-optic modulators (EOMs), which depends on the material properties and device configuration. An important figure of merit for EOM efficiency is VπL, the product of the applied voltage and the modulator length required to impose a π phase shift on the input. A smaller VπL implies higher power efficiency and a more compact footprint. As photonic neuromorphic circuits are scaled up in the future, the power budget and the space available on-chip will play an essential role in shaping designs.
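As a rough illustration of this figure of merit (the numbers below are assumptions, not values from the text), the phase shift of a modulator of length L driven at voltage V is approximately φ = πVL/(VπL), so halving VπL halves either the drive voltage or the device length needed for a π shift.

import math

# Hedged numerical illustration of the VpiL figure of merit:
# phi = pi * V * L / VpiL, so a smaller VpiL gives a pi shift at lower
# voltage or shorter length. The example values are assumptions only.

def phase_shift(v_volts, length_cm, vpil_vcm):
    return math.pi * v_volts * length_cm / vpil_vcm

for vpil in (2.0, 0.5):                       # V*cm, hypothetical modulator platforms
    L = 0.1                                   # cm (a 1 mm device, assumed)
    v_pi = vpil / L                           # drive voltage needed for a pi shift
    print(f"VpiL = {vpil} V*cm -> V_pi = {v_pi:.1f} V for a {L*10:.0f} mm modulator,"
          f" phi at 1 V = {phase_shift(1.0, L, vpil):.2f} rad")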
3. Potential solutions
Solutions addressing the challenges mentioned above lie on multiple fronts. First, we discuss how the scalability can be improved from a device-level perspective. A compact footprint, power efficiency, and cascadability of the neurons are essential characteristics for improving scalability. In this regard, modulator-based neurons can be improved by integration with materials such as electro-optic polymers. These materials have an electro-optic coefficient r33 an order of magnitude higher than that of bulk lithium niobate, which has conventionally been the material of choice for modulators. As a result, electro-optic polymers integrated with silicon waveguides show among the lowest VπL values of fast modulators.302 In addition, novel materials, such as epsilon-near-zero (ENZ) materials, which are promising for optical nonlinearity, can also be explored.303 Nevertheless, for the widespread use of these devices, a better understanding of the material properties and engineering efforts to integrate them into existing manufacturing process flows are required.
In particular, integration techniques, such as micro-transfer printing, flip-chip bonding, and photonic wire bonding, will play a key role. To solve the problem of packaging with electronics, strategies such as monolithic fabrication, where the photonics and electronics are on the same die, need to be investigated. Foundries are now offering multi-project wafer runs with these state-of-the-art packaging techniques.
For improving the scalability of spike-based processing systems, another class of neurons that is very promising is the vertical cavity surface emitting lasers (VCSELs). VCSELs can integrate 100 picosecond-long pulses and fire an excitable spike when the sum crosses a certain threshold, emulating biological neurons. Recently, it has been shown that the output of one layer of VCSEL neurons combined with a software-implemented spiking neural network can perform 4-bit image recognition.304 In order to build the entire system on hardware and perform larger experiments, 2D VCSEL arrays flip-chip bonded on a silicon die can be examined.
Finally, to address the challenge of fabrication imperfections, passive tuning approaches that need no additional circuitry and are non-volatile can be of interest. One direction could be the use of phase change materials, such as GaS and Sb2S3, to correct for the variability in photonic circuits.305 These materials are very interesting since the real part of their refractive index can be tuned while maintaining low absorption at telecom wavelengths. Another approach for post-fabrication passive trimming could be to use an electron beam or ion beam to change the material properties of the waveguide. This method is also scalable, as these tools are widely used in the semiconductor industry.
4. Concluding remarks
Applications such as neuromorphic computing are particularly promising for optics, where its unique advantages (i.e., high throughput, low latency, and high power efficiency) can be exploited. At present, there are instances in the literature where different devices have been proposed to emulate individual characteristics of a neurosynaptic model. However, there is considerable scope for research in materials science to pave the way for more compact, cascadable, and fabrication-friendly implementations. Furthermore, networks are expected to scale up substantially in the near future by integrating the state-of-the-art packaging techniques that are now available to research groups and startups.
To summarize, the growth of integrated photonics has led to a resurgence of optical computing, not only as a research direction but also commercially. It is exciting to see how the field of neuromorphic photonics will take shape as advances in science and technology continue.
F. 2D materials
Mario Lanza, Xixiang Zhang, and Sebastian Pazos
1. Status
Multiple studies have claimed the observation of resistive switching (RS) in two-dimensional layered materials (2D-LMs), but very few of them have reported excellent performance (i.e., high endurance and retention plus low switching energy, time, and voltage) in a reliable and trustworthy manner and in a device small enough to be attractive for high-integration-density applications (e.g., memory and computation).
The best RS performance observed in 2D-LMs is based on out-of-plane ionic movement. In such devices, the presence and quality of the RS phenomenon mainly depend on three factors: the density of native defects, the type of electrode used, and the volume of the dielectric (thickness and area). In general, 2D-LMs with excellent crystallographic structure (i.e., without native defects, such as those produced by mechanical exfoliation) do not exhibit stable resistive switching. Reference 306 reported that mechanically exfoliated multilayer MoS2 does not show RS; only after oxidizing it (i.e., introducing defects) does it show RS based on the migration of oxygen ions. Along these lines, Ref. 307 showed that mechanically exfoliated multilayer hexagonal boron nitride (h-BN) does not exhibit RS; instead, the application of voltage produces a violent dielectric breakdown (DB) followed by material removal. The more violent DB phenomenon in h-BN compared to MoS2 is related to the higher formation energy of intrinsic vacancies: >10 eV for boron vacancies in h-BN vs <3 eV for sulfur vacancies in MoS2. Some articles have claimed RS in mechanically exfoliated 2D-LMs, but very few cycles and poor performance were demonstrated; those observations are more typical of unstable DB than of stable RS. Reference 308 reported good RS in a crossbar array of Au/h-BN/graphene/h-BN/Ag cells produced by mechanical exfoliation, but in that study, the graphene film shows an amorphous structure in the cross-sectional transmission electron microscope images. Hence, stable and high-quality RS based on ionic movement has never been demonstrated in as-prepared, mechanically exfoliated 2D-LMs. This is expected because ionic-movement-based RS is only observed in materials with a high density of defects (e.g., high-k materials and sputtered SiO2) but not in materials with a low density of defects (e.g., thermal SiO2), as the higher energy-to-breakdown produces an irreversible DB event.
On the contrary, 2D-LMs prepared by chemical vapor deposition (CVD) and liquid phase exfoliation (LPE) have exhibited stable RS in two-terminal memristor309 and three-terminal (memtransistor) configurations,310 although in the latter, the switching mechanism is largely different. In two-terminal devices, RS is enabled by the migration of ions across the 2D-LM. In transition metal dichalcogenides (TMDs), the movement of chalcogenide ions can be enough to leave behind a metallic path (often referred to as a conductive nanofilament or CNF) that produces switching (similar to oxygen movement in metal oxides).306 However, in h-BN, metal penetration from the adjacent electrodes is needed, as this material contains no metallic atoms.307 In 2D-LMs prepared by CVD and LPE methods, ionic movement takes place at lower energies (than in mechanically exfoliated ones) due to the presence of native defects (mainly lattice distortions and impurities). The best performance so far has been observed in CVD-grown ∼6 nm-thick h-BN, as it is the only material with enough insulation and thickness to keep the current in the high-resistance state low.72 This includes the coexistence of bipolar and threshold regimes (the latter with highly controllable potentiation and relaxation), bipolar RS with endurances >5 × 106 cycles (similar to commercial RRAM and phase-change memories),311 and ultra-low switching energies of ∼8.8 zJ in the threshold regime.312 Moreover, a high yield (∼98%) and low variability have been demonstrated.312 In 2D-LMs produced by LPE or other solution-processing methods,313 the junctions between the flakes and their size play a very important role, and while there is evidence of potentially good endurance, synaptic behavior, and variability, sub-μm downscaling has not yet shown equivalent performance.314
Apart from ionic movement, 2D-LMs can also exhibit RS based on the ferroelectric effect.250 A remarkable example is In2Se3, which has electrically switchable out-of-plane and in-plane electric dipoles. Recent works have demonstrated that RS in ferroelectric In2Se3 is governed by three independent variables (polarization, initial Schottky barrier, and barrier change) and that it delivers multidirectional switching and photon storage.315 However, the endurance and retention time are still limited to hundreds of cycles, and stable ferroelectric RS at the single-layer limit remains unexplored. Finally, the tunable optoelectronic properties and unique electronic structures attainable through 2D-LM heterostructures present enormous potential for near-/in-sensor computing in neuromorphic systems. The responsiveness of 2D-LM memristor and memtransistor devices to physical variables (light, humidity, temperature, pressure, and torsion) allows biological neurosynaptic cells (e.g., in the visual cortex and tactile receptors) to be mimicked.316
2. Challenges and potential solutions
The main challenge for RS devices (of any type) is to exhibit high endurance in small devices. Many studies have reported RS in large devices with sizes >10 µm2 and claimed that their devices are “promising” for memory and computing applications. This is a huge and unreasonable exaggeration; these two applications require high integration density, as commercial devices for them have sizes down to tens or hundreds of nanometers. It should be noted that in ionic-movement-based RS devices, the CNF always forms at the weakest location of the sample; when the device size is reduced, the density and size of defects are (statistically) reduced, which produces an increase in the forming voltage.317 Hence, the CNF of smaller devices is wider due to the larger amount of energy delivered during forming. This has a huge effect on the state resistances, switching voltages, time, and energy, as well as on endurance, retention time, and device-to-device variability. In other words, the fact that a large device (>10 µm2) exhibits good RS does not mean that a small device (<1 µm2) made with the same materials will also exhibit it; hence, only RS observed in devices with sizes of tens to hundreds of square nanometers can be considered promising for memory and computation.
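The area dependence of the forming voltage follows weakest-link statistics. As a rough, hedged illustration (the Weibull parameters below are assumptions, not measured values), if a reference area A0 fails with cumulative probability F0(V), a device of area A fails with F(V) = 1 - (1 - F0(V))^(A/A0), so shrinking the area removes weak spots and pushes the median forming voltage up.

import numpy as np

# Rough weakest-link illustration (assumed Weibull statistics, not data from the text):
# forming happens at the weakest defect, so a device of area A has cumulative failure
# probability F(V) = 1 - (1 - F0(V))**(A/A0). Shrinking the area statistically removes
# weak spots and raises the median forming voltage, as described above.

def median_forming_voltage(area_um2, a0_um2=10.0, v63=3.0, beta=8.0):
    """Median of the area-scaled Weibull distribution (v63, beta, a0 are assumed)."""
    scale = area_um2 / a0_um2
    return v63 * (np.log(2.0) / scale) ** (1.0 / beta)

for area in (10.0, 1.0, 0.05):  # um^2: large pad, sub-um device, BEOL-integrated cell
    print(f"A = {area:5.2f} um^2 -> median forming voltage ~ {median_forming_voltage(area):.2f} V")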
Taking this into account, the main challenge in 2D-LM based devices is to observe RS in small devices, and the most difficult figure-of-merit to obtain is (by far) the endurance. Reference 318 demonstrated good RS in 5 × 5 µm2 Au/h-BN/Au devices, in which the h-BN was ∼6 nm thick and grown by CVD; however, when the size of the devices was reduced to 320 × 420 nm2, the yield and the number of working devices observed were very limited. The main issue was the current overshoot during switching, which takes place randomly and produced irreversible DB in most devices. Similarly, solution-processed Pt/MoS2/Ti devices319 showed excellent performance across all figures-of-merit in 25 µm2 devices, but such performance has not been reported for 500 × 500 nm2 devices patterned via electron beam lithography. In this case, the large size of the nanoflakes (slightly below 1 µm at minimum) may impose an intrinsic scaling limitation. Meanwhile, the scaling and overshoot problem was solved in Ref. 72 by integrating CVD-grown h-BN directly on top (via the wet-transfer method) of a silicon complementary metal–oxide–semiconductor (CMOS) transistor, which acted as an instantaneous current limiter. Moreover, this approach brings the advantage of a very small device size (in Ref. 72, it was only 0.053 µm2, as the bottom electrode of the RS device is formed by one of the metallization levels). The heterogeneous integration of 2D-LMs at the back-end-of-line (BEOL) wiring of silicon microchips could be a good way of testing materials for RS applications and of directly integrating a selector device with each memristor (into one-transistor–one-memristor (1T1M) cells), which is fundamental for the realization of large memristive synapse arrays; all state-of-the-art demonstrations of memristive neural accelerators based on mature memristor devices use 1T1M cells or differential implementations of such cells (2T2M and 4T4M). So far, these CMOS testing vehicles for RS materials have mainly been employed by industry; in the future, spreading this type of testing vehicle among academics working in the field of RS could improve the quality of the knowledge generated. In addition, these devices may benefit from common practices in silicon microchip manufacturing, such as surface planarization, plug deposition, and high-quality thick interconnect techniques.
The next steps in the field of 2D-LMs for RS applications consist of improving material quality to achieve better reproducibility of experiments (from one batch to another) and adjusting the thickness and density of defects to achieve better figures-of-merit in nanosized RS devices, all while growing 2D-LMs at the wafer scale.320 Recent studies successfully synthesized large-area single-crystal 2D-LMs via CVD,321 although in most cases only as monolayers. However, monolayer 2D-LMs are less than 1 nm thick, and when they are exposed to an out-of-plane electric field, a very high leakage current is generated even if no defects are present, which greatly increases the current in the high-resistance state (HRS) and the energy consumption of the device. Reference 322 presented the synthesis of single-crystal multilayer h-BN using scalable methods, but controlling the number of layers is still difficult. Electrical studies on such single-crystal multilayer samples should be conducted. Improving handling methods to prevent the formation of cracks during transfer is also necessary, although it is worth mentioning that multilayer h-BN films are mechanically more stable than monolayers.
The recent demonstration of vector–matrix multiplication using MoS2 memtransistors323 is a promising advance in terms of higher-level functional demonstrations, although the fundamental phenomenon exploited is the well-known floating-gate memory effect, which is not unique to 2D-LMs themselves. Meanwhile, understanding the role of flake size in the functionality of solution-processed 2D-LM two-terminal synaptic devices is critical to address the true scaling limitations of this approach, a key aspect in defining realistic applications in neuromorphic systems. On the other hand, sensing capabilities emerge with great potential for mimicking biological synapses. The full range of 2D-LM heterostructures and memtransistors opens a huge design space worth exploring. In that sense, the complex physical characteristics offered by different 2D-LMs hold potential not only for basic neuromorphic functionality but also for higher-order complexity. This could be exploited to achieve highly complex neural and synaptic functions,100 more closely mimicking actual biological systems. However, in parallel to elucidating the physical properties and capabilities of these material systems, efforts should be put into strengthening the quality of the reported results, focusing on proper characterization methods, reliable practices, and statistical validation.
3. Concluding remarks
Leading companies, such as TSMC, Samsung, IBM, and Imec, have started to work with 2D-LMs, but mainly for sensors and transistors. In the field of 2D-LM based neuromorphic devices, most work is being carried out by academics. In this regard, unfortunately, many studies present a simple proof-of-concept using a novel nanomaterial without measuring essential figures-of-merit, such as endurance, retention, and switching time. Even worse, in many cases, the studies employ unsuitable characterization protocols that heavily overestimate the performance (the most popular case is the erroneous measurement of endurance324), withholding information about the failure mechanisms that prevent certain performance metrics from being achieved on some devices. This working style often results in articles with striking performance numbers, but those numbers are unreliable, which is damaging for the field because it creates hype and subsequent disillusion among investors and companies. What is most important is that scientists working in this field follow a few considerations: (i) always aim to show high performance in small (<1 µm2) devices fabricated using scalable methods (even better if they are integrated on a functional CMOS microchip rather than on a non-functional SiO2 substrate); (ii) measure all the figures-of-merit of several (>100) memristive devices for the targeted application (these may vary depending on the application);214 (iii) clearly define the yield-pass criteria and the yield achieved, as well as the device-to-device variability observed; and (iv) whenever a failure mode is observed that prevents reaching a desired figure-of-merit, clearly report it to maximize the probability of finding a solution.
VI. MATERIALS CHALLENGES AND PERSPECTIVES
Stefan Wiefels and Regina Dittmann
A. Materials challenges
For the neuromorphic computing approaches addressed in Sec. III, the use of emerging memories based on novel materials will be key to improving their performance and energy efficiency. This section discusses the most relevant properties and challenges for different use cases and how they relate to the respective materials properties. However, it is important to note that a dedicated co-development of materials with the readout and write algorithms and circuitry will be required in order to advance the field (Fig. 15).
1. Scaling
One main driving force to use emerging materials and devices is to gain space and energy efficiency by the fabrication of highly dense crossbar arrays. For STT-MRAM, scaling down to 11 nm cells has been demonstrated, as well as the realization of 2 Mb embedded MRAM in 14 nm FinFET CMOS.325 However, due to the small resistance ratio of 2–3, the readout of magnetic tunnel junctions (MTJs) is more complex than for other technologies. Nevertheless, a 64 × 64 MTJ array, integrated into 28 nm CMOS, has recently been realized.75 Advancements from the material side will be needed in order to increase the resistance ratio of MTJs in the future.
For ferroelectric HfO2-based devices, the main challenge with respect to scaling is to decrease the thickness reliably in order to enable 3D capacitors at the 10 nm node and to obtain uniform polarization at the nanoscale in a material that currently still contains a mixture of different phases. Therefore, ultrathin films with the pure ferroelectric orthorhombic phase and without any dead layers at the interfaces will be key to approaching the sub-20 nm regime of hafnia-based ferroelectric devices.9
PCM devices can be fabricated on the sub-10 nm scale.326 The limiting factor for CMOS-integrated PCM devices is the high RESET current, which requires larger access transistors.202 Commercially available ReRAM cells with conventional geometries have been co-integrated on 28 nm CMOS technology. By employing a sidewall technique and nanofin Pt electrodes, small arrays with 1 × 3 nm2 HfO2202 cells and 3 × 3 arrays of Pt/HfO2/TiOx/Pt cells with a 2 nm feature size and a 6 nm half-pitch have been fabricated, respectively.327
With respect to ultimate scaling, the loss of oxygen to the environment might limit the retention times of ReRAM devices scaled into the sub-10 nm regime.328 However, filaments with sizes of 1–2 nm can be stable if they are stabilized by structural defects, such as grain boundaries or dislocations. Therefore, finding materials solutions for confining oxygen vacancies at the nanoscale might preserve the required retention for devices at the few-nm scale.
2. Speed
Although the extensive parallelism leads to high demands for scaling, it is considered an advantage as it makes the race for ever-increasing clock frequencies obsolete.329 Instead, the operation speed is closely linked to the respective application, i.e., the timing is based on real physical time.329 As signals processed by humans are typically on a time scale of milliseconds or longer, the expected speed benchmark is well below the reported speed limits of emerging NVMs. Nevertheless, it is reasonable to understand the ultimate speed limits of NVM concepts in order to estimate maximum learning rates and to explore the impact of short spiking stimulation. Furthermore, novel computing concepts, as discussed in Sec. VII, might still benefit from higher clock frequencies. For MRAM, reliable 250 ps switching has been demonstrated by using double spin-torque MTJs, which consist of two reference layers, a tunnel barrier, and a non-magnetic spacer.330 FeRAM arrays have successfully been switched with 14 ns pulses at 2.5 V. Ferroelectric field effect transistors (FeFETs) have been shown to switch with <50 ns pulses in 1 Mbit memory arrays.9 PCM devices can be switched with pulses <10 ns.326 In general, their speed is limited by the crystallization time of the material. It has been shown, exemplarily on GexSnyTe samples, that this time can be tuned over a broad range from 25 ns up to 10 ms by adjusting the material composition.331 This offers a high potential to match the operation time of an NC system to the respective application. For VCM ReRAM, SET and RESET switching with 50 and 400 ps, respectively, have been demonstrated.332 Both are so far limited by extrinsic effects and device failure modes rather than by intrinsic physical rate-limiting steps.
3. Reliability
Independent of the application, the reliability of the memory technology has to be taken into account. In the case of implementing NVMs as artificial synapses, the requirements of learning and inference phases have to be distinguished. Whereas the endurance is more relevant for learning schemes, the stability of the programmed state, i.e., the retention and robustness against read disturb, has to be sufficient for reliable inference operations.
4. Endurance
While MRAM has, in principle, unlimited endurance, all memristive devices that are based on the motion or displacement of atoms, such as ReRAM, PCM, and ferroelectric systems, have limited endurance. For silicon-based FeFETs, the endurance is typically on the order of 105 cycles, which is mainly limited by dielectric breakdown of the SiO2 at the Si–HfO2 interface.9 Regarding the endurance of VCM ReRAM, it has been demonstrated with convincing statistics that >106 cycles are realistic, and some reports suggest maximum cycle numbers of more than 1010.324 Depending on the material system, various endurance failure mechanisms are discussed. The microstructure of the switching material might degrade or be irreversibly penetrated by metallic atoms.9 In VCM ReRAM, an excessive generation of oxygen vacancies has been discussed as an endurance-limiting factor.333 Novel material solutions, which confine ions to their intended radius of action, might be a pathway to increase the endurance of ReRAM devices. For PCM, it has been suggested to implement multi-PCM synapses, in which arbitration over multiple memory elements might circumvent endurance and variability issues.9
A typical limitation with respect to the reliable operation of ferroelectric memories is the so-called wake-up effect, which causes an increase in polarization over the first few cycles, and fatigue, which results in a decrease in polarization at high cycle numbers. Both are induced by the motion of defects, such as oxygen vacancies, and will have to be tackled in the future by intense materials research in this field.
5. Retention
After training, the state of the non-volatile memory synapse is required to be stable for 10 years at an operating temperature of 85 °C. However, for many applications in the field of neuromorphic computing, the requirement is much more relaxed, in particular for the training phase. From a thermodynamic point of view, the states in ferroelectric or ferromagnetic memories can both be stable. In contrast, ReRAM and PCM devices store information in the configuration of atoms, where both the low-resistance state (LRS) and the HRS are metastable, and the retention is determined by material parameters, such as the diffusion coefficient of the respective species.9 Here, the degradation is not a digital flipping of states but a gradual process. For PCM, the drift of the resistance state is caused by the structural relaxation of the melt-quenched amorphous phase.202 Apart from a drift of the state, a broadening of the programmed state distribution (e.g., of the resistance) is typically observed for ReRAM.334 Furthermore, since analog or multi-level programming is highly relevant for NC, it should be considered that intermediate resistance states might have a reduced retention compared to the extreme high- and low-resistance states, as demonstrated for PCM devices.9
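The resistance drift mentioned above is commonly described by an empirical power law, R(t) = R0 (t/t0)^ν, where the drift coefficient ν depends on the programmed state. The short sketch below only illustrates this behavior with assumed parameter values, not measured ones.

# Hedged sketch of the commonly used power-law drift model for PCM,
# R(t) = R0 * (t / t0) ** nu, with drift coefficient nu.
# The parameter values below are assumptions for illustration only.

def drifted_resistance(r0_ohm, t_s, t0_s=1.0, nu=0.05):
    return r0_ohm * (t_s / t0_s) ** nu

r0 = 1e6  # ohm, assumed programmed intermediate state
for t in (1.0, 3600.0, 86400.0 * 365):   # 1 s, 1 h, 1 year after programming
    print(f"t = {t:>12.0f} s -> R ~ {drifted_resistance(r0, t):.2e} ohm")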
6. Read disturb
During inference, frequent reading of the memory elements is required, which should not change the learned state. For a bipolar ReRAM memory, a read disturb of the HRS/LRS occurs mainly when reading with a SET/RESET polarity, since read disturb can be considered an extrapolation of the SET/RESET kinetics to lower voltages. Nevertheless, the HRS in bipolar filamentary VCM has been demonstrated by extrapolation to be stable for years at read voltages up to 350 mV.335
7. Variability
Variability is most pronounced for systems that rely on the stochastic motion and redistribution of atoms, such as ReRAM and PCM. Here, the variability from device to device (D2D), from cycle to cycle (C2C), and even from one read to the next (R2R) has to be distinguished. By optimizing fabrication processes, the D2D variability can be kept comparatively low. In contrast, the C2C variability of filamentary ReRAM and PCM can be significant due to the randomness of filament335 or crystal202 growth, respectively. However, using smart programming algorithms, the C2C variability can be reduced to a minimum.335 By contrast, R2R variations remain in the form of read noise in filamentary VCM. It is typically attributed to the activation and deactivation of traps or the random redistribution of defects335 and strongly depends on the material.336 For PCM, R2R variations are caused by 1/f noise and temperature-induced resistance variations. One approach to address these issues, as well as the drift, is to use the so-called projected phase change memory with a non-insulating projection segment in parallel to the PCM segment.
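Such smart programming schemes are typically iterative program-and-verify loops: write a pulse, read back, and adapt the next pulse until the target state is reached. The Python sketch below uses a toy stochastic device model (an assumption for illustration, not a physical model from the text) to show the principle.

import numpy as np

# Minimal program-and-verify sketch (illustrative only): the device is written with
# adaptively adjusted pulses and re-read until its conductance falls inside a
# tolerance band around the target, which suppresses cycle-to-cycle variability.

rng = np.random.default_rng(2)

def mock_device_update(g, v_pulse):
    """Stochastic toy model of a memristive update; not a physical device model."""
    return g + 1e-6 * v_pulse * (1.0 + 0.3 * rng.normal())

def program_and_verify(g_init, g_target, tol=0.02, v_pulse=0.5, max_pulses=100):
    g, v = g_init, v_pulse
    direction = 1.0 if g < g_target else -1.0
    for pulse in range(1, max_pulses + 1):
        g = mock_device_update(g, direction * v)
        if abs(g - g_target) / g_target < tol:       # verify read: inside tolerance band
            return g, pulse
        new_direction = 1.0 if g < g_target else -1.0
        if new_direction != direction:               # overshoot: reduce pulse amplitude
            v *= 0.5
            direction = new_direction
    return g, max_pulses

g_final, n = program_and_verify(g_init=1e-5, g_target=2e-5)
print(f"reached G = {g_final:.2e} S in {n} pulses")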
8. Analog operation
For most computing concepts described in Sec. III, operation with binary memory devices is strongly limiting, and the possibility of adjusting multiple states is of crucial importance. For devices with thermodynamically stable states, such as ferroelectric or magnetic memories, intermediate states rely on the presence of domains. As a result, the performance strongly depends on the specific domain structure, and scaling might be limited by the size of the domains. Nevertheless, multilevel switching has been demonstrated by fine-tuning the programming voltages for both FTJs and FeRAM.337
For ReRAM and PCM, the metastable intermediate states have to be programmed in a reliable manner. Since these states are kinetically stabilized during programming, the success depends strongly on the switching kinetics of the specific system, the operation regime, and the intrinsic R2R variability of the material. For PCM devices, intermediate states can be addressed by partial reset pulses, which result in partial amorphization. As a result of the crystallization kinetics, a gradual crystallization can be obtained by consecutive pulses.
Filamentary ReRAM devices usually undergo an abrupt SET, which is caused by the self-accelerating, thermally driven filament formation. However, it is possible to obtain intermediate states by carefully controlling the SET current, by precisely controlling the timing of the SET voltage pulses, or by slowing down the switching kinetics. The latter is the case for non-filamentary systems, which show a very pronounced gradual behavior for both SET and RESET.327
Furthermore, resistive switching devices with purely electronic switching mechanisms, such as trapping and de-trapping of electrons at defect states, might be promising for analog operation.338