In-memory computing (IMC) has emerged as a new computing paradigm able to alleviate or suppress the memory bottleneck, which is the major concern for energy efficiency and latency in modern digital computing. While the IMC concept is simple and promising, the details of its implementation cover a broad range of problems and solutions, including various memory technologies, circuit topologies, and programming/processing algorithms. This Perspective aims at providing an orientation map across the wide topic of IMC. First, the memory technologies will be presented, including both conventional complementary metal-oxide-semiconductor-based and emerging resistive/memristive devices. Then, circuit architectures will be considered, describing their aim and application. Circuits include both the popular crosspoint array and more advanced structures, such as closed-loop memory arrays and ternary content-addressable memories. The same circuit might serve completely different applications, e.g., a crosspoint array can be used to accelerate the matrix-vector multiplication for forward propagation in a neural network and the outer product for backpropagation training. The different algorithms and memory properties that enable such a diversification of circuit functions will be discussed. Finally, the main challenges and opportunities for IMC will be presented.
I. INTRODUCTION
Data-intensive computing tasks, such as data analytics, machine learning, and artificial intelligence (AI), require frequent access to the memory to exchange data input, output, and commands. Since the high-density memory is generally off-chip with respect to the central processing unit (CPU), data movement represents a significant overhead in the computation, largely exceeding the energy required for on-chip digital data processing.1,2 There are two possible directions to tackle this memory bottleneck: one is the optimization of the data throughput in a multi-chip approach, such as the high bandwidth memory (HBM)3 or the hybrid memory cube (HMC).4 The second approach is to radically change the computing paradigm by enabling in situ computation of data within the memory, which goes by the name of in-memory computing (IMC).5–8
Various concepts of IMC have been proposed depending on the degree of integration of memory and processing, as illustrated in Fig. 1. On the one hand, a conventional von Neumann architecture depicted in Fig. 1(a) has physically separate memory and computing units sitting on distinct chips, where the movement of input/output/instructions causes significant latency and excess energy consumption. One solution to mitigate these issues is the concept of near-memory computing (NMC) shown in Fig. 1(b), where the embedded nonvolatile memory (eNVM) is integrated on the same chip as the computing unit to minimize the latency.9,10 Note that eNVM serves as pure data storage for parameters and instructions in NMC, while the static random access memory (SRAM) is used as a cache memory storing intermediate input/output data. A further degree of integration consists of the true IMC approach shown in Fig. 1(c), where the SRAM is used directly as a computational engine, e.g., to accelerate matrix-vector multiplication (MVM).8 An additional overhead is the need to move the computational parameters from the local eNVM [or an off-chip dynamic random access memory (DRAM)] to the volatile SRAM every time the computation is needed. To mitigate this drawback, the ultimate concept to maximize the integration of memory and processing is IMC within the eNVM, as shown in Fig. 1(d).7 This approach appears as the most promising concept to minimize data movement, and hence energy consumption and latency, although there are significant challenges and trade-offs in terms of throughput, energy efficiency, and accuracy of the processing. Emerging memories represent a promising approach for eNVM in IMC, given several attractive properties, such as scalability, 3D integration in the back end of the line, and nonvolatile storage of computing parameters. The interplay of device technologies, circuit engineering, and algorithms thus requires a strong effort in terms of co-design across multiple disciplines.11
This Perspective provides an overview of IMC, including the status of the memory device technologies and the circuit architectures for a broad portfolio of applications. Section II describes the state-of-the-art memory devices for IMC, including both two-terminal and three-terminal emerging memory technologies. Section III presents the concept of analog IMC, highlighting the main challenges from a memory array point of view. Section IV addresses matrix-vector multiplication, which is a fundamental computing primitive at the basis of most IMC applications. Section V reviews the state-of-the-art of closed-loop IMC, which enables highly complex algebraic operations with reduced complexity. Section VI presents an overview of the field of content-addressable memories. Section VII focuses on accelerators for the training of neural networks based on in-memory outer product. Section VIII addresses brain-inspired neuromorphic computing leveraging device physics to reproduce neurobiological processes of sensing and learning. Finally, Sec. IX provides an outlook on the next urgent challenges and opportunities that need to be addressed.
II. EMERGING MEMORY TECHNOLOGIES
Charge-storage memories based on the complementary metal-oxide-semiconductor (CMOS) technology provide the mainstream memory technology for digital computing systems. Figure 2 illustrates the memory hierarchy of CMOS-based computing systems, including (from top to bottom) on-chip registers and static random access memory (SRAM), followed by off-chip dynamic random access memory (DRAM) and nonvolatile Flash storage. While performance (e.g., access time) degrades from top to bottom, the area density increases and the cost per bit decreases, with NAND flash representing the highest density thanks to 3D integration.12,13 Within this scenario, emerging memories based on material storage have been developed in an effort to provide a better trade-off between performance, area, and cost. In particular, emerging memory devices show unique storage principles relying on the physics of the active materials and offer advantages in terms of scalability,14 integration in 3D structures,15,16 and energy efficiency. These properties are also attractive for application as embedded memories in systems-on-chip, where flash memory faces additional integration difficulties due to the high-κ/metal-gate process of the silicon front-end circuits.17 Emerging memories have also attracted considerable interest for IMC applications thanks to the nonvolatile storage of computing weights, high density, and fast programming/read. Figure 3 shows a summary of the main emerging memories, including two-terminal and three-terminal devices. Table I shows a summary of the properties of emerging memories compared to other nonvolatile memory technologies.18
| Technology | NOR flash | NAND flash | RRAM | PCM | STT-MRAM | FeRAM | FeFET | SOT-MRAM | Li-ion |
|---|---|---|---|---|---|---|---|---|---|
| On/off ratio | 10^4 | 10^4 | 10–10^2 | 10^2–10^4 | 1.5–2 | 10^2–10^3 | 5–50 | 1.5–2 | 40–10^3 |
| Multilevel operation | 2 bit | 4 bit | 2 bit | 2 bit | 1 bit | 1 bit | 5 bit | 1 bit | 10 bit |
| Write voltage (V) | <10 | 10 | <3 | <3 | <1.5 | <3 | <5 | <1.5 | <1 |
| Write time | 1–10 µs | 0.1–1 ms | <10 ns | ∼50 ns | <10 ns | ∼30 ns | ∼10 ns | <10 ns | <10 ns |
| Read time | ∼50 ns | ∼10 µs | <10 ns | <10 ns | <10 ns | <10 ns | ∼10 ns | <10 ns | <10 ns |
| Stand-by power | Low | Low | Low | Low | Low | Low | Low | Low | Low |
| Write energy (J/bit) | ∼100 pJ | ∼10 fJ | 0.1–1 pJ | 10 pJ | ∼100 fJ | ∼100 fJ | <1 fJ | <100 fJ | ∼100 fJ |
| Linearity | Low | Low | Low | Low | None | None | Low | None | High |
| Drift | No | No | Weak | Yes | No | No | No | No | No |
| Integration density | High | Very high | High | High | High | Low | High | High | Low |
| Retention | Long | Long | Medium | Long | Medium | Long | Long | Medium | ⋯ |
| Endurance | 10^5 | 10^4 | 10^5–10^8 | 10^6–10^9 | 10^15 | 10^10 | >10^5 | >10^15 | >10^5 |
| Suitability for DNN training | No | No | No | No | No | No | Moderate | No | Yes |
| Suitability for DNN inference | Yes | Yes | Moderate | Yes | No | No | Yes | No | Yes |
| Suitability for SNN applications | Yes | No | Yes | Yes | Moderate | Yes | Yes | Moderate | Moderate |
A. Resistive switching memory (RRAM)
Figure 3(a) schematically shows the resistive random-access memory (RRAM), consisting of a metal–insulator–metal (MIM) stack where the insulating layer serves as the active switching material. The bottom electrode (BE) typically consists of a relatively inert metal, such as Pt or TiN, while the top electrode (TE) is generally a more reactive metal, such as Ti or Ta.19–21 In most cases, the switching layer is made of a metal oxide,22 although other materials have also been used, such as nitrides,23 ternary oxides,24 chalcogenides,25 or 2D materials.26,27 Organic materials have also been explored, taking advantage of the low switching energies, wide range of tunability, and facile ion migration.28–30 However, limitations in the writing speed, scaling, and reliability remain open challenges. The forming operation generates a conductive filament (CF) across the switching layer. The CF resistance is changed by electrically induced chemical redox reactions, where the set operation causes the transition to the low-resistance state (LRS), while the reset operation causes the transition to the high-resistance state (HRS). These transitions can occur either by operating the device under the same polarity in unipolar RRAM31 or by alternating polarities in bipolar RRAM.32 Uniform-switching RRAM, where the resistance can change without any forming operation, has also been proposed.33
B. Phase change memory (PCM)
Figure 3(b) schematically shows the phase change memory (PCM), which is based on the ability of specific phase change materials to switch reversibly between the amorphous and the crystalline phases exhibiting different electrical resistivity.34–36 The phase change material typically consists of chalcogenides, such as Ge2Sb2Te5,37 where the phase transition can be triggered by an applied voltage pulse via Joule heating. The PCM offers the ability to store intermediate states by modulating the crystalline fraction within the active material,38 although the stability of the memory state is potentially affected by temperature-dependent retention, caused by the recrystallization of the amorphous region,39 and drift, caused by the structural relaxation of the amorphous structure.40 These issues can be handled by materials engineering to improve the high-temperature stability41 and device engineering to reduce the resistance drift.42 The PCM technology has also been demonstrated in relatively advanced technology nodes, such as 28 nm43 and 18 nm.44 The very high maturity level of development and the higher endurance compared to other non-volatile memory devices45 make PCM an ideal candidate for in-memory computing.
C. Ferroelectric random-access memory (FeRAM)
Figure 3(c) schematically shows a ferroelectric random access memory (FeRAM) device based on the ability of a ferroelectric layer to display a remanent electric polarization after the application of voltage pulses.46 The most typical ferroelectric materials include perovskites with structure ABO3, where A and B are cations, e.g., BaTiO3 (BTO)47 and PbZrxTi1−xO3 (PZT).48 Most recently, FeRAM has seen a revival since ferroelectricity was reported in pure and doped hafnium oxide (HfO2) with an orthorhombic structure.49 While being a CMOS-compatible oxide, HfO2 has a lower dielectric constant compared to perovskite materials, thus enabling the development of ferroelectric layers with a small thickness between 5 and 30 nm, which is suitable for memory device scaling and 3D integration.50,51 However, the realization of ferroelectric layers with a thickness well below 10 nm and good uniformity remains a topic of intense research.52 The FeRAM state is read by measuring the displacement current during ferroelectric switching, which is thus a destructive operation that is not always practical for in-memory computing applications. To solve this issue, the ferroelectric tunnel junction (FTJ) has been developed, in which the ferroelectric polarization is reflected by the device resistance thanks to bilayer stack device engineering.53
D. Spin-transfer torque magnetic random access memory (STT-MRAM)
Figure 3(d) schematically shows the spin-transfer torque magnetic random access memory (STT-MRAM), consisting of a magnetic tunnel junction (MTJ) composed of a thin insulator sandwiched between two ferromagnetic (FM) layers. In one of the two FM layers, the ferromagnetic polarization is pinned by the presence of adjacent magnetic layers, such as a synthetic antiferromagnetic stack,54,55 thus acting as a reference for the polarization. The other layer is free and can change its polarization via electrical pulses. The free layer magnetization can thus be programmed by applying a current pulse directly across the MTJ via spin torque.56,57 Two STT-MRAM states can thus be obtained, namely, a parallel state with relatively low resistance and an antiparallel state with relatively high resistance, corresponding to equal and opposite directions, respectively, of the magnetic polarization in the pinned and free layers. STT-MRAM features fast switching and good cycling endurance.58 On the other hand, the resistance window is generally quite limited (less than a factor of 2), and multilevel operation is hard to achieve.59
E. Ferroelectric field-effect transistor (FeFET)
In addition to two-terminal FeRAM and FTJ, a three-terminal ferroelectric device has been proposed, namely, the ferroelectric field-effect transistor (FeFET) in Fig. 3(e). The FeFET consists of a field-effect transistor where the gate dielectric is a ferroelectric layer.60,61 The ferroelectric polarization thus affects the threshold voltage VT, which can be used as a monitor of the memory state, similar to a floating-gate memory. Contrary to FeRAM devices, the reading operation of the FeFET device is non-destructive, which is highly favorable for IMC. In addition, FeFET can be integrated in vertical 3D architectures62 and can display multilevel operation by multilayered stack engineering.63 An important challenge is the limited cycling endurance of FeFET, which is typically in the range of 10^5 cycles, too small for most applications.
F. Spin–orbit torque magnetic random access memory (SOT-MRAM)
Figure 3(f) schematically shows the spin–orbit torque magnetic random access memory (SOT-MRAM). Similar to the STT-MRAM device, SOT-MRAM consists of an MTJ structure deposited on top of a metallic line made of a heavy metal, such as Pt or Ta.64,65 To program the SOT-MRAM device, a current pulse is applied along the heavy metal line, causing a polarity-dependent accumulation of spin-polarized electrons, thus inducing the magnetization switching in the free layer.65 The read operation is conducted by probing the MTJ resistance, similar to STT-MRAM. The separation between programming and reading paths allows minimizing the MTJ degradation, thus improving the cycling endurance with respect to STT-MRAM devices. Recently, the integration of SOT-MRAM with the CMOS technology has been demonstrated.66 Similar to STT-MRAM, SOT-MRAM suffers from a relatively small resistance window and difficult multilevel operation. Another potential issue is the need for an external magnetic field to support the free-layer switching, which can be overcome by advanced structures with built-in magnetic fields.67
G. Electrochemical random-access memory (ECRAM)
Figure 3(g) schematically shows the electro-chemical random access memory (ECRAM), where the conductivity of a metal-oxide transistor channel can be changed by the injection of ionized defects across the vertical stack, consisting of a reservoir layer and a solid-state electrolyte layer.68–70 Defects might consist of oxygen vacancies,71 Li ions,72 or protons.73 Organic materials have also been explored,74,75 demonstrating various synaptic and neuronal functionalities. Similar to SOT-MRAM, the three-terminal ECRAM structure allows decoupling the read and write paths, thus improving cycling endurance and reducing energy consumption thanks to the extremely low conductivity of the metal oxide channel, e.g., WO3.69 Controllable and linear potentiation characteristics were reported, which makes ECRAM a promising technology for synaptic devices in neuromorphic systems capable of learning and training.70 3D vertical ECRAM has also been demonstrated,76 paving the way for ECRAM-based high-density cross-point arrays.
H. Memtransistor
Memtransistor devices combine the three-terminal transistor structure with the memristor-like ability to change the channel conductance by the application of an in-plane drain–source voltage.77–79 Typical memtransistors consist of a FET with a 2D semiconductor channel, such as MoS2. The memory behavior is obtained by applying large source–drain voltages, which can induce the resistance change by various physical mechanisms, such as field-induced dislocation migration in the polycrystalline MoS2 channel,77,78 the dynamic tuning of the Schottky barrier at the metal–semiconductor contact,80 or the direct cation migration from the electrodes on the surface of a 2D semiconductor.79,81 Other implementations of memtransistors exploit the optical properties of the 2D material (typically, a transition metal dichalcogenide) to develop devices with neural properties.82,83 Similar neuromorphic devices were obtained exploiting the ionic diffusion on amorphous oxides, such as ZnO or indium tungsten oxide (IWO).84–86 The major advantages of the memtransistor are the three-terminal structure, the atomically thin channel, and the possibility of 3D integration in the back end. However, compared to all the other reported technologies, memtransistors are still in their early stage of development, with significant challenges in materials, device structures, and reliability.
III. IN-MEMORY COMPUTING
IMC development has achieved significant progress in the last 10 years, ranging from novel theoretical approaches to experimental IMC hardware demonstrations in silicon-verified test vehicles. The range of applications where IMC can offer improved energy efficiency, performance, and scaling opportunities can be divided into the two macro-categories of static and dynamic IMC, as shown in Fig. 4(a).
Static IMC, schematically shown in Fig. 4(b), consists of a physical computing concept where the emerging memories are used to store data and perform computation without changing or updating their programmed state.6 Generally, memory devices in static IMC are first programmed to a desired state to encode pre-trained computing parameters in the form of conductance levels. Random states can also be used in some applications, such as the physical unclonable function (PUF)87 and reservoir computing (RC), where the stochastic conductance resulting from the fabrication process is directly used in the computation.88 The programmed memory arrays are then used as physical matrices to execute in situ vectorial operations with high parallelism, such as matrix-vector multiplication (MVM).89 Low voltages are applied to prevent any perturbation to the conductive states during computation,90 thus resulting in a low power consumption, which is attractive for decentralized computing architectures, such as edge91 and fog92 computing. The high degree of parallelism allows reducing the number of steps needed to carry out a given task, thus reducing the computational complexity.93,94 Examples of static IMC include matrix-vector multiplication (MVM, Sec. IV), inverse-matrix-vector multiplication (IMVM, Sec. V), and content-addressable memories (CAMs, Sec. VI).
Dynamic IMC, schematically shown in Fig. 4(c), generally combines all the opportunities of static IMC with the additional strength of enabling controlled switching of the memory devices to reproduce additional functions, such as neuron activation,95 stateful Boolean logic,96,97 and learning in supervised/unsupervised neural networks.98–101 A wide range of physical mechanisms can be used for the controlled switching, such as filament plasticity in RRAM devices,102 gradual crystallization in PCM devices,95 charge trapping in MoS2 memtransistors,103 and magnetic polarization for true-random number generation (TRNG).104 Dynamic IMC provides a promising avenue for reducing latency, energy, and circuit area by leveraging the intrinsic physics of the device instead of emulating the desired characteristics via the analog/digital design of CMOS-based networks.105 Dynamic and static IMC are generally combined in the same platform to provide energy-efficient computing systems capable of learning and adaptation.95,106 Applications of dynamic IMC include outer product accelerators for neural network training (Sec. VII) and neuromorphic systems for brain-inspired computing (Sec. VIII).
IV. MATRIX-VECTOR MULTIPLICATION
A. Concepts and implementation
The MVM operation of Fig. 5(a) is carried out without moving the matrix parameters, in line with the in situ processing paradigm of IMC. In addition, the operation is performed in just one step, thus minimizing the latency and maximizing the throughput thanks to a computational complexity of O(1). Such a massive parallelism of MVM allows for achieving outstanding area and energy efficiency, compared to traditional digital multiply-and-accumulate (MAC) operations. Finally, the crosspoint array is generally integrated in the back end of the line (BEOL) of the CMOS process, thus taking advantage of 3D stacking and of a small cell area of only 4F^2/N, where F is the lithographic feature size and N is the number of stacked layers.110 Despite the advantages of parallelism, density, and latency, the MVM concept is an analog computing process that is critically sensitive to device variability,111,112 noise,113 drift of conductance,40 and parasitic IR drop along wires,114 all affecting the accuracy of computation. To deal with these parasitic effects, several mitigation and compensation techniques have been proposed at device,115 algorithm,114,116–120 and architectural levels.121,122
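As a minimal illustration of this computing primitive, the following Python/NumPy sketch maps a non-negative matrix onto device conductances, applies the input vector as read voltages, and collects the output currents in a single step. The conductance range, the log-normal programming variability, and the assumption of an ideal array without IR drop are illustrative choices rather than parameters taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def program_conductances(A, g_min=1e-6, g_max=100e-6, sigma=0.05):
    """Map a non-negative matrix A onto device conductances in [g_min, g_max],
    with an illustrative log-normal programming variability of relative spread sigma."""
    A = np.asarray(A, dtype=float)
    G_target = g_min + (g_max - g_min) * A / A.max()
    return G_target * rng.lognormal(mean=0.0, sigma=sigma, size=A.shape)

def analog_mvm(G, v_in):
    """One-step MVM: row voltages are applied and column currents are summed
    by Kirchhoff's law, giving I = G^T v (ideal array, no IR drop)."""
    return G.T @ v_in

A = rng.random((4, 3))       # matrix to be stored (non-negative, normalized)
v = 0.1 * rng.random(4)      # small read voltages to avoid disturbing the cells
G = program_conductances(A)
I = analog_mvm(G, v)         # output currents, proportional to A^T v up to variability
print(I)
```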
The MVM concept can be extended to virtually all types of memory devices and cell structures in the array. The one-resistor (1R) structure of Fig. 5(a) is affected by crosstalk and sneak path issues during programming and reading.123 These issues can be prevented by adding a selector device in series to the memory element, resulting in the one-selector/one-resistor (1S1R) structure124–126 or the one-transistor/one-resistor (1T1R) structure,127–129 illustrated in Figs. 5(b) and 5(c), respectively. The 1S1R configuration avoids sneak path currents during the programming phase by introducing a highly non-linear two-terminal device109,130,131 that suppresses the current of unselected and half-selected cells in the array while maintaining the small 4F^2 area of the 1R cell structure.109 The 1T1R structure ensures tight control of the programming current while allowing sophisticated program/verify algorithms132 at the cost of a larger cell area and a higher complexity introduced by the third terminal. In addition to resistive memory cells, where the computation parameter is stored in the conductance, capacitive memories can be adopted with the one-capacitor (1C) structure in Fig. 5(d). Here, the small-signal capacitance can be tuned133 and used in MVM operations via the charge–voltage capacitor law Q = CV.
Note that, while MVM is strongly accelerated thanks to the array parallelism, memory programming might require a relatively long time, especially when a high equivalent-bit precision is needed. However, the programming time can be generally amortized for applications where the computational parameters remain fixed for most of the MVM operations. This is the case of the discrete cosine transform (DCT), which extracts frequency components from a data sequence.137 The DCT is routinely applied for image compression, thus providing an ideal application for IMC.134
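Since the DCT matrix is fixed, it can be programmed once into the array and reused for every input block. The following sketch illustrates this idea with an orthonormal DCT-II matrix mapped onto a differential conductance pair for signed coefficients; the conductance scale is an assumed, purely illustrative value.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

n = 8
C = dct_matrix(n)

# Signed DCT coefficients mapped onto a differential conductance pair (illustrative scale).
g_unit = 50e-6                        # conductance assigned to a unit weight (assumed)
G_pos = np.clip(C, 0, None) * g_unit
G_neg = np.clip(-C, 0, None) * g_unit

x = np.random.rand(n)                 # input data block, applied as voltages
y_imc = (G_pos - G_neg) @ x / g_unit  # two column reads, subtracted and rescaled
print(np.allclose(y_imc, C @ x))      # True: the in-memory DCT matches the digital result
```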
B. Application to neural network inference
Another application where computational parameters remain constant throughout computation is the forward propagation during the inference phase in a deep neural network (DNN).138,139 Figure 6(a) shows a sketch of a fully connected neural network (FCNN) for image classification with three synaptic layers. Each synaptic layer can be viewed as an MVM where the synaptic weights are mapped onto the conductance matrix, while the activations are used as the input vector. The inference operation can thus be mapped into several MVMs occurring in distinct crosspoint arrays, each mapping a different synaptic layer or a region of the DNN. Figure 6(b) shows a possible multi-core IMC architecture where each computational unit performs the assigned computation independently, as illustrated in Fig. 6(c), while a logic unit collects output data from the cores and submits activation signals to them. Given the sequential operation of DNN inference, the architecture and computational cores can be optimized to maximize the data throughput.
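The mapping of one synaptic layer per crosspoint core can be sketched as follows, assuming a toy two-layer network, differential encoding of signed weights, and a ReLU activation applied by the digital logic unit; the sizes and scales are illustrative and not taken from a specific demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
G_UNIT = 50e-6   # conductance assigned to a unit weight (illustrative)

def to_diff_conductances(W):
    """Encode signed weights as a (G+, G-) differential conductance pair per synapse."""
    return np.clip(W, 0, None) * G_UNIT, np.clip(-W, 0, None) * G_UNIT

def core_mvm(G_pos, G_neg, v_in):
    """One crosspoint core: two column-current reads, subtracted to recover the signed MVM."""
    return (G_pos @ v_in - G_neg @ v_in) / G_UNIT

# Toy two-layer network, 784 -> 128 -> 10; weights would come from ex situ training.
layers = [rng.standard_normal((128, 784)) * 0.05,
          rng.standard_normal((10, 128)) * 0.05]
cores = [to_diff_conductances(W) for W in layers]

x = rng.random(784)                          # input image flattened into a voltage vector
for i, (G_pos, G_neg) in enumerate(cores):
    x = core_mvm(G_pos, G_neg, x)            # MVM executed in the array
    if i < len(cores) - 1:
        x = np.maximum(x, 0.0)               # ReLU activation applied by the logic unit
print(int(np.argmax(x)))                     # predicted class index
```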
Inference accelerators have been proposed with a variety of implementations, differing by the adopted memory technologies;98,127,140 the number of quantized levels of input, weight, and output;141,142 the peripheral circuits;136,143 the amount of possible reconfiguration;143 and the possibility of implementing backpropagation training in addition to forward-propagation inference.99,144 Similar to FCNN layers, IMC has been shown to accelerate convolutional layers99,127 and recurrent neural networks145 by changing the MVM partition and computation technique.146
IMC can largely improve the energy efficiency and the throughput of MVM for DNN inference. Figure 6(d) shows the power efficiency and throughput of the state-of-the-art IMC accelerators based on nonvolatile memories compared to IMC based on static random access memory (SRAM) or fully digital accelerators.147 SRAMs feature faster access time and better robustness to variability and disturbs thanks to their digital nature and fully silicon-based CMOS technology. However, SRAM has a larger cell area due to the 6T or 8T bit-cell structure, cannot implement multilevel operations, and cannot provide nonvolatile storage, thus requiring the upload of computational parameters at power-on. The latter issue is a significant drawback in applications where the neural accelerator frequently switches between the stand-by and computing phases, which is typical of low-power edge computing.
C. Application to combinatorial optimization
MVM represents the core operation of combinatorial optimization tasks.148 Here, emerging memories can provide both the MVM operation via the crosspoint array and the stochastic physical noise, which is generally needed to navigate among the local minima of the cost function. Indeed, metaheuristic optimization techniques, such as chaotic simulated annealing or stochastic simulated annealing, require massive MVM and tunable sources of noise. These computing strategies typically rely on recurrent stochastic networks, such as the Hopfield neural network, sketched in Fig. 7(a),95,106,149 or restricted Boltzmann machine (RBM).150–152 In these approaches, the network is characterized by a certain energy (or cost) function E that depends on the state of the neurons, which in turn depends on the synaptic spike stimulations and the injected noise. By properly tuning the injected noise, it is possible to control the ability of the neurons to escape from local minima of E, as depicted in Fig. 7(b). By gradually decreasing the injected noise, the search takes the shape of a simulated annealing algorithm, where the effective temperature is slowly decreased in analogy with the cooling phase of physical annealing. This is shown in Fig. 7(c), where the network manages to find thermal equilibrium at the global minimum of E, thus solving the optimization task.145 This approach finds application in several key workloads in logistics, scheduling, and other NP-hard problems, such as the traveling salesperson problem.
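A compact behavioral sketch of noise-assisted in-memory annealing on a Hopfield network is given below; the random max-cut instance, the weight mapping, and the linear noise schedule are illustrative assumptions, with the row-wise product standing in for the in-memory MVM.

```python
import numpy as np

rng = np.random.default_rng(2)

# Small max-cut instance: adjacency matrix of a random graph (illustrative problem).
n = 16
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T
W = -A                                   # Hopfield weights stored in the crosspoint array

s = rng.choice([-1.0, 1.0], size=n)      # random initial neuron states
energy = lambda state: -0.5 * state @ W @ state

steps = 400
for step in range(steps):
    T = 2.0 * (1 - step / steps)         # annealing schedule: injected noise is reduced
    i = rng.integers(n)
    local_field = W[i] @ s               # one row of the in-memory MVM
    s[i] = 1.0 if local_field + T * rng.standard_normal() >= 0 else -1.0

cut_edges = 0.25 * np.sum(A * (1 - np.outer(s, s)))
print(f"final energy {energy(s):.1f}, edges cut {cut_edges:.0f}")
```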
D. Application to stochastic computing and security
Programming variability is a major issue in deterministic DNNs, as it affects the weight precision and hence the accuracy of inference. On the other hand, programming variation can provide a source of stochasticity for specific computing applications, such as stochastic computing and hardware security. For instance, Bayesian inference relies on neural networks where the model parameters are probability distributions. In this scenario, transferring the ex situ trained model to the hardware network is less critical since a probability distribution can be naturally modeled by the physical distribution of conductance states.153 Figure 8(a) shows the conceptual scheme of an RRAM-based Bayesian network where each synaptic weight belongs to a certain distribution. Figure 8(b) shows a possible implementation in an N × M array of RRAM synapses with 1T1R structures.153 Here, the distribution of a synaptic parameter is modeled by the distribution of conductance states of N devices in a column, while the input voltages to each column are the outputs generated by M neurons in the previous layer. By applying a voltage vector across M columns, each row yields a current that flows into a neuron circuit, resulting in a distribution of N neuron activation voltages, namely, the output distribution of the neuron. Based on the same approach, Monte Carlo Markov chain (MCMC) networks have been demonstrated with stochastic RRAM arrays.154
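The following sketch illustrates, under assumed Gaussian conductance statistics, how N nominally identical devices per column naturally sample a weight distribution, so that a single parallel read yields a distribution of neuron activations; sizes and spreads are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

N, M = 64, 8                       # N devices per synaptic column, M inputs (illustrative)
w_mean = 0.3 * rng.standard_normal(M)
w_sigma = 0.1                      # programming spread acting as the weight uncertainty

# Each column stores N noisy copies of the same nominal weight (device-to-device variability).
W_samples = w_mean[None, :] + w_sigma * rng.standard_normal((N, M))

v_in = rng.random(M)               # activations of the previous layer, applied as voltages
pre_act = W_samples @ v_in         # N row currents = N samples of the neuron pre-activation
out = 1.0 / (1.0 + np.exp(-pre_act))   # sigmoid neuron applied to each sample

print(out.mean(), out.std())       # mean prediction and its uncertainty
```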
The stochastic properties of emerging memories can also provide the foundation for developing novel security primitive circuits.104 Figure 8(c) shows the conceptual idea for implementing a memory-based physical unclonable function (PUF) for chip authentication.87 An input challenge encodes the information to select specific rows and columns of the crosspoint memory array, thus generating a single-bit unique response by current comparison. A 1R crosspoint array is adopted to take advantage of circulating sneak path currents, enabling the participation and interaction of all memory devices in the array, thus increasing the complexity of the solution and robustness to external attacks.87
V. INVERSE MATRIX-VECTOR MULTIPLICATION
Figure 10(a) shows the experimental output of a hardware implementation of the circuit in Fig. 9(a) to yield the elements of a 3 × 3 inverse matrix A^−1 as a function of the analytical solution.155 In-memory matrix inversion might find application in a number of machine learning tasks, such as Markov chains159 and the numerical solution of differential equations.155 With errors as low as 3%, feedback-based crossbar circuits can provide a viable alternative to bulky digital processors for linear system solution tasks, serving as a potential cornerstone of IMC-based analog processing units.
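A highly simplified behavioral model of such a feedback circuit is sketched below, assuming that the loop can be lumped into the relaxation dx/dt = (b − Ax)/τ and that A fulfills the stability conditions; the settled output approximates A^−1 b and is compared with the exact digital solution. The cited circuits require a more careful analysis than this lumped model.

```python
import numpy as np

# Behavioral model of a closed-loop IMVM circuit solving A x = b in a single relaxation.
# Assumption: the feedback loop is lumped into dx/dt = (b - A x)/tau and A is positive
# definite so that the loop is stable; the real circuit analysis is more involved.
A = np.array([[3.0, 0.5, 0.2],
              [0.5, 2.0, 0.3],
              [0.2, 0.3, 1.5]])      # conductance matrix programmed into the array
b = np.array([1.0, -0.5, 0.25])      # input currents

x = np.zeros(3)
dt, tau = 1e-3, 1e-2
for _ in range(20000):               # let the analog loop settle
    x += dt / tau * (b - A @ x)

print(x)                             # settled output of the feedback circuit
print(np.linalg.solve(A, b))         # exact solution for comparison
```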
A. Application to ranking algorithms
B. Application to data regression
C. Discussion
CL-IMC allows for the acceleration of several IMVM operations with reduced complexity, which is attractive for large-scale general-purpose machine learning accelerators. On the other hand, CL-IMC also faces considerable challenges, such as the reduced precision with respect to floating-point digital computers, owing to the higher sensitivity of the analog domain to non-idealities.159 Circuit non-idealities affecting the computing accuracy include parasitic interconnect resistances,114 electronic noise from circuit components,90 and conductance variations.158 The effect of these non-idealities can be mitigated by compensation schemes, array tiling, an increased signal range, and fine-tuned programming algorithms, thus resulting in a complex trade-off with the overall throughput, area, and energy consumption.90,122,162 On the other hand, error-tolerant applications, such as massive multiple-input/multiple-output (MIMO) decoding in 6G networks, allow for better robustness to circuit non-idealities.163 Finally, the medium-precision solution obtained by analog IMC might be used as a seed for high-precision digital solvers,164 allowing for orders-of-magnitude improvements in energy consumption and execution time.
VI. COMPUTING WITH CONTENT ADDRESSABLE MEMORY
The content-addressable memory (CAM) is a specialized memory structure where stored data are accessed by inputting the desired data content and extracting their address as the output, which is the opposite of conventional memories.165 Figure 11(a) shows a schematic structure of a typical ternary content-addressable memory (TCAM), where a third option, don’t care or “X,” is available in addition to the binary 0 and 1 values in the memory array. Here, an input pattern presented to the CAM from the data lines (DLs) is compared with the stored data, and the corresponding match line (ML) is asserted if a match is found. Due to its inherently high parallelism, CAM/TCAM is naturally suited to accelerate pattern matching,166,167 branch prediction,168 and lookup operations169 in situ within the memory, thus minimizing data movement.
TCAM parallelism comes at the expense of relatively large area and power consumption as every memory cell must be equipped with a dedicated comparison circuit. When implemented using SRAM memories, a single CAM cell may use up to 16 transistors,165 thus adding significant area, latency, and power overhead for the search operation and preventing large-scale integration. By replacing conventional SRAM with emerging memories, leakage power can be reduced and cell density can be improved. Figure 11(b) shows a differential RRAM-based CAM cell, where memory devices M1 and M2 are programmed to either state LRS/HRS or HRS/LRS to reproduce values “1” or “0,” respectively.167 State “X” is instead obtained by programming both RRAM devices to either HRS or LRS. Depending on the relative ratio of the two conductances (stored data) and the voltage at the wordline (WL) (input data), the match-line ML is either asserted low or left high, thus realizing CAM operation. RRAM-based TCAMs were shown to accelerate regular expression matching and genomic sequencing with up to 25× improvement in energy efficiency.167
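The search operation of a TCAM can be summarized by the short behavioral sketch below, where the don't-care state always matches; the encoding into LRS/HRS pairs described above is abstracted away, and the stored words are purely illustrative.

```python
# Stored words, one per TCAM row; "X" is the don't-care state.
rows = ["10X1", "0X10", "1101"]

def tcam_search(stored_rows, query):
    """Return the addresses whose match lines remain asserted after a parallel search."""
    matches = []
    for addr, word in enumerate(stored_rows):
        if all(s in ("X", q) for s, q in zip(word, query)):
            matches.append(addr)
    return matches

print(tcam_search(rows, "1011"))   # -> [0]: row 0 matches thanks to its don't-care bit
print(tcam_search(rows, "1101"))   # -> [2]
```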
The analog tunability of emerging memories allows for realizing analog CAMs capable of analog pattern matching with stored data. Figure 11(c) shows an analog CAM cell170 where value intervals, rather than binary values, can be stored and compared with analog input patterns. In this case, the match line is asserted when all values of the input pattern fall within the ranges stored in the corresponding row of the memory array. Analog memory-based CAMs are naturally suited to accelerate more-than-binary tree-based algorithms, which represent the foundation of many machine learning tasks. Figure 11(d) shows a proposed implementation171 of tree-based inference applied to the classification of the Iris dataset. By mapping each root-to-leaf path into a corresponding row of the memory array, input data can be instantly classified by coupling the analog CAM to a label array, as shown in Fig. 11(e), with a 10^3× throughput improvement with respect to digital implementations.171
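A minimal sketch of analog CAM matching for tree-based inference is shown below: each row stores one root-to-leaf path as a set of [low, high] intervals per feature, and a row matches when every input feature falls inside its stored interval. The two-feature, Iris-like paths and thresholds are illustrative, not those of the cited demonstration.

```python
import numpy as np

# Each analog CAM row stores one root-to-leaf decision path as per-feature intervals
# [low, high]; an unconstrained feature is stored as (-inf, +inf), the analog don't-care.
INF = np.inf
rows = np.array([
    # petal length        petal width
    [[-INF, 2.45],        [-INF,  INF]],   # path 0 -> "setosa"
    [[2.45,  INF],        [-INF, 1.75]],   # path 1 -> "versicolor"
    [[2.45,  INF],        [1.75,  INF]],   # path 2 -> "virginica"
])
labels = ["setosa", "versicolor", "virginica"]

def acam_classify(x):
    """Assert the match line of every row whose intervals all contain the input features."""
    match = np.all((rows[:, :, 0] <= x) & (x <= rows[:, :, 1]), axis=1)
    return [labels[i] for i in np.flatnonzero(match)]

print(acam_classify(np.array([4.7, 1.4])))   # -> ['versicolor']
```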
VII. ONLINE TRAINING BY IN-MEMORY OUTER PRODUCT
The key requirement for the in-memory outer product of Fig. 12(b) is the linearity of the conductance change with both pulse voltage and pulse duration, or at least one of the two. The conductance update can be physically obtained by potentiation or depression of the memory conductance upon application of suitable pulses to the devices. The linear update must be obtained by an open-loop operation, where the same conductance change is achieved at a given voltage and pulse width, irrespective of the initial state. Unfortunately, potentiation and depression of emerging memories are generally non-linear with the applied voltage as a result of the exponential time–voltage relationship of ion migration, tunneling, and other fundamental physical processes of set/reset.128
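A sketch of the in-memory outer-product update is given below; the saturating conductance-update law, controlled by a shape factor ν, is a commonly used phenomenological form assumed here for illustration, and the pulse-overlap scheme is abstracted into a per-cell pulse count proportional to the product of input and error.

```python
import numpy as np

rng = np.random.default_rng(4)
G_MIN, G_MAX, P_FULL = 1e-6, 100e-6, 100   # conductance window and pulses for a full swing

def potentiate(G, n_pulses, nu=3.0):
    """Apply n_pulses identical pulses per device with a saturating update law
    (shape factor nu; nu -> 0 recovers the ideal linear update). Illustrative model only."""
    for _ in range(int(np.max(n_pulses))):
        active = n_pulses > 0
        frac = (G - G_MIN) / (G_MAX - G_MIN)
        dG = (G_MAX - G_MIN) / P_FULL * np.exp(-nu * frac)   # update shrinks near G_MAX
        G = np.where(active, np.minimum(G + dG, G_MAX), G)
        n_pulses = n_pulses - 1
    return G

# Outer-product update: the pulse count seen by cell (i, j) encodes x_i * delta_j, so the
# whole rank-1 weight update is applied in one parallel row/column pulsing step.
x = rng.random(4)                            # layer input
delta = rng.random(3)                        # backpropagated error
pulses = np.rint(np.outer(10 * x, delta))    # pulses per cell (abstracted overlap scheme)
G = np.full((4, 3), 20e-6)
G_new = potentiate(G, pulses, nu=3.0)
print((G_new - G) / (G_MAX - G_MIN))         # realized (non-ideal) normalized weight update
```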
Figure 13(d) summarizes the metrics for synaptic memory devices, reporting the normalized conductance window (Gmax − Gmin)/Gmin, which describes the full-scale range of the synaptic weight, as a function of the shape factor ν, which describes the linearity of the update, for various synaptic devices.15,69,175–179 Among all the memory technologies, the charge-trap memory (CTM) device combines excellent linearity of the weight update curve with a large conductance window. Note that the CTM device has a unidirectional characteristic, i.e., depression is spontaneous and generally non-linear. However, this limitation is mitigated by a differential synapse scheme where two CTM devices are combined in the same synapse to map positive and negative weights.18 The CTM also offers extremely low conductance thanks to the sub-threshold operation, which is useful to suppress the IR drop and enable the training of large synaptic arrays. The MoS2-based CTM also displays excellent scaling properties thanks to the atomically thin 2D semiconductor and the capability of 3D integration, thus providing a promising avenue for high-density 3D crosspoint arrays for training accelerators.180
VIII. NEUROMORPHIC COMPUTING
Neuromorphic engineering aims at developing computing systems by using design principles that are based on those of the biological nervous systems.105,181 By mimicking the human brain, the objective is to achieve a high energy efficiency, large parallelism, and the capacity to solve cognitive tasks, such as object recognition, association, adaptation, and learning.18 Most importantly, the brain provides a blueprint for non-von Neumann computation, where information and memory are co-located in the same neurobiological network.182 The neuromorphic term and concept were originally introduced in the early 1990s181 and later revived in the early 2000s,183 when the fast growth of online generated data started to spur the investigation of alternative computing paradigms. Recently, the neuromorphic engineering topic has seen a new wave of research interest in view of the added potential to embrace emerging memories as an enabling technology to implement brain-inspired processes.184–186
Figure 14 shows a summary of the main neurobiological features that can be implemented in a neuromorphic system, including synapses and neurons, the latter composed of a soma, an axon, and several dendrites.187,188 Information is exchanged among neurons in the form of temporal spikes, which are weighted by synaptic connections and collected by the neuron soma. Synapses display synaptic plasticity, where the synaptic weight is changed upon spiking stimulation. Both long-term plasticity189,190 and short-term plasticity191 have been evidenced by experiments. Over the years, several plasticity rules have been proposed, including paired-pulse facilitation (PPF),192,193 spike-timing dependent plasticity (STDP),191,194–196 triplet-based plasticity,197,198 and spike-rate dependent plasticity (SRDP).199,200 The hardware implementation of each element in Fig. 14 in CMOS technology generally requires complicated transistor-based circuits and large-area capacitors to match the dynamic temporal evolution of the brain processes. From this standpoint, emerging memories offer a technology platform for providing nonvolatile synaptic weights capable of short- and long-term plasticity, increasing the area density of synapses and featuring unique dynamic properties with neuro-plausible time constants arising from the physical device mechanisms.187,188 For instance, synaptic long-term plasticity by STDP has been demonstrated in both PCM201,202 and RRAM.177,203–205 Learning was shown to occur either by properly overlapping the pre- and post-synaptic spikes across the memory element205,206 or by the physical interaction between thermal and electrical stimulations in the so-called second-order memristors.207 Figures 15(a) and 15(b) show the 1T1R synapse circuit with the typical pulses applied to the gate and TE. This circuit demonstrated both the synaptic weight update according to STDP and the communication between the PRE- and POST-neurons. Figure 15(c) shows instead the programming pulses and pre/post-spikes for STDP in a Ta2O5−x/TaOy second-order memristor. By applying the pre- and post-spikes at the TE and BE, the interaction between the applied electric field and the local temperature leads to a Δt-dependent conductance change. Multisynaptic circuits with 1T1R RRAM devices capable of STDP were shown to display unsupervised learning,101,208 which is extremely promising for the development of perceptron-like networks capable of autonomous learning and adaptation [Figs. 15(d) and 15(e)].
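The exponential STDP learning window demonstrated in these devices can be summarized by the short sketch below; the amplitudes and time constant are illustrative and not fitted to the cited experiments.

```python
import numpy as np

def stdp_dw(dt_ms, a_plus=0.8, a_minus=0.4, tau_ms=20.0):
    """Relative conductance change vs spike-timing difference dt = t_post - t_pre:
    potentiation for pre-before-post (dt > 0), depression otherwise (illustrative values)."""
    dt_ms = np.asarray(dt_ms, dtype=float)
    return np.where(dt_ms >= 0,
                    a_plus * np.exp(-dt_ms / tau_ms),
                    -a_minus * np.exp(dt_ms / tau_ms))

for dt in (-40, -10, 5, 30):
    print(f"dt = {dt:+d} ms  ->  dG/G = {float(stdp_dw(dt)):+.2f}")
```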
A. Brain-inspired computing with volatile memories
Volatile memory devices, while lacking a clear application in digital systems due to insufficient retention, provide an ideal technology for reproducing short-term memory (STM) behavior in neuromorphic systems.193 Volatile switching is displayed by a class of filamentary RRAM devices where Ag or Cu is used as the TE material20,209 or dispersed in the switching layer.210 Figure 16(a) shows the typical I–V characteristics of a volatile RRAM device based on Ag nanodots.211 The volatile behavior is generally attributed to the filamentary switching and spontaneous rediffusion of Ag atoms to minimize the total energy of the filament.209 Volatile RRAMs were initially proposed as selector elements in crosspoint memory arrays thanks to their large on/off ratio and low leakage current.212–214 Later, these devices attracted interest from the neuromorphic community in view of their relatively long retention time similar to the biological time constants for STM.193,215 For instance, Fig. 16(b) shows a typical pulsed characteristic of an Ag-based RRAM, stimulated by a triangular pulse. After the pulse, the current persists for a retention time of about 150 µs, revealing the time decay of the filamentary path within the active material. Volatile switching of RRAM devices can be used as the fire function in an integrate-and-fire neuron circuit, thus avoiding the use of area-consuming amplifiers and pulse generators.216 Volatile RRAMs have also been used for replicating PPF induced by paired spikes, where the pulse-induced potentiation of the synaptic weight is enhanced by the application of two identical stimuli.217,218 Most importantly, the dynamic STM effect can be useful to mimic sensing, learning, and processing of spatiotemporal patterns, such as audio and video sequences.
Figure 16(c) shows an example of spatiotemporal pattern recognition via volatile RRAM.219 Two volatile synapses, serving as excitatory and inhibitory synapses, respectively, are stimulated by spikes A and B. Each synapse consists of several Ag-based volatile RRAM devices, where the spike stimulation and the persistent current cause an overall exponentially decaying response of each synapse as a result of Kirchhoff’s law summation of each RRAM current contribution. The excitatory current Iexc and the inhibitory current Iinh are subtracted from each other to yield the excitatory postsynaptic current (EPSC) given by IEPSC = Iexc − Iinh. Figures 16(d) and 16(e) show the synaptic currents and the EPSC for the case of the preferred sequence, namely, A–B, and the non-preferred sequence, namely, B–A. Due to the delay between the synaptic currents, the preferred sequence yields a positive EPSC, while the non-preferred sequence yields a negative EPSC. Comparing the EPSC with a threshold current, e.g., Ith = 2.5 μA in Figs. 16(d) and 16(e), allows us to easily discriminate between the two patterns. This concept was applied to realize a retina-inspired artificial vision system capable of motion detection. In the biological retina, motion detection is achieved by direction-selective (DS) ganglion cells,220 where excitatory and inhibitory synapses occupy adjacent areas within the receptive field [Fig. 16(c)]. An image moving across the ganglion cell might stimulate the excitatory synapses followed by the inhibitory synapses, or vice versa, depending on the direction [Figs. 16(d) and 16(e)]. The EPSC of the ganglion cell thus allows us to recognize the direction of the image. The same concept can be extended to multiple directions by mimicking the starburst amacrine cell (SAC) structure in the retina, thus enabling a fast, low-power direction sensitivity in the analog domain.219,221
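The sequence discrimination of Figs. 16(d) and 16(e) can be captured by a toy model with two decaying synaptic currents, as sketched below; the current amplitude and the rise and decay times are illustrative, while the threshold follows the 2.5 μA value quoted above.

```python
import numpy as np

I0, TAU_RISE, TAU_DECAY = 6e-6, 30e-6, 150e-6   # illustrative synapse parameters
I_TH = 2.5e-6                                    # decision threshold on the EPSC

def synaptic_current(t, t_spike):
    """Volatile-synapse current: finite rise followed by spontaneous decay after a spike."""
    dt = np.clip(t - t_spike, 0.0, None)
    return I0 * (1 - np.exp(-dt / TAU_RISE)) * np.exp(-dt / TAU_DECAY)

def classify(t_spike_A, t_spike_B):
    """Spike A drives the excitatory synapse, spike B the inhibitory one."""
    t = np.linspace(0, 500e-6, 2000)
    epsc = synaptic_current(t, t_spike_A) - synaptic_current(t, t_spike_B)
    return "preferred (A-B)" if epsc.max() > I_TH else "non-preferred (B-A)"

print(classify(0.0, 100e-6))     # A before B -> EPSC peak exceeds the threshold
print(classify(100e-6, 0.0))     # B before A -> EPSC stays below the threshold
```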
B. Reservoir computing with volatile memories
Reservoir computing (RC) is a modern machine learning technique, which is particularly suited to temporal/sequential information processing.222 Figure 17(a) schematically shows the RC concept, which was originally conceived, in the form of liquid state machines223 and echo state networks,224 as an alternative approach to recurrent neural network (RNN) design and training. In general, an RC network transforms sequential input data into a high-dimensional dynamical state via a reservoir layer. The output of the reservoir network is then processed by a readout layer to provide recognition and classification. The reservoir layer generally features random weights and connections, thus limiting the need for training to the readout layer and overcoming the complexity of multi-layer gradient-descent training techniques. Hardware RC networks are attracting interest thanks to their potential in energy efficiency, high versatility, and fast learning.225–227
Figure 17(b) schematically shows an IMC-based RC network for image recognition.228 First, the input pattern, e.g., the image of a handwritten digit, is converted into a spatiotemporal pattern, where rows represent the sequential spikes and columns represent the N input channels. The resulting spatiotemporal pattern is fed to N volatile RRAM devices, where the STM response provides a physical reservoir layer. The dynamic reservoir layer yields a unique output response, e.g., the output transient current, to each input pattern, which can then be classified by the readout layer, consisting of a properly trained fully connected network.
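A minimal sketch of this physical-reservoir concept is given below: each input channel drives a leaky short-term-memory state, the states sampled at the end of the sequence form the reservoir features, and a small trained readout performs the classification. The device model, the toy task, and the gradient-descent readout are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def reservoir_features(spike_pattern, leak=0.7):
    """Leaky short-term-memory response of N volatile devices to a spatiotemporal pattern
    of shape (timesteps, N); the final device states are the reservoir features (toy model)."""
    state = np.zeros(spike_pattern.shape[1])
    for row in spike_pattern:
        state = leak * state + row          # potentiation by spikes, spontaneous decay
    return state

# Toy task: tell patterns where channel 0 fires early from patterns where it fires late.
def make_pattern(early, timesteps=8, channels=5):
    p = (rng.random((timesteps, channels)) < 0.2).astype(float)
    p[0 if early else -1, 0] = 1.0
    return p

X = np.array([reservoir_features(make_pattern(early=(i % 2 == 0))) for i in range(200)])
y = np.array([i % 2 == 0 for i in range(200)], dtype=float)

# Readout layer: one logistic unit trained by gradient descent (a stand-in for the cited
# logistic-regression readout).
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * float(np.mean(p - y))
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print("training accuracy:", float(np.mean((p > 0.5) == y)))
```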
RC was demonstrated by using charge-trap memory (CTM) devices with a MoS2 channel.103 Figure 17(c) shows the device structure with source/drain contacts deposited on a MoS2 channel, where inversion and depletion were controlled by a back gate. In this device, a positive or negative gate voltage results in the trapping of electrons or holes, respectively, at the interface between MoS2 and SiO2, the latter serving as the gate dielectric layer. Electron/hole trapping causes a shift of the threshold voltage, thus resulting in a change in the channel conductivity. This is shown in Fig. 17(d), where a train of negative gate pulses leads to an increase in conductance, which spontaneously decays at the end of the stimulation. The dynamic response in Fig. 17(d) was used as a physical reservoir process in an RC network for image recognition with five CTM devices as the reservoir layer.103 Figure 17(e) shows examples of the reservoir output, indicating potentiation and spontaneous decay as a result of the spatiotemporal stimulation. After training the readout layer by logistic regression,157 a good classification accuracy was achieved, as shown by the confusion diagram in Fig. 17(f). Compared to DNNs, RC networks employ fewer devices by leveraging the rich analog, dynamic response of the CTM device, thus resulting in a significantly smaller classification network.229 In addition, power consumption can be minimized in the RC layer by operating the CTM device in the subthreshold regime.103 Similar spatiotemporal RC networks were used for solving second-order nonlinear equations,228 spoken-digit recognition,229 and autonomous chaotic time-series forecasting,229 thus supporting the wide application scenario for RC-based IMC circuits.
C. In-materia computing
The principle of using device physics to achieve smart computing functions is further extended from devices to materials in the so-called in-materia computing.230,231 In-materia computing relies on the ability of certain materials, such as nanoparticles, nanostructures, or even randomly doped semiconductors, to act as a distributed, random network of physical dynamical nodes for computation.232 In-materia computing systems include nanostructures based on carbon nanotubes (CNTs),233,234 nanowires (NWs),235–237 and metallic nanoparticles.238 Indeed, programming, stimulating, and controlling the individual nodes in the computing materials is a challenging task since the materials can exhibit dynamic fluctuations.239,240 However, nanostructures are ideally suited to serve as the randomly connected reservoir layer of an RC network.225,236 Figure 18(a) shows a fully memristive RC system where the RC layer is made of a network of silver nanowires (NWs), which is shown in Fig. 18(b).236 The electrical stimulation of the NW network induces a change in the NW cross-point junctions,235 thus resulting in a dynamic potentiation of the local connection, hence the local effective conductance. The output of the reservoir, i.e., the output current or the node potential of the NW network, is then processed by the readout layer, e.g., a fully connected network of RRAM devices. By properly training the readout network, tasks such as image recognition and spatiotemporal pattern prediction can be carried out.236 This approach to computation has distinct advantages in terms of scaling and easy manufacturing thanks to the bottom-up technology for developing the physical NW network. Figure 18(c) shows a neuromorphic device composed of a single-walled carbon nanotube (SWCNT) complexed with polyoxometalate (POM).234,241 When arranged in a network, SWCNTs can spontaneously generate spikes and noise thanks to multi-redox activities at the crossing points.242 Both periodic and aperiodic current spikes are generated under a constant-voltage bias, as shown in Fig. 18(d). The applied bias causes the conductance to switch between POMs and SWCNTs, thus mimicking the potentiation behavior of a neurobiological synapse. Chemical reaction phenomena, such as aggregation and dissociation of counter-cations, play an additional role, thus leading to spike generation. Similar to the NW network of Fig. 18(b), the POM/SWCNT network can serve as a reservoir layer in an RC system thanks to its nonlinear dynamics.234
SWCNT networks were also used as analog synapses in the neuromorphic module of Fig. 18(e).233 The module consists of a single neuron connected with other neurons through synapses. The synapses are emulated by transistors based on a random CNT network, while the axon in the neuron is realized by Si-based transistors. Figure 18(f) shows the CNT-based synaptic transistor, with the random SWCNT network in the inset. Electron trapping in the dielectric layer due to the application of gate pulses results in an increase of current in the p-type SWCNT channel. Potentiation is followed by a decay due to the tunneling of electrons out of the dielectric layer. The SWCNT-based synapse also shows inhibitory characteristics under negative gate voltages. Potentiation/depression allows for the emulation of biological STDP and PPF, which is promising for the development of in-materia neuromorphic computing systems.
IX. OUTLOOK
The main enablers of IMC are emerging memory devices, whose distinct advantages, such as nonvolatile behavior, make them more appealing than SRAM243 or DRAM,244 although at the expense of increased programming energy and time.245,246 For tasks where computational parameters must be frequently updated, such as stateful Boolean logic circuits,96,97 the programming overhead may overshadow the advantages of IMC. Moreover, given the fundamentally different characteristics of emerging memories in terms of linearity, power consumption, conductance window, noise, and CMOS compatibility,245,247,248 it is difficult to identify a best-in-class technology with universal applicability across all IMC applications.100,134,170,249–253 As an example, combinatorial optimization tasks254–256 inherently require controllable, device-level randomness148 as an enabling feature for simulated annealing.106 On the other hand, scientific computing applications show extremely narrow tolerance to perturbation and noise,257 relying on high-precision data storage to provide high-quality results.249 The search for a universal memory, capable of satisfying the requirements of many applications at the same time, is thus still open. One of the main pathways for the implementation of in-memory computing is the reduction of the power consumption of memory devices to allow for the operation of extremely large arrays at an affordable cost. Another key challenge is the improvement of reliability, e.g., the realization of self-selecting, multilevel memory devices with a large endurance and low variability. At present, these requirements can be partially addressed by proper programming approaches (program-and-verify algorithms) or device implementations (1T1R structures, etc.) at the cost of slower operation and decreased integration density.
Many of the advantages of IMC derive from the collective behavior of densely packed memory cells in an array configuration. Common parasitics, such as line resistance and capacitance,258 can limit the accuracy of both write and read operations, thus affecting the reliability of IMC.114,259 While selector devices alleviate the issue during the programming phase, they have limited impact during computation as all cells are simultaneously selected. Schemes for parasitic compensation164,260,261 may help mitigate the issue at the expense of increased pre-processing overhead and reduced effectiveness for large array sizes. For error-tolerant or adaptive applications, optimization frameworks can be developed262,263 with negligible loss of accuracy. Another approach is to use three-terminal devices with ultra-low conductance, such as ECRAM and MoS2 CTM devices,175 to minimize both the IR drop and the line capacitances of the array. However, large-scale crosspoint arrays of two-terminal devices have been extensively demonstrated in academia and industry,264–268 whereas the same maturity level is currently lacking for arrays of three-terminal emerging memory devices.76,80
Power consumption is another key consideration imposing constraints on the individual array size.122,269,270 Power can be handled by arranging the IMC system in a tiled architecture,7 where multiple replicas of a fundamental computing macro, or core, work in parallel for the execution of a computing task. Core architecture design is another open question in the field of IMC, where computational efficiency and robustness must be balanced with analog-to-digital and digital-to-analog conversion overheads.248 On the one hand, IMC-specific conversion front-ends271,272 should balance accuracy, latency, energy, and area consumption. On the other hand, various approaches to data encoding, such as amplitude modulation134,273 or pulse-width modulation,136 require conversion circuits to be flexible and reconfigurable. Finally, proper design of the inter-core communication is crucial to maintain the IMC advantage and allow for the solution of large-scale problems.274 Co-optimization of the device, architecture, and application seems to be the most promising concept to fully unleash the IMC potential in overcoming the von Neumann bottleneck.269,275
Finally, to allow for widespread IMC adoption, it is essential to bridge the gap between hardware and software by implementing an electronic design automation (EDA) toolchain. On the one hand, IMC-specific design tools276 are useful for system designers and engineers to develop large-scale, highly accurate IMC hardware and software systems. On the other hand, end users operating at a higher level of abstraction need a software stack capable of transparently compiling and optimizing a given problem for a target IMC architecture.277–279 This challenge should be tackled by the codesign and co-development of a full set of hardware and software tools to elevate the maturity of IMC for real-life applications.
X. CONCLUSIONS
This Perspective provides a review of the status and outlook of IMC with emerging memory devices. The candidate alternatives to the conventional von Neumann architecture are presented and compared in terms of their degree of integration between memory and computing units. Two-terminal and three-terminal emerging memory devices are reviewed. By distinguishing two general operating regimes of emerging devices, low-voltage static IMC and high-voltage dynamic IMC are identified as the main IMC macro-categories. Correspondingly, the most relevant computing primitives are explored in view of their real-world applications. For static IMC, MVM and IMVM accelerators, as well as TCAMs, are presented together with their applications in machine learning, hardware security, and data classification. Similarly, for dynamic IMC, outer-product accelerators for neural network training and brain-inspired systems for reservoir computing are discussed. Finally, challenges for the silicon implementation of an IMC architecture are outlined. Owing to the overarching nature of IMC, encompassing the device, the computing core, and the EDA toolchain, a strongly multidisciplinary approach is needed to co-optimize all components and fully unleash the IMC potential.
ACKNOWLEDGMENTS
This work received funding from the Italian Ministry of University and Research (MUR) and the European Union (EU) under the PON/REACT program and the Horizon 2020 Research and Innovation program (Grant Agreement Nos. 824164 and 899559). This work also received funding from ECSEL Joint Undertaking (JU) under Grant Agreement No. 101007321. The JU receives support from the European Union’s Horizon 2020 Research and Innovation program and France, Belgium, Czech Republic, Germany, Italy, Sweden, Switzerland, and Turkey.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
P.M. and M.F. contributed equally to this work.
P. Mannocci: Writing – original draft (lead). M. Farronato: Writing – original draft (lead). N. Lepri: Writing – original draft (equal). L. Cattaneo: Writing – original draft (equal). A. Glukhov: Writing – original draft (equal). Z. Sun: Writing – original draft (equal). D. Ielmini: Conceptualization (lead); Funding acquisition (lead); Project administration (lead); Resources (lead); Supervision (lead); Writing – review & editing (lead).
DATA AVAILABILITY
The data that support the findings of this study are openly available in https://zenodo.org/record/7378087#.Y4Y8xHbMKCo.