Artificial neural networks (ANNs) have emerged as an essential tool in machine learning, achieving remarkable success across diverse domains, including image and speech generation, game playing, and robotics. However, there exist fundamental differences between ANNs’ operating mechanisms and those of the biological brain, particularly concerning learning processes. This paper presents a comprehensive review of current brain-inspired learning representations in artificial neural networks. We investigate the integration of more biologically plausible mechanisms, such as synaptic plasticity, to improve these networks’ capabilities. Moreover, we delve into the potential advantages and challenges accompanying this approach. In this review, we pinpoint promising avenues for future research in this rapidly advancing field, which could bring us closer to understanding the essence of intelligence.

The dynamic interrelationship between memory and learning is a fundamental hallmark of intelligent biological systems. It empowers organisms to not only assimilate new knowledge but also to continuously refine their existing abilities, enabling them to adeptly respond to changing environmental conditions. This adaptive characteristic is relevant on various time scales, encompassing both long-term learning and rapid short-term learning via short-term plasticity mechanisms, highlighting the complexity and adaptability of biological neural systems.1–3 The development of artificial systems that draw high-level, hierarchical inspiration from the brain has been a long-standing scientific pursuit spanning several decades. While earlier attempts were met with limited success, the most recent generation of artificial intelligence (AI) algorithms has achieved significant breakthroughs in many challenging tasks. These tasks include, but are not limited to, the generation of images and text from human-provided prompts,4–7 the control of complex robotic systems,8–10 the mastery of strategy games such as Chess and Go,11 and a multimodal amalgamation of these.12 

While ANNs have made significant advancements in various fields, major limitations remain in their ability to continuously learn and adapt as biological brains do.13–15 Unlike current models of machine intelligence, animals can learn throughout their entire lifespan, which is essential for stable adaptation to changing environments. This ability, known as lifelong learning, remains a significant challenge for artificial intelligence, which primarily optimizes over fixed labeled datasets and consequently struggles to generalize to new tasks or retain information across repeated learning iterations.14 Addressing this challenge is an active area of research, and developing AI with lifelong learning abilities could have far-reaching impacts across multiple domains.

In this paper, we offer a review that seeks to identify the mechanisms of learning in the brain that have inspired current artificial intelligence algorithms. The scope of this review covers algorithms that modify the parameters of a neural network, such as synaptic plasticity rules, and how they relate to the brain. To better understand the biological processes underlying natural intelligence, the first section will explore the low-level components of learning in the brain, from synaptic plasticity and neuromodulation to the local and global dynamics that shape neural activity. This will be related back to ANNs in the third section, where we compare and contrast ANNs with biological neural systems. This provides a logical basis for why the brain has more to offer AI beyond the inheritance of current artificial models. Following that, we will delve into algorithms for artificial learning that emulate these processes to improve the capabilities of AI systems. Finally, we will discuss various applications of these AI techniques in real-world scenarios, highlighting their potential impact on fields such as robotics, lifelong learning, and neuromorphic computing. By doing so, we aim to provide a comprehensive understanding of the interplay between learning mechanisms in the biological brain and artificial intelligence, highlighting the potential benefits that can arise from this synergistic relationship. We hope our findings will encourage a new generation of brain-inspired learning algorithms.

A grand effort in neuroscience aims at identifying the underlying processes of learning in the brain. Several mechanisms have been proposed to explain the biological basis of learning at varying levels of granularity—from the synapse to population-level activity. However, the vast majority of biologically plausible models of learning are characterized by plasticity that emerges from the interaction between local and global events.16 Below, we introduce various forms of plasticity and how these processes interact in more detail.

Synaptic plasticity: Plasticity in the brain refers to the capacity of experience to modify the function of neural circuits. Synaptic plasticity specifically refers to the modification of the strength of synaptic transmission based on activity and is currently the most widely investigated mechanism by which the brain adapts to new information.17,18 There are two broad classes of synaptic plasticity: short- and long-term plasticity. Short-term plasticity acts on the scale of tens of milliseconds to minutes and plays an important role in short-term adaptation to sensory stimuli and short-lasting memory formation.19 Long-term plasticity acts on the scale of minutes to hours and beyond and is thought to be one of the primary processes underlying long-term behavioral change and memory storage.20

Neuromodulation: In addition to the plasticity of synapses, another important mechanism by which the brain adapts to new information is neuromodulation.3,21,22 Neuromodulation refers to the regulation of neural activity by chemical signaling molecules, often referred to as neurotransmitters or hormones. These signaling molecules can alter the excitability of neural circuits and the strength of synapses and can have both short- and long-term effects on neural function. Several neuromodulators have been identified, including acetylcholine, dopamine, and serotonin, which have been linked to various functions such as attention, learning, and emotion.23 Neuromodulation has been suggested to play a role in various forms of plasticity, including short-19 and long-term plasticity.22

Metaplasticity: The ability of neurons to modify both their function and structure based on activity is what characterizes synaptic plasticity. These modifications that occur at the synapse must be precisely organized so that changes occur at the right time and in the right quantity. This regulation of plasticity is referred to as metaplasticity, or the “plasticity of synaptic plasticity,” and plays a vital role in safeguarding the constantly changing brain from its own saturation.24–26 Essentially, metaplasticity alters the ability of synapses to generate plasticity by inducing a change in the physiological state of neurons or synapses. Metaplasticity has been proposed as a fundamental mechanism in memory stability, learning, and regulating neural excitability. While similar, metaplasticity can be distinguished from neuromodulation, with metaplastic and neuromodulatory events often overlapping in time during the modification of a synapse.

Neurogenesis: The process by which newly formed neurons are integrated into existing neural circuits is referred to as neurogenesis. Neurogenesis is most active during embryonic development but is also known to occur throughout the adult lifetime, particularly in the subventricular zone of the lateral ventricles,27 the amygdala,28 and in the dentate gyrus of the hippocampal formation.29 In adult mice, neurogenesis has been demonstrated to increase in animals living in enriched environments compared with standard laboratory conditions.30 In addition, many environmental factors, such as exercise31,32 and stress,33,34 have been demonstrated to change the rate of neurogenesis in the rodent hippocampus. Overall, while the role of neurogenesis in learning is not fully understood, it is believed to play an important role in supporting learning in the brain.

Glial cells: Glial cells, or neuroglia, play a vital role in supporting learning and memory by modulating neurotransmitter signaling at synapses, the small gaps between neurons where neurotransmitters are released and received.35 Astrocytes, one type of glial cell, can release and reuptake neurotransmitters, as well as metabolize and detoxify them. This helps to regulate the balance and availability of neurotransmitters in the brain, which is essential for normal brain function and learning.36 Microglia, another type of glial cell, can also modulate neurotransmitter signaling and participate in the repair and regeneration of damaged tissue, which is important for learning and memory.37 In addition to repair and modulation, structural changes in synaptic strength require the involvement of different types of glial cells, with the most notable influence coming from astrocytes.36 However, despite their crucial involvement, we have yet to fully understand the role of glial cells. Understanding the mechanisms by which glial cells support learning at synapses is an important area of ongoing research.

Artificial neural networks have played a vital role in machine learning over the past several decades. These networks have seen tremendous progress toward solving a variety of challenging problems. Many of the most impressive accomplishments in AI have been realized through the use of large ANNs trained on tremendous amounts of data. While there have been many technical advancements, much of this progress can also be attributed to innovations in computing technology, such as large-scale GPU accelerators and the accessibility of data. While the application of large-scale ANNs has led to major innovations, there are still many challenges ahead. Among the most pressing practical limitations, ANNs are inefficient in terms of power consumption and poor at processing dynamic and noisy data. In addition, ANNs are unable to learn beyond their training period (e.g., during deployment); training typically assumes independent and identically distributed (IID) data without temporal structure, which does not reflect physical reality, where information is highly temporally and spatially correlated. These limitations have led to their application requiring vast amounts of energy when deployed in large-scale settings38 and have also presented challenges for integration into edge computing devices, such as robotics and wearable devices.39

Looking toward neuroscience for a solution, researchers have been exploring spiking neural networks (SNNs) as an alternative to ANNs40 (Fig. 1). SNNs are a class of ANNs designed to more closely resemble the behavior of biological neurons. The primary difference between ANNs and SNNs is that SNNs incorporate timing into their communication. Spiking neurons accumulate information across time from connected (presynaptic) neurons (or via sensory input) in the form of a membrane potential. Once a neuron’s membrane potential surpasses a threshold value, it fires a binary “spike” to all of its outgoing (post-synaptic) connections. Spikes have been theoretically demonstrated to contain more information than rate-based representations of information (such as in ANNs), despite being both binary and sparse in time.41 In addition, modeling studies have shown the advantages of SNNs, such as better energy efficiency, the ability to process noisy and dynamic data, and the potential for more robust and fault-tolerant computing.42 These benefits are not solely attributed to their increased biological plausibility but also to the unique properties of spiking neural networks that distinguish them from conventional artificial neural networks. A simple working model of a leaky integrate-and-fire neuron is described as follows:
$$\tau_m \frac{dV}{dt} = E_L - V(t) + R_m I_{\text{inj}}(t),$$
where V(t) is the membrane potential at time t, τm is the membrane time constant, EL is the resting potential, Rm is the membrane resistance, Iinj(t) is the injected current, Vth is the threshold potential, and Vreset is the reset potential. When the membrane potential reaches the threshold potential, the neuron spikes, and the membrane potential is reset to the reset potential [if V(t) ≥ Vth, then V(t) ← Vreset].
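The leaky integrate-and-fire dynamics above can be sketched with a simple Euler integration. The parameter values below (time constant, resistance, threshold, and input current) are illustrative choices, not taken from the text.

```python
import numpy as np

def simulate_lif(I_inj, dt=1e-4, tau_m=0.02, E_L=-0.07, R_m=1e7,
                 V_th=-0.054, V_reset=-0.07):
    """Euler integration of a leaky integrate-and-fire neuron.
    Parameter values (SI units) are illustrative, not from the text."""
    V = E_L
    spikes, trace = [], []
    for step, I in enumerate(I_inj):
        # dV/dt = (E_L - V + R_m * I_inj) / tau_m
        V += dt * (E_L - V + R_m * I) / tau_m
        if V >= V_th:            # threshold crossing -> emit a spike
            spikes.append(step)
            V = V_reset          # reset the membrane potential
        trace.append(V)
    return np.array(trace), spikes

# a constant suprathreshold current drives regular spiking
trace, spikes = simulate_lif([2e-9] * 1000)
```

With this constant input, the membrane charges toward a steady state above threshold, producing a regular spike train whose rate depends on the input amplitude.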
FIG. 1.

Graphical depiction of long-term potentiation (LTP) and depression (LTD) at the synapse of biological neurons. (a) Synaptically connected pre- and post-synaptic neurons. (b) Synaptic terminal, the connection point between neurons. (c) Synaptic growth (LTP) and synaptic weakening (LTD). (d) (Top) Membrane potential dynamics in the axon hillock of the neuron. (Bottom) Pre- and post-synaptic spikes. (e) Spike-timing dependent plasticity curve depicting experimental recordings of LTP and LTD.


Homeostatic regulation is an additional process that maintains the stability of internal conditions against external changes. This regulation is achieved through feedback mechanisms that adjust physiological processes, ensuring optimal functioning and equilibrium within the organism. In SNNs, homeostatic regulation also involves adjusting the spike threshold of neurons to stabilize network activity. This adjustment can be mathematically described as Vth(t + 1) = Vth(t) + β · (Aactual − Atarget), where Vth represents the neuron’s threshold potential, β > 0 is a modulation parameter, and Atarget and Aactual denote the target and actual firing rates; the threshold rises when the neuron fires above its target rate, damping runaway activity. Homeostatic regulation has been shown to be useful in learning applications of SNNs.43–45
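A minimal sketch of this threshold update follows. The sign convention is chosen so that the feedback is stabilizing (the threshold rises with excess activity), and the rate values and β below are illustrative.

```python
def update_threshold(V_th, rate_actual, rate_target, beta=0.01):
    """Homeostatic threshold update: the threshold rises when the neuron
    fires above its target rate, damping runaway activity. The sign
    convention and beta value here are illustrative choices."""
    return V_th + beta * (rate_actual - rate_target)

# a neuron firing at 20 Hz against a 10 Hz target gets a higher threshold
raised = update_threshold(1.0, 20.0, 10.0)
# a neuron firing below target gets a lower threshold, promoting activity
lowered = update_threshold(1.0, 5.0, 10.0)
```

Applied every simulation window, this negative feedback keeps the network's mean firing rate near its target.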

Despite these potential advantages, SNNs are still in the early stages of development, and several challenges need to be addressed before they can be used more widely. One of the most pressing challenges is how to optimize the synaptic weights of these models, as traditional backpropagation-based methods from ANNs fail due to the discrete, non-differentiable spiking nonlinearity. Irrespective of these challenges, some works push the boundaries of what was thought possible with modern spiking networks, such as large spike-based transformer models.46 Spiking models are of great importance for this review since they form the basis of many brain-inspired learning algorithms.

Hebbian and spike-timing dependent plasticity (STDP) are two prominent models of synaptic plasticity that play important roles in shaping neural circuitry and behavior. The Hebbian learning rule, first proposed by Hebb in 1949,47 posits that synapses between neurons are strengthened when they are coactive, such that the activation of one neuron causally leads to the activation of another. STDP, on the other hand, is a more recently proposed model of synaptic plasticity that takes into account the precise timing of pre- and post-synaptic spikes48 to determine synaptic strengthening or weakening. It is widely believed that STDP plays a key role in the formation and refinement of neural circuits during development and in the ongoing adaptation of circuits in response to experience. In the following subsection, we will provide an overview of the basic principles of Hebbian learning and STDP.

Hebbian learning: Hebbian learning is based on the idea that the synaptic strength between two neurons should be increased if they are both active at the same time and decreased if they are not. Hebb suggested that this increase should occur when one cell “repeatedly or persistently takes part in firing” another cell (with causal implications). However, this principle is often expressed correlatively, as in the famous aphorism “cells that fire together, wire together” (variously attributed to Löwel49 or Shatz50).

Hebbian learning is often used as an unsupervised learning algorithm, where the goal is to identify patterns in the input data without explicit feedback.51 An example of this process is the Hopfield network, in which large binary patterns are easily stored in a fully connected recurrent network by applying a Hebbian rule to the (symmetric) weights.52 It can also be adapted for use in supervised learning algorithms, where the rule is modified to take into account the desired output of the network. In this case, the Hebbian learning rule is combined with a teaching signal that indicates the correct output for a given input.
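As a concrete illustration of Hebbian storage, the sketch below builds a small Hopfield-style network with the standard outer-product rule and recovers a stored pattern from a corrupted cue. The pattern size, noise level, and zero-diagonal convention are illustrative choices, not from the text.

```python
import numpy as np

def hopfield_store(patterns):
    """Hebbian storage of binary (+/-1) patterns: W = sum_p x_p x_p^T / N,
    with a zero diagonal (a standard Hopfield convention)."""
    N = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / N
    np.fill_diagonal(W, 0.0)
    return W   # symmetric weights, as the text notes

def hopfield_recall(W, x, steps=10):
    """Synchronous sign-update dynamics: x <- sign(W x)."""
    for _ in range(steps):
        x = np.sign(W @ x)
    return x

rng = np.random.default_rng(0)
p = rng.choice([-1.0, 1.0], size=(1, 50))   # one 50-unit binary pattern
W = hopfield_store(p)
noisy = p[0].copy()
noisy[:5] *= -1                             # corrupt 5 of 50 bits
recalled = hopfield_recall(W, noisy)
```

Because the Hebbian weights make the stored pattern an attractor of the dynamics, the corrupted cue settles back onto the original pattern.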

A simple Hebbian learning rule can be described mathematically using the following equation:
$$\Delta w_{ij} = \eta\, x_i x_j,$$
where Δwij is the change in the weight between neuron i and neuron j, η is the learning rate, and xi and xj are the “activities” of neurons i and j, often thought of as their firing rates. This rule states that if the two neurons are activated at the same time, their connection should be strengthened.

One potential drawback of the basic Hebbian rule is its instability. For example, if xi and xj are initially weakly positively correlated, this rule will increase the weight between the two, which will in turn reinforce the correlation, leading to even larger weight increases, etc. Thus, some form of stabilization is needed. This can be done simply by bounding the weights or by more complex rules that take into account additional factors such as the history of the pre- and post-synaptic activity or the influence of other neurons in the network (see Ref. 53 for a practical review of many such rules).
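A minimal sketch of the rule follows, using hard weight bounds as the simplest stabilization mentioned above. The learning rate, bounds, and activity vectors are illustrative choices.

```python
import numpy as np

def hebbian_step(w, x_pre, x_post, eta=0.01, w_max=1.0):
    """One Hebbian update, Δw_ij = η x_i x_j, with hard weight bounds
    as a simple stabilization (eta and w_max are illustrative)."""
    w = w + eta * np.outer(x_post, x_pre)   # strengthen co-active pairs
    return np.clip(w, 0.0, w_max)           # bound weights to [0, w_max]

w = np.zeros((2, 3))                 # weights from 3 pre- to 2 post-neurons
x_pre = np.array([1.0, 0.0, 1.0])
x_post = np.array([1.0, 0.0])
for _ in range(200):
    w = hebbian_step(w, x_pre, x_post)
# co-active pairs saturate at w_max; pairs with an inactive neuron stay at 0
```

Without the `np.clip` bound, the co-active weights would grow without limit, which is exactly the instability described above.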

Three-factor rules: Hebbian reinforcement learning: By incorporating information about rewards, Hebbian learning can also be used for reinforcement learning. An apparently plausible idea is simply to multiply the Hebbian update by the reward directly, as follows:
$$\Delta w_{ij} = \eta\, x_i x_j R,$$
with R being the reward (for this time step or the whole episode). Unfortunately, this idea does not produce reliable reinforcement learning. Intuitively, if wij is already at its optimal value, the rule above will still produce a net change and, thus, drive wij away from the optimum.

More formally, as pointed out by Frémaux et al.,54 to properly track the actual covariance between inputs, outputs, and rewards, at least one of the terms in the xixjR product must be centered, that is, replaced by zero-mean fluctuations around its expected value. One possible solution is to center the rewards by subtracting a baseline from R, generally equal to the expected value of R for this trial. While helpful, in practice, this solution is generally insufficient.

A more effective solution is to remove the mean value from the outputs. This can be done easily by subjecting neural activations xj to occasional random perturbations Δxj, taken from a suitable zero-centered distribution—and then using the perturbation Δxj, rather than the raw post-synaptic activation xj, in the three-factor product,
$$\Delta w_{ij} = \eta\, x_i \Delta x_j R.$$

This is the so-called “node perturbation” rule proposed by Fiete and Seung.55,56 Intuitively, notice that the effect of the xiΔxj increment is to push future xj responses (when encountering the same xi input) in the direction of the perturbation: larger if the perturbation was positive, smaller if the perturbation was negative. Multiplying this shift by R results in pushing future responses toward the perturbation if R is positive and away from it if R is negative. Even if R is not zero-mean, the net effect (in expectation) will still be to drive wij toward a higher R, although the variance will be higher.

This rule turns out to implement the REINFORCE algorithm (Williams’ original paper57 actually proposes an algorithm that is exactly node-perturbation for spiking stochastic neurons) and, thus, estimates the theoretical gradient of R over wij. It can also be implemented in a biologically plausible manner, allowing recurrent networks to learn non-trivial cognitive or motor tasks from sparse, delayed rewards.58 
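The three-factor product above can be sketched for a single linear unit learning a target mapping. The reward definition, running baseline, and learning rates below are illustrative choices, not taken from Fiete and Seung's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Node-perturbation sketch: a single linear unit learns to match a
# target mapping by correlating random output perturbations with reward.
w = np.zeros(3)
w_target = np.array([0.5, -0.3, 0.8])    # illustrative target weights
eta, sigma = 0.05, 0.1
R_baseline = 0.0
for _ in range(10_000):
    x = rng.normal(size=3)                 # presynaptic input
    dx = rng.normal(scale=sigma)           # zero-mean perturbation Δx_j
    y = w @ x + dx                         # perturbed output
    R = -(y - w_target @ x) ** 2           # reward: negative squared error
    w += eta * x * dx * (R - R_baseline)   # Δw = η x Δx (R − baseline)
    R_baseline += 0.1 * (R - R_baseline)   # running-average reward baseline
```

Note that the update uses the perturbation `dx` rather than the raw output, and that subtracting the running baseline from `R` reduces the variance of the updates, as discussed above.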

Spike-timing dependent plasticity: Spike-timing dependent plasticity (STDP) is a theoretical model of synaptic plasticity that allows the strength of connections between neurons to be modified based on the relative timing of their spikes. Unlike the Hebbian learning rule, which relies on the simultaneous activation of pre- and post-synaptic neurons, STDP takes into account the precise timing of the pre- and post-synaptic spikes. In particular, STDP suggests that if a presynaptic neuron fires just before a post-synaptic neuron, the connection between them should be strengthened. Conversely, if the post-synaptic neuron fires just before the presynaptic neuron, the connection should be weakened.

STDP has been observed in a variety of biological systems, including the neocortex, hippocampus, and cerebellum. The rule has been shown to play a crucial role in the development and plasticity of neural circuits, including learning and memory processes. STDP has also been used as a basis for the development of artificial neural networks, which are designed to mimic the structure and function of the brain.

The mathematical equation for STDP is more complex than the Hebbian learning rule and can vary depending on the specific implementation. However, a common formulation is
$$\Delta w_{ij} = \begin{cases} A_+ \exp(-\Delta t/\tau_+) & \text{if } \Delta t > 0, \\ -A_- \exp(\Delta t/\tau_-) & \text{if } \Delta t < 0, \end{cases}$$
where Δwij is the change in the weight between neuron i and neuron j, Δt = tpost − tpre is the time difference between the pre- and post-synaptic spikes, A+ and A− are the amplitudes of the potentiation and depression, respectively, and τ+ and τ− are the time constants for the potentiation and depression, respectively. This rule states that the strength of the connection between the two neurons will be increased or decreased depending on the relative timing of their spikes.
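The piecewise STDP window above can be written directly as a function of the spike-time difference. The amplitudes and time constants below are typical illustrative values, not taken from the text.

```python
import numpy as np

def stdp_dw(delta_t, A_plus=0.01, A_minus=0.012,
            tau_plus=0.02, tau_minus=0.02):
    """STDP weight change as a function of Δt = t_post - t_pre (seconds).
    Amplitudes and time constants are illustrative values."""
    if delta_t > 0:      # pre fires before post -> potentiation (LTP)
        return A_plus * np.exp(-delta_t / tau_plus)
    elif delta_t < 0:    # post fires before pre -> depression (LTD)
        return -A_minus * np.exp(delta_t / tau_minus)
    return 0.0

assert stdp_dw(0.01) > 0     # causal pairing strengthens the synapse
assert stdp_dw(-0.01) < 0    # anti-causal pairing weakens it
```

The exponential windows make nearly coincident spikes produce the largest changes, with the effect decaying as the spikes separate in time.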

There are two primary approaches for weight optimization in artificial neural networks: error-driven global learning and brain-inspired local learning. In the first approach, the network weights are modified by driving a global error to its minimum value. This is achieved by delegating errors to each weight and synchronizing modifications between each weight. In contrast, brain-inspired local learning algorithms aim to learn in a more biologically plausible manner by modifying weights from dynamical equations using locally available information. Both optimization approaches have unique benefits and drawbacks. In the following sections, we will discuss the most commonly utilized form of error-driven global learning, backpropagation, followed by in-depth discussions of brain-inspired local algorithms. It is worth mentioning that these two approaches are not mutually exclusive and will often be integrated in order to complement their respective strengths.59–62 

Backpropagation is a powerful error-driven global learning method that changes the weight of connections between neurons in a neural network to produce a desired target behavior.63 This is accomplished through the use of a quantitative metric (an objective function) that describes the quality of behavior given sensory information (e.g., visual input, written text, and robotic joint positions). The backpropagation algorithm consists of two phases: the forward pass and the backward pass. In the forward pass, the input is propagated through the network, and the output is calculated. During the backward pass, the error between the predicted output and the “true” output is calculated, and the gradients of the loss function with respect to the weights of the network are calculated by propagating the error backward through the network. These gradients are then used to update the weights of the network using an optimization algorithm such as stochastic gradient descent. This process is repeated for many iterations until the weights converge to a set of values that minimize the loss function.

Here, we provide a brief mathematical explanation of backpropagation. First, we define a desired loss function, which is a function of the network’s outputs and the true values,
$$L(y, \hat{y}) = \frac{1}{2} \sum_i (y_i - \hat{y}_i)^2,$$
where y is the true output and ŷ is the network’s output. In this case, we are minimizing the squared error, but we could very well optimize for any smooth and differentiable loss function.
Next, we use the chain rule to calculate the gradient of the loss with respect to the weights of the network. Let wijl be the weight between neuron i in layer l and neuron j in layer l + 1, and let ail be the activation of neuron i in layer l. Then, the gradients of the loss with respect to the weights are given by
$$\frac{\partial L}{\partial w_{ij}^l} = \frac{\partial L}{\partial a_j^{l+1}} \frac{\partial a_j^{l+1}}{\partial z_j^{l+1}} \frac{\partial z_j^{l+1}}{\partial w_{ij}^l},$$
where zjl+1 is the weighted sum of the inputs to neuron j in layer l + 1. We can then use these gradients to update the weights of the network using gradient descent,
$$w_{ij}^l \leftarrow w_{ij}^l - \alpha \frac{\partial L}{\partial w_{ij}^l},$$
where α is the learning rate. By repeatedly calculating the gradients and updating the weights, the network gradually learns to minimize the loss function and make more accurate predictions. In practice, gradient descent methods are often combined with approaches to incorporate momentum in the gradient estimate, which has been shown to significantly improve generalization.64 
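The forward and backward passes above can be sketched for a tiny two-layer network with a squared-error loss. The layer sizes, tanh nonlinearity, and learning rate are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# One gradient-descent step via backpropagation for a 4-3-2 network.
x = rng.normal(size=(4, 1))        # input
y = rng.normal(size=(2, 1))        # target output
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(2, 3))
alpha = 0.01                       # learning rate (illustrative)

def loss(W1, W2):
    return 0.5 * np.sum((W2 @ np.tanh(W1 @ x) - y) ** 2)

# forward pass
z1 = W1 @ x
a1 = np.tanh(z1)
y_hat = W2 @ a1
# backward pass: apply the chain rule layer by layer
delta2 = y_hat - y                        # dL/dy_hat
gW2 = delta2 @ a1.T                       # dL/dW2
delta1 = (W2.T @ delta2) * (1 - a1**2)    # dL/dz1 (tanh derivative)
gW1 = delta1 @ x.T                        # dL/dW1

before = loss(W1, W2)
after = loss(W1 - alpha * gW1, W2 - alpha * gW2)
```

Note that the hidden-layer error `delta1` is obtained by propagating `delta2` backward through the transpose of `W2`, the weight-transport step that the feedback-alignment methods discussed later replace.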

The impressive accomplishments of backpropagation have led neuroscientists to investigate whether it can provide a better understanding of learning in the brain. While it remains debated as to whether backpropagation variants could occur in the brain,65,66 it is clear that backpropagation in its current formulation is biologically implausible. Alternative theories suggest complex feedback circuits or the interaction of local activity and top-down signals (a “third-factor”) could support a similar form of backprop-like learning.65 

Despite its impressive performance, there are still fundamental algorithmic challenges that follow from repeatedly applying backpropagation to network weights. One such challenge is a phenomenon known as catastrophic forgetting, where a neural network forgets previously learned information when training on new data.13 This can occur when the network is fine-tuned on new data or when the network is trained on a sequence of tasks without retaining the knowledge learned from previous tasks. Catastrophic forgetting is a significant hurdle for developing neural networks that can continuously learn from diverse and changing environments. Another challenge is that backpropagation requires backpropagating information through all the layers of the network, which can be computationally expensive and time-consuming, especially for very deep networks. This can limit the scalability of deep learning algorithms and make it difficult to train large models on limited computing resources. Nonetheless, backpropagation has remained the most widely used and successful algorithm for applications involving artificial neural networks.

Another class of global learning algorithms that has gained significant attention in recent years is evolutionary and genetic algorithms. These algorithms are inspired by the process of natural selection and, in the context of ANNs, aim to optimize the weights of a neural network by mimicking the evolutionary process. In genetic algorithms,67 a population of neural networks is initialized with random weights, and each network is evaluated on a specific task or problem. The networks that perform better on the task are then selected for reproduction, whereby they produce offspring with slight variations in their weights. This process is repeated over several generations, with the best-performing networks used for reproduction, so that successful behaviors become more prevalent across generations. Evolutionary algorithms operate similarly to genetic algorithms but take a different approach by approximating a stochastic gradient.68,69 This is accomplished by perturbing the weights and combining the networks’ objective function performances to update the parameters. This results in a more global search of the weight space, which can be more efficient at finding optimal solutions compared with local search methods such as backpropagation.70
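The perturbation-based gradient estimate used by evolutionary algorithms can be sketched as follows. The population size, noise scale, learning rate, baseline subtraction, and toy objective are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def es_step(theta, f, pop=50, sigma=0.1, eta=0.05):
    """One evolution-strategies update: estimate a stochastic gradient of
    f from the objective values of Gaussian parameter perturbations."""
    eps = rng.normal(size=(pop,) + theta.shape)
    baseline = f(theta)                       # baseline reduces variance
    returns = np.array([f(theta + sigma * e) - baseline for e in eps])
    grad = (returns[:, None] * eps).sum(axis=0) / (pop * sigma)
    return theta + eta * grad                 # ascend the estimated gradient

# toy objective: maximize f(theta) = -||theta - target||^2
target = np.array([1.0, -2.0, 0.5])
f = lambda th: -np.sum((th - target) ** 2)
theta = np.zeros(3)
for _ in range(300):
    theta = es_step(theta, f)
```

Because only objective values are needed, the same loop works for non-differentiable objectives such as spiking network performance, at the cost of many evaluations per update.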

One advantage of these algorithms is their ability to search a vast parameter space efficiently, making them suitable for problems with large numbers of parameters or complex search spaces. In addition, they do not require a differentiable objective function, which can be useful in scenarios where the objective function is difficult to define or calculate (e.g., spiking neural networks). However, these algorithms also have some drawbacks. One major limitation is the high computational cost required to evaluate and evolve a large population of networks. Another challenge is the potential for the algorithm to become stuck in local optima or to converge too quickly, resulting in suboptimal solutions. In addition, the use of random mutations can lead to instability and unpredictability in the learning process.

Regardless, evolutionary and genetic algorithms have shown promising results in various applications, particularly when optimizing non-differentiable and non-trivial parameter spaces. Ongoing research is focused on improving the efficiency and scalability of these algorithms, as well as discovering where and when it makes sense to use these approaches instead of gradient descent.

Unlike global learning algorithms such as backpropagation, which require information to be propagated through the entire network, local learning algorithms focus on updating synaptic weights based on local information from nearby or synaptically connected neurons (Fig. 2). These approaches are often strongly inspired by the plasticity of biological synapses. As will be seen, by leveraging local learning algorithms, ANNs can learn more efficiently and adapt to changing input distributions, making them better suited for real-world applications. In this section, we will review recent advances in brain-inspired local learning algorithms and their potential for improving the performance and robustness of ANNs.

FIG. 2.

Feedforward neural network computes an output given an input by propagating the input information downstream. The precise value of the output is determined by the weight of synaptic coefficients. To improve the output for a task given an input, the synaptic weights are modified. Synaptic plasticity algorithms represent computational models that emulate the brain’s ability to strengthen or weaken synapses—connections between neurons—based on their activity, thereby facilitating learning and memory formation. Three-factor plasticity refers to a model of synaptic plasticity in which changes to the strength of neural connections are determined by three factors: presynaptic activity, post-synaptic activity, and a modulatory signal, facilitating more nuanced and adaptive learning processes. The feedback alignment algorithm is a learning technique in which artificial neural networks are trained using random, fixed feedback connections rather than symmetric weight matrices, demonstrating that successful learning can occur without precise backpropagation. Backpropagation is a fundamental algorithm in machine learning and artificial intelligence used to train neural networks by calculating the gradient of the loss function with respect to the weights in the network.


Backpropagation-derived local learning algorithms are a class of local learning algorithms that attempt to emulate the mathematical properties of backpropagation. Unlike the traditional backpropagation algorithm, which propagates error signals back through the entire network, backpropagation-derived local learning algorithms update synaptic weights using locally computed approximations of the error gradients that backpropagation would provide. This approach is computationally efficient and allows for online learning, making it suitable for applications where training data arrive continually.

One prominent example of backpropagation-derived local learning algorithms is the feedback alignment (FA) algorithm,71,72 which replaces the weight transport matrix used in backpropagation with a fixed random matrix, allowing the error signal to propagate through direct connections and thus avoiding the need for backpropagating error signals. A brief mathematical description of feedback alignment is as follows: let w_out be the weight matrix connecting the last layer of the network to the output and w_in be the weight matrix connecting the input to the first layer. In feedback alignment, the error signal is propagated from the output to the input using a fixed random matrix B rather than the transpose of w_out. The weight updates are then computed from the outer product of the error signal and the input, Δw_in = −η z x^T, where x is the input, η is the learning rate, and z is the error signal propagated backward through the network, similar to traditional backpropagation.
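
Under these definitions, feedback alignment can be sketched on a toy regression problem (the network sizes, task, and constants below are illustrative assumptions, not the setup of Refs. 71 and 72): the backward pass uses the fixed random matrix B in place of the transpose of the forward weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression target y = A x; sizes are illustrative assumptions.
n_in, n_hid, n_out = 8, 16, 4
A = rng.normal(size=(n_out, n_in))

W1 = rng.normal(scale=0.1, size=(n_hid, n_in))   # trained forward weights
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))  # trained forward weights
B = rng.normal(scale=0.1, size=(n_hid, n_out))   # fixed random feedback, replaces W2.T

eta = 0.05
losses = []
for step in range(3000):
    x = rng.normal(size=n_in)
    y = A @ x
    h = W1 @ x               # hidden pre-activation
    a = np.tanh(h)           # hidden activity
    y_hat = W2 @ a
    e = y_hat - y            # output error
    losses.append(0.5 * float(e @ e))
    # Backward pass: the hidden error uses the fixed matrix B, not W2.T.
    z = (B @ e) * (1.0 - a ** 2)
    W2 -= eta * np.outer(e, a)
    W1 -= eta * np.outer(z, x)   # the update Δw_in = −η z x^T from the text

print(np.mean(losses[:100]), np.mean(losses[-100:]))
```

Despite the feedback weights never being trained, the forward weights tend to align with them over training, which is the phenomenon that gives the algorithm its name.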

Direct Feedback Alignment72 (DFA) simplifies the weight transport chain compared with FA by directly connecting the output layer error to each hidden layer. The Sign-Symmetry (SS) algorithm is similar to FA except that the feedback weights share the signs of the forward weights. Recent progress in feedback alignment explores incorporating backprojections inspired by cortical hierarchies73 in a completely phase-free manner (no separate forward and backward passes). While FA has exhibited impressive results on small datasets such as MNIST and CIFAR, its performance on larger datasets such as ImageNet is often suboptimal.74 On the other hand, recent studies have shown that the SS algorithm is capable of achieving performance comparable to backpropagation, even on large-scale datasets.75 
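
The difference between these schemes is confined to the error signal delivered to a hidden layer, which can be sketched as follows (a minimal shape-level illustration with assumed sizes; in a deep network, DFA's matrix carries the output error directly to every hidden layer rather than through the layer-by-layer chain):

```python
import numpy as np

rng = np.random.default_rng(1)

n_hid, n_out = 16, 4
W2 = rng.normal(size=(n_out, n_hid))   # forward weights (hidden -> output)
B = rng.normal(size=(n_hid, n_out))    # fixed random feedback (FA and DFA)
B_ss = np.sign(W2.T) * np.abs(B)       # SS: random magnitudes, forward signs
e = rng.normal(size=n_out)             # error at the output layer

delta_bp = W2.T @ e    # backpropagation: exact transpose (weight transport)
delta_fa = B @ e       # feedback alignment: fixed random feedback matrix
delta_dfa = B @ e      # DFA: same form, but B connects the *output* error
                       # directly to each hidden layer, skipping intermediates
delta_ss = B_ss @ e    # sign-symmetry: feedback shares the signs of W2.T
```

Only backpropagation requires transporting the forward weights into the backward pass; the other three schemes avoid it at the cost of an approximate error signal.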

Eligibility propagation60,76 (e-prop) extends the idea of feedback alignment to spiking neural networks, combining the advantages of traditional error backpropagation and biologically plausible learning rules, such as spike-timing-dependent plasticity (STDP). For each synapse, the e-prop algorithm computes and maintains an eligibility trace e_{ji}(t) = dz_j(t)/dW_{ji}, derived from real-time recurrent learning.77,78 Eligibility traces measure the total contribution of the synapse to the neuron's current output, taking into account all past inputs.3 This trace can be computed and updated in a purely forward manner, without backward passes. It is then multiplied by an estimate of the gradient of the error with respect to the neuron's output, L_j(t) = dE(t)/dz_j(t), to obtain the actual weight gradient dE(t)/dW_{ji}. L_j(t) itself is computed from the error at the output neurons, either by using symmetric feedback weights or by using fixed feedback weights, as in feedback alignment. A possible drawback of e-prop is that it requires a real-time error signal L_j(t) at each point in time, since it only takes into account past events and is blind to future errors. In particular, it cannot learn from delayed error signals that extend beyond the time scales of individual neurons (including short-term adaptation),60 in contrast with methods such as REINFORCE and node perturbation. In addition, the weight update is an approximation of the true gradient, which can lead to difficulties in spatial scaling.
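
The forward-only mechanics of the trace can be sketched with a non-spiking stand-in (a linear leaky integrator replaces the LIF neurons of the actual algorithm, and the teacher-imitation task and constants are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Leaky-integrator neuron z(t) = alpha*z(t-1) + W @ x(t) must imitate a
# teacher with weights W_target. The eligibility trace
#   e_i(t) = dz(t)/dW_i = alpha*e_i(t-1) + x_i(t)
# is updated forward in time and combined with the instantaneous error L(t).
alpha, eta = 0.8, 0.05
n_in, T = 5, 200
W = np.zeros(n_in)
W_target = rng.normal(size=n_in)

for epoch in range(50):
    x_seq = rng.normal(size=(T, n_in))
    z = z_teacher = 0.0
    e_trace = np.zeros(n_in)
    dW = np.zeros(n_in)
    for t in range(T):
        z = alpha * z + W @ x_seq[t]
        z_teacher = alpha * z_teacher + W_target @ x_seq[t]
        e_trace = alpha * e_trace + x_seq[t]   # forward-only eligibility update
        L = z - z_teacher                      # real-time error signal L(t)
        dW += L * e_trace                      # accumulates dE/dW online
    W -= eta * dW / T

print(np.max(np.abs(W - W_target)))
```

Because the dynamics here are linear, the trace-times-error product is the exact gradient; in the spiking setting of e-prop it is an approximation.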

In the work of Refs. 79 and 80, a normative theory for synaptic learning based on recent genetic findings81 of neuronal signaling architectures is demonstrated. They propose that neurons communicate their contribution to the learning outcome to nearby neurons via cell-type-specific local neuromodulation and that neuron-type diversity and neuron-type-specific local neuromodulation may be critical pieces of the biological credit-assignment puzzle. In this work, the authors instantiate a simplified computational model based on eligibility propagation to explore this theory and show that their model, which includes both dopamine-like temporal difference and neuropeptide-like local modulatory signaling, leads to improvements over previous methods such as e-prop and feedback alignment.

Generalization properties: Techniques in deep learning have made tremendous strides toward understanding the generalization of their learning algorithms. A particularly useful discovery was that flat minima tend to lead to better generalization.82 What is meant by this is that, given a perturbation ɛ in the parameter space (synaptic weight values), more significant performance degradation is observed around narrower minima. Learning algorithms that find flatter minima in parameter space ultimately lead to better generalization.
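
The flatness intuition can be made concrete with a toy sharpness probe: average the loss increase under random parameter perturbations of a fixed radius ε. The quadratic loss surfaces below are illustrative assumptions, not a method from the cited work.

```python
import numpy as np

rng = np.random.default_rng(3)

def sharpness(loss_fn, w_min, eps=0.1, n_probes=100):
    # Mean loss increase under random perturbations of radius eps: a crude
    # proxy for how narrow the minimum surrounding w_min is.
    base = loss_fn(w_min)
    total = 0.0
    for _ in range(n_probes):
        d = rng.normal(size=w_min.shape)
        d *= eps / np.linalg.norm(d)           # fixed perturbation radius
        total += loss_fn(w_min + d) - base
    return total / n_probes

w = np.zeros(10)
flat = sharpness(lambda p: 0.5 * 1.0 * np.sum(p ** 2), w)      # wide minimum
narrow = sharpness(lambda p: 0.5 * 100.0 * np.sum(p ** 2), w)  # narrow minimum

print(flat, narrow)   # the same eps degrades the narrow minimum far more
```

For these isotropic quadratics the probe returns exactly 0.5·c·ε², so the minimum with 100× the curvature degrades 100× more under the same perturbation.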

Recent work has explored the generalization properties exhibited by (brain-inspired) backpropagation-derived local learning rules.83 Compared with backpropagation through time, backpropagation-derived local learning rules exhibit worse and more variable generalization, which does not improve by scaling the step size due to the gradient approximation being poorly aligned with the true gradient. While it is perhaps unsurprising that local approximations of an optimization process are going to have worse generalization properties than their complete counterpart, this work opens the door toward asking new questions about what the best approach toward designing brain-inspired learning algorithms is. It also opens the question as to whether backpropagation-derived local learning rules are even worth exploring given that they are fundamentally going to exhibit sub-par generalization.

In conclusion, while backpropagation-derived local learning rules present themselves as a promising approach to designing brain-inspired learning algorithms, they come with limitations that must be addressed. The poor generalization of these algorithms highlights the need for further research to improve their performance and to explore alternative brain-inspired learning rules.

Meta-optimized plasticity rules offer an effective balance between error-driven global learning and brain-inspired local learning. Meta-learning can be defined as the automation of the search for learning algorithms themselves, where, instead of relying on human engineering to describe a learning algorithm, a search process to find that algorithm is employed.84 The idea of meta-learning naturally extends to brain-inspired learning algorithms, such that the brain-inspired mechanism of learning itself can be optimized, thereby allowing for the discovery of more efficient learning without manual tuning of the rule. In the following section, we discuss various aspects of this research, starting with differentiably optimized synaptic plasticity rules.

Differentiable plasticity: One instantiation of this principle in the literature is differentiable plasticity, which is a framework that focuses on optimizing synaptic plasticity rules in neural networks through gradient descent.85,86 In these rules, the plasticity rules are described in such a way that the parameters governing their dynamics are differentiable, allowing for backpropagation to be used for meta-optimization of the plasticity rule parameters (e.g., the η term in the simple Hebbian rule or the A+ term in the STDP rule). This allows the weight dynamics to precisely solve a task that requires the weights to be optimized during execution time, referred to as intra-lifetime learning.
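
As a minimal illustration of the idea (not the architecture of Refs. 85 and 86), the sketch below meta-optimizes the η of a simple Hebbian rule. An autodiff framework would backpropagate through the inner loop; here the meta-gradient is approximated by finite differences, and the storage/recall task is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

# Inner loop: a Hebbian rule w <- w + eta * pre * post stores a pattern, which
# the network must then recall. Outer loop: gradient descent on the plasticity
# parameter eta itself.
def lifetime_loss(eta, episodes):
    loss = 0.0
    for x, target in episodes:
        w = eta * x * 1.0        # Hebbian storage (postsynaptic activity clamped to 1)
        y = w @ x                # recall by presenting the pattern again
        loss += (y - target) ** 2
    return loss / len(episodes)

episodes = [(p, 1.0) for p in rng.normal(size=(20, 8))]
eta, meta_lr, h = 0.5, 0.005, 1e-4
start_loss = lifetime_loss(eta, episodes)
for step in range(200):
    g = (lifetime_loss(eta + h, episodes) - lifetime_loss(eta - h, episodes)) / (2 * h)
    eta -= meta_lr * g           # outer-loop (meta) update of the plasticity rule

print(start_loss, lifetime_loss(eta, episodes), eta)
```

The meta-optimized η is what makes the fixed Hebbian rule solve the recall task; the synaptic weights themselves are rebuilt from scratch within each episode.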

Differentiable plasticity rules are also capable of the differentiable optimization of neuromodulatory dynamics.61,86 This framework includes two main variants of neuromodulation: global neuromodulation, where the direction and magnitude of weight changes are controlled by a network-output-dependent global parameter, and retroactive neuromodulation, where the effect of past activity is modulated by a dopamine-like signal within a short time window. This is enabled by the use of eligibility traces, which are used to keep track of which synapses contributed to recent activity, and the dopamine signal modulates the transformation of these traces into actual plastic changes.
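
The retroactive variant can be sketched as follows (all names, constants, and the activity statistics are illustrative assumptions): Hebbian co-activity is first written to a decaying eligibility trace, and only when a dopamine-like reward signal arrives is the trace converted into an actual weight change.

```python
import numpy as np

rng = np.random.default_rng(5)

n, T = 6, 50
tau = 0.9                        # trace decay per time step
w = np.zeros((n, n))
trace = np.zeros((n, n))
reward_times = {20, 40}          # when the modulatory signal fires

pre = (rng.random((T, n)) > 0.5).astype(float)   # binary presynaptic activity
post = (rng.random((T, n)) > 0.5).astype(float)  # binary postsynaptic activity

for t in range(T):
    trace = tau * trace + np.outer(post[t], pre[t])  # tag recent co-activity
    if t in reward_times:
        w += 0.1 * trace         # retroactive conversion of tags into plasticity
        trace[:] = 0.0           # consume the eligibility

print(np.abs(w).sum())
```

Because the trace decays with τ, only co-activity within a short window before the reward contributes to the weight change, which is the sense in which the modulation acts on past activity.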

Methods involving differentiable plasticity have seen improvements in a wide range of applications, from sequential associative tasks87 to familiarity detection88 to robotic noise adaptation.61 This method has also been used to optimize short-term plasticity rules,88,89 which exhibit improved performance in reinforcement and temporal supervised learning problems. While these methods show much promise, differentiable plasticity approaches take a tremendous amount of memory, as backpropagation is used to optimize multiple parameters for each synapse over time. Practical advancements with these methods will likely require parameter sharing90 or a more memory-efficient form of backpropagation.91 

Plasticity with spiking neurons: Recent advances in backpropagating through the non-differentiable part of spiking neurons with surrogate gradients have allowed differentiable plasticity to be used to optimize plasticity rules in spiking neural networks.61 In Ref. 62, the capability of this optimization paradigm is demonstrated through the use of a differentiable spike-timing-dependent plasticity rule to enable "learning to learn" on an online one-shot continual learning problem and an online one-shot image class recognition problem. A similar method was used to optimize the third-factor signal using the gradient approximation of e-prop as the plasticity rule, introducing a meta-optimized form of e-prop.92 Recurrent neural networks tuned by evolution can also be used to meta-optimize learning rules. Evolvable Neural Units93 (ENUs) introduce a gating structure that controls how inputs are processed and stored and how dynamic parameters are updated. This work demonstrates the evolution of individual somatic and synaptic compartment models of neurons and shows that a network of ENUs can learn to solve a T-maze environment task, independently discovering spiking dynamics and reinforcement-type learning rules. Meta-learning has also been introduced to optimize the natural physical structure of spiking reservoir systems, determining the optimal initialization before a task is learned.94 
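
The surrogate-gradient trick underlying these methods can be sketched in two lines: the forward pass uses a hard threshold whose true derivative is zero almost everywhere, while the backward pass substitutes a smooth pseudo-derivative. The fast-sigmoid form below is one common choice, used here as an assumption rather than the specific rule of the cited work.

```python
import numpy as np

def spike(v, threshold=1.0):
    # Forward pass: hard Heaviside threshold on the membrane potential.
    return (v >= threshold).astype(float)

def surrogate_grad(v, threshold=1.0, beta=10.0):
    # Backward pass: derivative of a fast sigmoid, peaked at the threshold,
    # substituted for the (almost-everywhere-zero) true derivative.
    return 1.0 / (1.0 + beta * np.abs(v - threshold)) ** 2

v = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(spike(v))           # hard thresholding in the forward pass
print(surrogate_grad(v))  # smooth pseudo-derivative used in the backward pass
```

Because the surrogate is only used in the backward pass, the network's spiking behavior is unchanged while gradients can still flow through it, which is what makes the plasticity rules above differentiable end to end.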

Plasticity in RNNs and Transformers: Independent of research aiming at learning plasticity using update rules, Transformers have recently been shown to be good intra-lifetime learners.5,95,96 The process of in-context learning works not through the update of synaptic weights but purely within the network activations. As in Transformers, this process can also happen in recurrent neural networks.97 While in-context learning initially appears to be a different mechanism from synaptic plasticity, these processes have been demonstrated to exhibit a strong relationship. One exciting connection discussed in the literature is the realization that parameter-sharing by the meta-learner often leads to the interpretation of activations as weights.98 This demonstrates that, while these models may have fixed weights, they exhibit some of the same learning capabilities as models with plastic weights. Another connection is that self-attention in the Transformer involves outer and inner products that can be cast as learned weight updates99 that can even implement gradient descent.100,101
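
The attention-as-weight-update connection can be illustrated with unnormalized linear attention, where the attention output equals a query applied to a "fast-weight" matrix accumulated from outer products of values and keys (a minimal sketch; real Transformers apply softmax normalization, which breaks this exact equivalence).

```python
import numpy as np

rng = np.random.default_rng(6)
T, d = 5, 4
K = rng.normal(size=(T, d))   # keys, one per token
V = rng.normal(size=(T, d))   # values, one per token
q = rng.normal(size=d)        # a single query

# Unnormalized linear attention: sum_t (q . k_t) v_t
attn_out = (K @ q) @ V

# Fast-weight view: each (key, value) pair is "written" into a weight matrix
# as an outer-product update; the query is then a plain forward pass through W.
W = np.zeros((d, d))
for t in range(T):
    W += np.outer(V[t], K[t])   # one Hebbian-like weight update per token
fast_out = W @ q

print(np.allclose(attn_out, fast_out))   # the two computations coincide
```

In this view, attending to the context is literally applying a sequence of learned outer-product weight updates, which is why fixed-weight Transformers can behave like networks with plastic weights.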

Evolutionary and genetic meta-optimization: Much like differentiable plasticity, evolutionary and genetic algorithms have been used to optimize the parameters of plasticity rules in a variety of applications,102 including adaptation to limb damage on robotic systems.103,104 Recent work has also enabled the optimization of both plasticity coefficients and plasticity rule equations through the use of Cartesian genetic programming,105 presenting an automated approach for discovering biologically plausible plasticity rules based on the specific task being solved. In these methods, the genetic or evolutionary optimization process acts similarly to the differentiable process: it optimizes the plasticity parameters in an outer-loop process, while the plasticity rule optimizes the reward in an inner-loop process. These methods are appealing because they have a much lower memory footprint than differentiable methods, as they do not require backpropagating errors over time. However, while memory efficient, they often require a tremendous amount of data to reach performance comparable to gradient-based methods.106 
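
A toy version of this outer/inner-loop structure, with an illustrative task and a simple (1+λ) evolution strategy standing in for the genetic methods of the cited work: the outer loop mutates and selects plasticity coefficients, while the inner loop runs the resulting rule for one "lifetime."

```python
import numpy as np

rng = np.random.default_rng(7)

# Outer loop: a (1 + lambda) evolution strategy searches over the coefficients
# of a generic plasticity rule  dw = eta * (A*pre*post + B*pre + C*post + D).
# Inner loop: the rule must drive a single weight toward a target value.
def fitness(coeffs):
    A, B, C, D = coeffs
    w, target, eta = 0.0, 1.5, 0.1
    for _ in range(30):                       # one inner-loop "lifetime"
        pre = 1.0
        post = w * pre                        # a single linear neuron
        w += eta * (A * pre * post + B * pre + C * post + D)
        if abs(w) > 1e3:                      # discard divergent rules
            return -1e9
    return -(w - target) ** 2                 # negative final error

parent = np.zeros(4)
best = fitness(parent)
for gen in range(200):
    for child in parent + 0.1 * rng.normal(size=(10, 4)):
        f = fitness(child)
        if f > best:                          # greedy (1+lambda) selection
            best, parent = f, child

print(best, parent)
```

Note that the outer loop never computes a gradient: only the scalar fitness of each candidate rule is needed, which is what keeps the memory footprint low.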

Self-referential meta-learning: While synaptic plasticity involves two levels of learning, the meta-learner and the discovered learning rule, self-referential meta-learning107,108 extends this hierarchy. In plasticity approaches, only a subset of the network parameters is updated (e.g., the synaptic weights), whereas the meta-learned update rule remains fixed after meta-optimization. Self-referential architectures enable a neural network to modify all of its parameters in a recursive fashion; thus, the learner can also modify the meta-learner. This, in principle, allows arbitrary levels of learning, meta-learning, meta–meta-learning, etc. Some approaches meta-learn the parameter initialization of such a system,107,109 although finding this initialization still requires a hardwired meta-learner. In other works, the network self-modifies in a way that eliminates even this meta-learner.108,110 In some cases, the learning rule to be discovered has structural search-space restrictions that simplify self-improvement, allowing a gradient-based optimizer to discover itself111 or an evolutionary algorithm to optimize itself.112 Despite their differences, both synaptic plasticity and self-referential approaches aim to achieve self-improvement and adaptation in neural networks.

Generalization of meta-optimized learning rules: The extent to which discovered learning rules generalize to a wide range of tasks is a significant open question—in particular, when should they replace manually derived general-purpose learning rules such as backpropagation? A particular observation that poses a challenge to these methods is that when the search space is large and few restrictions are put on the learning mechanism,97,113,114 generalization is shown to become more difficult. However, toward amending this, in variable shared meta-learning,98 flexible learning rules were parameterized by parameter-shared recurrent neural networks that locally exchange information to implement learning algorithms that generalize across classification problems not seen during meta-optimization. Similar results have also been shown for the discovery of reinforcement learning algorithms.115 

Neuromorphic Computing: Neuromorphic computing represents a paradigm shift in the design of computing systems, with the goal of creating hardware that mimics the structure and functionality of the biological brain.42,116,117 This approach seeks to develop artificial neural networks that not only replicate the brain’s learning capabilities but also its energy efficiency and inherent parallelism. Neuromorphic computing systems often incorporate specialized hardware, such as neuromorphic chips or memristive devices, to enable the efficient execution of brain-inspired learning algorithms.117,118 These systems have the potential to drastically improve the performance of machine learning applications, particularly in edge computing and real-time processing scenarios.

A key aspect of neuromorphic computing lies in the development of specialized hardware architectures that facilitate the implementation of spiking neural networks, which more closely resemble the information processing mechanisms of biological neurons. Neuromorphic systems operate based on the principle of brain-inspired local learning, which allows them to achieve high energy efficiency, low-latency processing, and robustness against noise, which are critical for real-world applications.119 The integration of brain-inspired learning techniques with neuromorphic hardware is vital for the successful application of this technology.

In recent years, advances in neuromorphic computing have led to the development of various platforms, such as Intel’s Loihi,120 IBM’s TrueNorth,121 and SpiNNaker,122 which offer specialized hardware architectures for implementing SNNs and brain-inspired learning algorithms. These platforms provide a foundation for further exploration of neuromorphic computing systems, enabling researchers to design, simulate, and evaluate novel neural network architectures and learning rules. As neuromorphic computing continues to progress, it is expected to play a pivotal role in the future of artificial intelligence, driving innovation and enabling the development of more efficient, versatile, and biologically plausible learning systems.123,124

Robotic learning: Brain-inspired learning in neural networks has the potential to overcome many of the current challenges present in the field of robotics by enabling robots to learn and adapt to their environment in a more flexible way.125,126 Traditional robotics systems rely on pre-programmed behaviors, which are limited in their ability to adapt to changing conditions. In contrast, as we have shown in this review, neural networks can be trained to adapt to new situations by adjusting their internal parameters based on the data they receive.

Because of their natural relationship to robotics, brain-inspired learning algorithms have a long history in robotics.125 Toward this end, synaptic plasticity rules have been introduced for adapting robotic behavior to domain shifts such as motor gains and rough terrain,61,127–129 as well as for obstacle avoidance130–132 and articulated (arm) control.133,134 Brain-inspired learning rules have also been used to explore how learning occurs in the insect brain using robotic systems as an embodied medium.135–138 

Deep reinforcement learning (DRL) represents a significant success of brain-inspired learning algorithms, combining the strengths of neural networks with the theory of reinforcement learning in the brain to create autonomous agents capable of learning complex behaviors through interaction with their environment.139–141 By utilizing a reward-driven learning process emulating the activity of dopamine neurons142 as opposed to the minimization of, e.g., classification or regression error, DRL algorithms guide robots toward learning optimal strategies to achieve their goals, even in highly dynamic and uncertain environments.143,144 This powerful approach has been demonstrated in a variety of robotic applications, including dexterous manipulation, robotic locomotion,145 and multi-agent coordination.146 

Lifelong and online learning: Lifelong and online learning are essential applications of brain-inspired learning in artificial intelligence, as they enable systems to adapt to changing environments and continuously acquire new skills and knowledge.14 Traditional machine learning approaches, in contrast, are typically trained on a fixed dataset and lack the ability to adapt to new information or changing environments. The mature brain is an incredible medium for lifelong learning, as it is constantly learning while remaining relatively fixed in size across the span of a lifetime.147 As this review has demonstrated, neural networks endowed with brain-inspired learning mechanisms, similar to the brain, can be trained to learn and adapt continuously, improving their performance over time.

The development of brain-inspired learning algorithms that enable artificial systems to exhibit this capability has the potential to significantly enhance their performance and capabilities and has wide-ranging implications for a variety of applications. These applications are particularly useful in situations where data are scarce or expensive to collect, such as in robotics148 or autonomous systems,149 as they allow the system to learn and adapt in real-time rather than requiring large amounts of data to be collected and processed before learning can occur.

One of the primary objectives in the field of lifelong learning is to alleviate a major issue associated with the continuous application of backpropagation on ANNs, a phenomenon known as catastrophic forgetting.13 Catastrophic forgetting refers to the tendency of an ANN to abruptly forget previously learned information upon learning new data. This happens because the weights in the network that were initially optimized for earlier tasks are drastically altered to accommodate the new learning, thereby erasing or overwriting the previous information. This is because the backpropagation algorithm does not inherently factor in the need to preserve previously acquired information while facilitating new learning. Solving this problem has remained a significant hurdle in AI for decades. We posit that by employing brain-inspired learning algorithms that emulate the dynamic learning mechanisms of the brain, we may be able to capitalize on the proficient problem-solving strategies inherent to biological organisms.
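
Catastrophic forgetting is easy to reproduce even in a linear model trained sequentially by gradient descent (a minimal illustration; the tasks and sizes below are arbitrary assumptions): nothing in the update rule preserves the task-A solution once training on task B begins.

```python
import numpy as np

rng = np.random.default_rng(8)

d = 10
w_A_true = rng.normal(size=d)   # ground-truth weights for task A
w_B_true = rng.normal(size=d)   # ground-truth weights for task B

def make_task(w_true, n=200):
    X = rng.normal(size=(n, d))
    return X, X @ w_true

def train(w, X, y, lr=0.05, steps=300):
    # Plain full-batch gradient descent on squared error.
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

XA, yA = make_task(w_A_true)
XB, yB = make_task(w_B_true)

w = train(np.zeros(d), XA, yA)       # learn task A
err_A_before = mse(w, XA, yA)
w = train(w, XB, yB)                 # then learn task B, starting from w
err_A_after = mse(w, XA, yA)
print(err_A_before, err_A_after)     # task-A error rises sharply after task B
```

Methods such as elastic weight consolidation13 address exactly this failure by penalizing changes to weights deemed important for earlier tasks.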

Toward understanding the brain: The fields of artificial intelligence and neuroscience have long benefited from each other. Deep neural networks specially tailored for certain tasks show striking similarities to the human brain in how they handle spatial150–152 and visual153–155 information. This overlap hints at the potential of artificial neural networks (ANNs) as useful models in our efforts to better understand the brain's complex mechanics. A new movement referred to as the neuroconnectionist research program156 embodies this combined approach, using ANNs as a computational language to form and test ideas about how the brain computes. This perspective brings together different research efforts, offering a common computational framework and tools to test specific theories about the brain.

While this review highlights a range of algorithms that imitate the brain’s functions, we still have a substantial amount of work to do to fully grasp how learning actually happens in the brain. The use of backpropagation and backpropagation-like local learning rules to train large neural networks may provide a good starting point for modeling brain function. Much productive investigation has occurred to see what processes in the brain may operate similarly to backpropagation,65 leading to new perspectives and theories in neuroscience. Even though backpropagation in its current form might not occur in the brain, the idea that the brain might develop similar internal representations to ANNs despite such different mechanisms of learning is an exciting open question that may lead to a deeper understanding of the brain and AI.

Explorations are now extending beyond static network dynamics to the networks that unravel as a function of time, much like the brain. As we further develop algorithms for continual and lifelong learning, it may become clear that our models need to reflect the learning mechanisms observed in nature more closely. This shift in focus calls for the integration of local learning rules—those that mirror the brain’s own methods—into ANNs.

We are convinced that adopting more biologically authentic learning rules within ANNs will not only yield the aforementioned benefits, but it will also serve to point neuroscience researchers in the right direction. In other words, it is a strategy with a two-fold benefit: not only does it promise to invigorate innovation in engineering, but it also brings us closer to unraveling the intricate processes at play within the brain. With more realistic models, we can probe deeper into the complexities of brain computation from the novel perspective of artificial intelligence.

In this review, we investigate the integration of more biologically plausible learning mechanisms into ANNs. This integration presents itself as an important step for both neuroscience and artificial intelligence. It is particularly relevant amid the tremendous progress being made in artificial intelligence with large language models and embedded systems, which are in critical need of more energy-efficient approaches for learning and execution. In addition, while ANNs are making great strides in these applications, there are still major limitations in their ability to adapt in the way biological brains do, which we see as a primary application of brain-inspired learning mechanisms.

As we strategize for future collaboration between neuroscience and AI toward more detailed brain-inspired learning algorithms, it is important to acknowledge that the past influences of neuroscience on AI have seldom been about a straightforward application of ready-made solutions to machines.157 More often than not, neuroscience has stimulated AI researchers by posing intriguing algorithmic-level questions about aspects of animal learning and intelligence. It has provided preliminary guidance toward vital mechanisms that support learning. Our perspective is that by harnessing the insights drawn from neuroscience, we can significantly accelerate advancements in the learning mechanisms used in ANNs. Similarly, experiments using brain-like learning algorithms in AI can accelerate our understanding of neuroscience.

We thank the OpenBioML collaborative workspace, to which several of the authors of this work were connected. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship for Comp/IS/Eng—Robotics under Grant Nos. DGE2139757 and 2235440.

The authors have no conflicts to disclose.

Samuel Schmidgall: Conceptualization (lead); Investigation (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). Rojin Ziaei: Conceptualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Jascha Achterberg: Writing – review & editing (equal). Louis Kirsch: Writing – original draft (supporting); Writing – review & editing (supporting). S. Pardis Hajiseyedrazi: Visualization (equal); Writing – review & editing (equal). Jason Eshraghian: Supervision (equal); Writing – original draft (supporting); Writing – review & editing (equal).

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

1.
K. M.
Newell
,
Y.-T.
Liu
, and
G.
Mayer-Kress
, “
Time scales in motor learning and development
,”
Psychol. Rev.
108
,
57
(
2001
).
2.
M. G.
Stokes
, “
‘Activity-silent’ working memory in prefrontal cortex: A dynamic coding framework
,”
Trends Cognit. Sci.
19
,
394
405
(
2015
).
3.
W.
Gerstner
,
M.
Lehmann
,
V.
Liakoni
,
D.
Corneil
, and
J.
Brea
, “
Eligibility traces and plasticity on behavioral time scales: Experimental support of NeoHebbian three-factor learning rules
,”
Front. Neural Circuits
12
,
53
(
2018
).
4.
I.
Beltagy
,
K.
Lo
, and
A.
Cohan
, “
SciBERT: A pretrained language model for scientific text
,” arXiv:1903.10676 (
2019
).
5.
T.
Brown
,
B.
Mann
,
N.
Ryder
,
M.
Subbiah
,
J. D.
Kaplan
,
P.
Dhariwal
,
A.
Neelakantan
,
P.
Shyam
,
G.
Sastry
,
A.
Askell
et al, “
Language models are few-shot learners
,”
Adv. Neural Inf. Process. Syst.
33
,
1877
1901
(
2020
).
6.
A.
Ramesh
,
P.
Dhariwal
,
A.
Nichol
,
C.
Chu
, and
M.
Chen
, “
Hierarchical text-conditional image generation with clip latents
,” arXiv:2204.06125 (
2022
).
7.
C.
Saharia
,
W.
Chan
,
S.
Saxena
,
L.
Li
,
J.
Whang
,
E.
Denton
,
S. K. S.
Ghasemipour
,
B. K.
Ayan
,
S. S.
Mahdavi
,
R. G.
Lopes
et al, “
Photorealistic text-to-image diffusion models with deep language understanding
,” arXiv:2205.11487 (
2022
).
8.
A.
Kumar
,
Z.
Fu
,
D.
Pathak
, and
J.
Malik
, “
RMA: Rapid motor adaptation for legged robots
,” arXiv:2107.04034 (
2021
).
9.
T.
Miki
,
J.
Lee
,
J.
Hwangbo
,
L.
Wellhausen
,
V.
Koltun
, and
M.
Hutter
, “
Learning robust perceptive locomotion for quadrupedal robots in the wild
,”
Sci. Robot.
7
,
eabk2822
(
2022
).
10.
Z.
Fu
,
X.
Cheng
, and
D.
Pathak
, “
Deep whole-body control: Learning a unified policy for manipulation and locomotion
,” arXiv:2210.10044 (
2022
).
11.
D.
Silver
,
T.
Hubert
,
J.
Schrittwieser
,
I.
Antonoglou
,
M.
Lai
,
A.
Guez
,
M.
Lanctot
,
L.
Sifre
,
D.
Kumaran
,
T.
Graepel
et al, “
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
,”
Science
362
,
1140
1144
(
2018
).
12.
D.
Driess
,
F.
Xia
,
M. S.
Sajjadi
,
C.
Lynch
,
A.
Chowdhery
,
B.
Ichter
,
A.
Wahid
,
J.
Tompson
,
Q.
Vuong
,
T.
Yu
et al, “
PaLM-E: An embodied multimodal language model
,” arXiv:2303.03378 (
2023
).
13.
J.
Kirkpatrick
,
R.
Pascanu
,
N.
Rabinowitz
,
J.
Veness
,
G.
Desjardins
,
A. A.
Rusu
,
K.
Milan
,
J.
Quan
,
T.
Ramalho
,
A.
Grabska-Barwinska
et al, “
Overcoming catastrophic forgetting in neural networks
,”
Proc. Natl. Acad. Sci. U. S. A.
114
,
3521
3526
(
2017
).
14.
G. I.
Parisi
,
R.
Kemker
,
J. L.
Part
,
C.
Kanan
, and
S.
Wermter
, “
Continual lifelong learning with neural networks: A review
,”
Neural networks
113
,
54
71
(
2019
).
15.
D.
Kudithipudi
,
M.
Aguilar-Simon
,
J.
Babb
,
M.
Bazhenov
,
D.
Blackiston
,
J.
Bongard
,
A. P.
Brna
,
S.
Chakravarthi Raja
,
N.
Cheney
,
J.
Clune
et al, “
Biological underpinnings for lifelong learning machines
,”
Nat. Mach. Intell.
4
,
196
210
(
2022
).
16.
V. M.
Ho
,
J.-A.
Lee
, and
K. C.
Martin
, “
The cell biology of synaptic plasticity
,”
Science
334
,
623
628
(
2011
).
17.
A.
Citri
and
R. C.
Malenka
, “
Synaptic plasticity: Multiple forms, functions, and mechanisms
,”
Neuropsychopharmacology
33
,
18
41
(
2008
).
18. W. C. Abraham, O. D. Jones, and D. L. Glanzman, "Is plasticity of synapses the mechanism of long-term memory storage?," NPJ Sci. Learn. 4, 9–10 (2019).
19. R. S. Zucker and W. G. Regehr, "Short-term synaptic plasticity," Annu. Rev. Physiol. 64, 355–405 (2002).
20. R. Yuste and T. Bonhoeffer, "Morphological changes in dendritic spines associated with long-term synaptic plasticity," Annu. Rev. Neurosci. 24, 1071–1089 (2001).
21. N. Frémaux and W. Gerstner, "Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules," Front. Neural Circuits 9, 85 (2016).
22. Z. Brzosko, S. B. Mierau, and O. Paulsen, "Neuromodulation of spike-timing-dependent plasticity: Past, present, and future," Neuron 103, 563–581 (2019).
23. D. A. McCormick, D. B. Nestvogel, and B. J. He, "Neuromodulation of brain state and behavior," Annu. Rev. Neurosci. 43, 391–415 (2020).
24. W. C. Abraham and M. F. Bear, "Metaplasticity: The plasticity of synaptic plasticity," Trends Neurosci. 19, 126–130 (1996).
25. W. C. Abraham, "Metaplasticity: Tuning synapses and networks for plasticity," Nat. Rev. Neurosci. 9, 387 (2008).
26. P. Yger and M. Gilson, "Models of metaplasticity: A review of concepts," Front. Comput. Neurosci. 9, 138 (2015).
27. D. A. Lim and A. Alvarez-Buylla, "The adult ventricular–subventricular zone (V-SVZ) and olfactory bulb (OB) neurogenesis," Cold Spring Harbor Perspect. Biol. 8, a018820 (2016).
28. S. S. Roeder, P. Burkardt, F. Rost, J. Rode, L. Brusch, R. Coras, E. Englund, K. Håkansson, G. Possnert, M. Salehpour et al., "Evidence for postnatal neurogenesis in the human amygdala," Commun. Biol. 5, 366 (2022).
29. H. G. Kuhn, H. Dickinson-Anson, and F. H. Gage, "Neurogenesis in the dentate gyrus of the adult rat: Age-related decrease of neuronal progenitor proliferation," J. Neurosci. 16, 2027–2033 (1996).
30. G. Kempermann, H. G. Kuhn, and F. H. Gage, "Experience-induced neurogenesis in the senescent dentate gyrus," J. Neurosci. 18, 3206–3212 (1998).
31. H. Van Praag, T. Shubert, C. Zhao, and F. H. Gage, "Exercise enhances learning and hippocampal neurogenesis in aged mice," J. Neurosci. 25, 8680–8685 (2005).
32. M. S. Nokia, S. Lensu, J. P. Ahtiainen, P. P. Johansson, L. G. Koch, S. L. Britton, and H. Kainulainen, "Physical exercise increases adult hippocampal neurogenesis in male rats provided it is aerobic and sustained," J. Physiol. 594, 1855–1873 (2016).
33. E. D. Kirby, S. E. Muroy, W. G. Sun, D. Covarrubias, M. J. Leong, L. A. Barchas, and D. Kaufer, "Acute stress enhances adult rat hippocampal neurogenesis and activation of newborn neurons via secreted astrocytic FGF2," eLife 2, e00362 (2013).
34. S.-H. Baik, V. Rajeev, D. Y.-W. Fann, D.-G. Jo, and T. V. Arumugam, "Intermittent fasting increases adult hippocampal neurogenesis," Brain Behav. 10, e01444 (2020).
35. K. J. Todd, A. Serrano, J.-C. Lacaille, and R. Robitaille, "Glial cells in synaptic plasticity," J. Physiol. 99, 75–83 (2006).
36. W.-S. Chung, N. J. Allen, and C. Eroglu, "Astrocytes control synapse formation, function, and elimination," Cold Spring Harbor Perspect. Biol. 7, a020370 (2015).
37. M. Zhou, J. Cornell, S. Salinas, and H. Y. Huang, "Microglia regulation of synaptic plasticity and learning and memory," Neural Regener. Res. 17, 705 (2022).
38. R. Desislavov, F. Martínez-Plumed, and J. Hernández-Orallo, "Compute and energy consumption trends in deep learning inference," arXiv:2109.05472 (2021).
39. F. Daghero, D. J. Pagliari, and M. Poncino, "Energy-efficient deep learning inference on edge devices," in Advances in Computers (Elsevier, 2021), Vol. 122, pp. 247–301.
40. M. Pfeiffer and T. Pfeil, "Deep learning with spiking neurons: Opportunities and challenges," Front. Neurosci. 12, 774 (2018).
41. W. Maass, "Networks of spiking neurons: The third generation of neural network models," Neural Networks 10, 1659–1671 (1997).
42. C. D. Schuman, S. R. Kulkarni, M. Parsa, J. P. Mitchell, P. Date, and B. Kay, "Opportunities for neuromorphic computing algorithms and applications," Nat. Comput. Sci. 2, 10–19 (2022).
43. A. Gilra and W. Gerstner, "Predicting non-linear dynamics by stable local learning in a recurrent spiking neural network," eLife 6, e28295 (2017).
44. K. D. Carlson, M. Richert, N. Dutt, and J. L. Krichmar, "Biologically plausible models of homeostasis and STDP: Stability and learning in spiking neural networks," in 2013 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2013), pp. 1–8.
45. B. Walters, C. Lammie, S. Yang, M. V. Jacob, and M. Rahimi Azghadi, "Unsupervised character recognition with graphene memristive synapses," Neural Comput. Appl. 36, 1569–1584 (2023).
46. R.-J. Zhu, Q. Zhao, and J. K. Eshraghian, "SpikeGPT: Generative pre-trained language model with spiking neural networks," arXiv:2302.13939 (2023).
47. D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory (Psychology Press, 2005).
48. H. Markram, W. Gerstner, and P. J. Sjöström, "A history of spike-timing-dependent plasticity," Front. Synaptic Neurosci. 3, 4 (2011).
49. S. Löwel and W. Singer, "Selection of intrinsic horizontal connections in the visual cortex by correlated neuronal activity," Science 255, 209–212 (1992).
50. C. J. Shatz, "The developing brain," Sci. Am. 267, 60–67 (1992).
51. W. Gerstner, W. M. Kistler, R. Naud, and L. Paninski, Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition (Cambridge University Press, 2014), Chap. 10.
52. J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. U. S. A. 79, 2554–2558 (1982).
53. Z. Vasilkoski, H. Ames, B. Chandler, A. Gorchetchnikov, J. Léveillé, G. Livitz, E. Mingolla, and M. Versace, "Review of stability properties of neural plasticity rules for implementation on memristive neuromorphic hardware," in 2011 International Joint Conference on Neural Networks (IEEE, 2011), pp. 2563–2569.
54. N. Frémaux, H. Sprekeler, and W. Gerstner, "Functional requirements for reward-modulated spike-timing-dependent plasticity," J. Neurosci. 30, 13326–13337 (2010).
55. I. R. Fiete and H. S. Seung, "Gradient learning in spiking neural networks by dynamic perturbation of conductances," Phys. Rev. Lett. 97, 048104 (2006).
56. I. R. Fiete, M. S. Fee, and H. S. Seung, "Model of birdsong learning based on gradient estimation by dynamic perturbation of neural conductances," J. Neurophysiol. 98, 2038–2057 (2007).
57. R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Reinf. Learn. 173, 5–32 (1992).
58. T. Miconi, "Biologically plausible learning in recurrent neural networks reproduces neural dynamics observed during cognitive tasks," eLife 6, e20899 (2017).
59. J. C. Whittington, T. H. Muller, S. Mark, G. Chen, C. Barry, N. Burgess, and T. E. Behrens, "The Tolman-Eichenbaum machine: Unifying space and relational memory through generalization in the hippocampal formation," Cell 183, 1249–1263.e23 (2020).
60. G. Bellec, F. Scherr, A. Subramoney, E. Hajek, D. Salaj, R. Legenstein, and W. Maass, "A solution to the learning dilemma for recurrent networks of spiking neurons," Nat. Commun. 11, 3625 (2020).
61. S. Schmidgall, J. Ashkanazy, W. Lawson, and J. Hays, "SpikePropamine: Differentiable plasticity in spiking neural networks," Front. Neurorobotics 15, 629210 (2021).
62. S. Schmidgall and J. Hays, "Meta-SpikePropamine: Learning to learn with synaptic plasticity in spiking neural networks," Front. Neurosci. 17, 671 (2023).
63. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature 323, 533–536 (1986).
64. S. Ruder, "An overview of gradient descent optimization algorithms," arXiv:1609.04747 (2016).
65. T. P. Lillicrap, A. Santoro, L. Marris, C. J. Akerman, and G. Hinton, "Backpropagation and the brain," Nat. Rev. Neurosci. 21, 335–346 (2020).
66. J. C. Whittington and R. Bogacz, "Theories of error back-propagation in the brain," Trends Cognit. Sci. 23, 235–250 (2019).
67. J. H. Holland, "Genetic algorithms," Sci. Am. 267, 66–72 (1992).
68. K. De Jong, "Evolutionary computation: A unified approach," in Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (Association for Computing Machinery, 2017), pp. 185–199.
69. T. Salimans, J. Ho, X. Chen, S. Sidor, and I. Sutskever, "Evolution strategies as a scalable alternative to reinforcement learning," arXiv:1703.03864 (2017).
70. X. Zhang, J. Clune, and K. O. Stanley, "On the relationship between the OpenAI evolution strategy and stochastic gradient descent," arXiv:1712.06564 (2017).
71. T. P. Lillicrap, D. Cownden, D. B. Tweed, and C. J. Akerman, "Random feedback weights support learning in deep neural networks," arXiv:1411.0247 (2014).
72. A. Nøkland, "Direct feedback alignment provides learning in deep neural networks," in Advances in Neural Information Processing Systems 29 (Curran Associates, 2016).
73. K. Max, L. Kriener, G. Pineda García, T. Nowotny, W. Senn, and M. A. Petrovici, "Learning efficient backprojections across cortical hierarchies in real time," in International Conference on Artificial Neural Networks (Springer, 2023), pp. 556–559.
74. S. Bartunov, A. Santoro, B. Richards, L. Marris, G. E. Hinton, and T. Lillicrap, "Assessing the scalability of biologically-motivated deep learning algorithms and architectures," in Advances in Neural Information Processing Systems 31 (Curran Associates, 2018).
75. W. Xiao, H. Chen, Q. Liao, and T. Poggio, "Biologically-plausible learning algorithms can scale to large datasets," arXiv:1811.03567 (2018).
76. G. Bellec, F. Scherr, E. Hajek, D. Salaj, A. Subramoney, R. Legenstein, and W. Maass, "Eligibility traces provide a data-inspired alternative to backpropagation through time," in Real Neurons & Hidden Units: Future Directions at the Intersection of Neuroscience and Artificial Intelligence @ NeurIPS 2019, 2019.
77. R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," Neural Comput. 1, 270–280 (1989).
78. J. K. Eshraghian, M. Ward, E. O. Neftci, X. Wang, G. Lenz, G. Dwivedi, M. Bennamoun, D. S. Jeong, and W. D. Lu, "Training spiking neural networks using lessons from deep learning," Proc. IEEE 111, 1016 (2023).
79. Y. H. Liu, S. Smith, S. Mihalas, E. Shea-Brown, and U. Sümbül, "Cell-type–specific neuromodulation guides synaptic credit assignment in a spiking neural network," Proc. Natl. Acad. Sci. U. S. A. 118, e2111821118 (2021).
80. Y. H. Liu, S. Smith, S. Mihalas, E. Shea-Brown, and U. Sümbül, "Biologically-plausible backpropagation through arbitrary timespans via local neuromodulators," arXiv:2206.01338 (2022).
81. S. J. Smith, U. Sümbül, L. T. Graybuck, F. Collman, S. Seshamani, R. Gala, O. Gliko, L. Elabbady, J. A. Miller, T. E. Bakken et al., "Single-cell transcriptomic evidence for dense intracortical neuropeptide networks," eLife 8, e47889 (2019).
82. S. Hochreiter and J. Schmidhuber, "Flat minima," Neural Comput. 9, 1–42 (1997).
83. Y. H. Liu, A. Ghosh, B. A. Richards, E. Shea-Brown, and G. Lajoie, "Beyond accuracy: Generalization properties of bio-plausible temporal credit assignment rules," arXiv:2206.00823 (2022).
84. J. Schmidhuber, "Evolutionary principles in self-referential learning. On learning how to learn: The meta-meta-... hook," Ph.D. thesis, Technische Universität München, 1987.
85. T. Miconi, K. Stanley, and J. Clune, "Differentiable plasticity: Training plastic neural networks with backpropagation," in International Conference on Machine Learning (PMLR, 2018), pp. 3559–3568.
86. T. Miconi, A. Rawal, J. Clune, and K. O. Stanley, "Backpropamine: Training self-modifying neural networks with differentiable neuromodulated plasticity," arXiv:2002.10585 (2020).
87. Y. Duan, Z. Jia, Q. Li, Y. Zhong, and K. Ma, "Hebbian and gradient-based plasticity enables robust memory and rapid learning in RNNs," arXiv:2302.03235 (2023).
88. D. Tyulmankov, G. R. Yang, and L. Abbott, "Meta-learning synaptic plasticity and memory addressing for continual familiarity detection," Neuron 110, 544–557 (2022).
89. H. G. Rodriguez, Q. Guo, and T. Moraitis, "Short-term plasticity neurons learning to learn and forget," in International Conference on Machine Learning (PMLR, 2022), pp. 18704–18722.
90. R. B. Palm, E. Najarro, and S. Risi, "Testing the genomic bottleneck hypothesis in Hebbian meta-learning," in NeurIPS 2020 Workshop on Pre-Registration in Machine Learning (PMLR, 2021), pp. 100–110.
91. A. Gruslys, R. Munos, I. Danihelka, M. Lanctot, and A. Graves, "Memory-efficient backpropagation through time," in Advances in Neural Information Processing Systems 29 (Curran Associates, 2016).
92. F. Scherr, C. Stöckl, and W. Maass, "One-shot learning with spiking neural networks," bioRxiv:156513v1 (2020).
93. P. Bertens and S.-W. Lee, "Network of evolvable neural units can learn synaptic learning rules and spiking dynamics," Nat. Mach. Intell. 2, 791–799 (2020).
94. R. Zhu, J. Eshraghian, and Z. Kuncic, "Memristive reservoirs learn to learn," in Proceedings of the 2023 International Conference on Neuromorphic Systems (Association for Computing Machinery, 2023), pp. 1–7.
95. S. Garg, D. Tsipras, P. S. Liang, and G. Valiant, "What can transformers learn in-context? A case study of simple function classes," Adv. Neural Inf. Process. Syst. 35, 30583–30598 (2022).
96. L. Kirsch, J. Harrison, J. Sohl-Dickstein, and L. Metz, "General-purpose in-context learning by meta-learning transformers," arXiv:2212.04458 (2022).
97. S. Hochreiter, A. S. Younger, and P. R. Conwell, "Learning to learn using gradient descent," in Artificial Neural Networks—ICANN 2001: International Conference Vienna, Austria, August 21–25, 2001 Proceedings 11 (Springer, 2001), pp. 87–94.
98. L. Kirsch and J. Schmidhuber, "Meta learning backpropagation and improving it," Adv. Neural Inf. Process. Syst. 34, 14122–14134 (2021).
99. I. Schlag, K. Irie, and J. Schmidhuber, "Linear transformers are secretly fast weight programmers," in International Conference on Machine Learning (PMLR, 2021), pp. 9355–9366.
100. E. Akyürek, D. Schuurmans, J. Andreas, T. Ma, and D. Zhou, "What learning algorithm is in-context learning? Investigations with linear models," arXiv:2211.15661 (2022).
101. J. von Oswald, E. Niklasson, E. Randazzo, J. Sacramento, A. Mordvintsev, A. Zhmoginov, and M. Vladymyrov, "Transformers learn in-context by gradient descent," arXiv:2212.07677 (2022).
102. A. Soltoggio, K. O. Stanley, and S. Risi, "Born to learn: The inspiration, progress, and future of evolved plastic artificial neural networks," Neural Networks 108, 48–67 (2018).
103. S. Schmidgall, "Adaptive reinforcement learning through evolving self-modifying neural networks," in Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion (Association for Computing Machinery, 2020), pp. 89–90.
104. E. Najarro and S. Risi, "Meta-learning through Hebbian plasticity in random networks," Adv. Neural Inf. Process. Syst. 33, 20719–20731 (2020).
105. J. Jordan, M. Schmidt, W. Senn, and M. A. Petrovici, "Evolving interpretable plasticity for spiking networks," eLife 10, e66273 (2021).
106. P. Pagliuca, N. Milano, and S. Nolfi, "Efficacy of modern neuro-evolutionary strategies for continuous control optimization," Front. Robot. AI 7, 98 (2020).
107. J. Schmidhuber, "A 'self-referential' weight matrix," in ICANN'93: Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, The Netherlands, 13–16 September 1993 (Springer, 1993), pp. 446–450.
108. L. Kirsch and J. Schmidhuber, "Eliminating meta optimization through self-referential meta learning," arXiv:2212.14392 (2022).
109. K. Irie, I. Schlag, R. Csordás, and J. Schmidhuber, "A modern self-referential weight matrix that learns to modify itself," in International Conference on Machine Learning (PMLR, 2022), pp. 9660–9677.
110. L. Kirsch and J. Schmidhuber, "Self-referential meta learning," in First Conference on Automated Machine Learning (Late-Breaking Workshop), 2022.
111. L. Metz, C. D. Freeman, N. Maheswaranathan, and J. Sohl-Dickstein, "Training learned optimizers with randomly initialized learned optimizers," arXiv:2101.07367 (2021).
112. R. T. Lange, T. Schaul, Y. Chen, T. Zahavy, V. Dallibard, C. Lu, S. Singh, and S. Flennerhag, "Discovering evolution strategies via meta-black-box optimization," arXiv:2211.11260 (2022).
113. J. X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, C. Blundell, D. Kumaran, and M. Botvinick, "Learning to reinforcement learn," arXiv:1611.05763 (2016).
114. Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel, "RL2: Fast reinforcement learning via slow reinforcement learning," arXiv:1611.02779 (2016).
115. L. Kirsch, S. Flennerhag, H. van Hasselt, A. Friesen, J. Oh, and Y. Chen, "Introducing symmetries to black box meta reinforcement learning," Proc. AAAI Conf. Artif. Intell. 36, 7202–7210 (2022).
116. C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank, "A survey of neuromorphic computing and neural networks in hardware," arXiv:1705.06963 (2017).
117. J.-Q. Yang, R. Wang, Y. Ren, J.-Y. Mao, Z.-P. Wang, Y. Zhou, and S.-T. Han, "Neuromorphic engineering: From biological to spike-based hardware nervous systems," Adv. Mater. 32, 2003610 (2020).
118. M. R. Azghadi, C. Lammie, J. K. Eshraghian, M. Payvand, E. Donati, B. Linares-Barranco, and G. Indiveri, "Hardware implementation of deep network accelerators towards healthcare and biomedical applications," IEEE Trans. Biomed. Circuits Syst. 14, 1138–1159 (2020).
119. L. Khacef, P. Klein, M. Cartiglia, A. Rubino, G. Indiveri, and E. Chicca, "Spike-based local synaptic plasticity: A survey of computational models and neuromorphic circuits," arXiv:2209.15536 (2022).
120. M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain et al., "Loihi: A neuromorphic manycore processor with on-chip learning," IEEE Micro 38, 82–99 (2018).
121. F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G.-J. Nam et al., "TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 34, 1537–1557 (2015).
122. E. Painkras, L. A. Plana, J. Garside, S. Temple, F. Galluppi, C. Patterson, D. R. Lester, A. D. Brown, and S. B. Furber, "SpiNNaker: A 1-W 18-core system-on-chip for massively-parallel neural network simulation," IEEE J. Solid-State Circuits 48, 1943–1953 (2013).
123. F. Modaresi, M. Guthaus, and J. K. Eshraghian, "OpenSpike: An OpenRAM SNN accelerator," arXiv:2302.01015 (2023).
124. A. Mehonic and J. Eshraghian, "Brains and bytes: Trends in neuromorphic technology," APL Mach. Learn. 1, 020401 (2023).
125. D. Floreano, A. J. Ijspeert, and S. Schaal, "Robotics and neuroscience," Curr. Biol. 24, R910–R920 (2014).
126. Z. Bing, C. Meschede, F. Röhrbein, K. Huang, and A. C. Knoll, "A survey of robotics control based on learning-inspired spiking neural networks," Front. Neurorobotics 12, 35 (2018).
127. E. Grinke, C. Tetzlaff, F. Wörgötter, and P. Manoonpong, "Synaptic plasticity in a recurrent neural network for versatile and adaptive behaviors of a walking robot," Front. Neurorobotics 9, 11 (2015).
128. J. Kaiser, M. Hoff, A. Konle, J. C. Vasquez Tieck, D. Kappel, D. Reichard, A. Subramoney, R. Legenstein, A. Roennau, W. Maass, and R. Dillmann, "Embodied synaptic plasticity with online reinforcement learning," Front. Neurorobotics 13, 81 (2019).
129. S. Schmidgall and J. Hays, "Synaptic motor adaptation: A three-factor learning rule for adaptive robotic control in spiking neural networks," in Proceedings of the 2023 International Conference on Neuromorphic Systems, ICONS '23 (Association for Computing Machinery, New York, NY, 2023).
130. P. Arena, S. De Fiore, L. Patané, M. Pollino, and C. Ventura, "Insect inspired unsupervised learning for tactic and phobic behavior enhancement in a hybrid robot," in 2010 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2010), pp. 1–8.
131. D. Hu, X. Zhang, Z. Xu, S. Ferrari, and P. Mazumder, "Digital implementation of a spiking neural network (SNN) capable of spike-timing-dependent plasticity (STDP) learning," in 14th IEEE International Conference on Nanotechnology (IEEE, 2014), pp. 873–876.
132. X. Wang, Z.-G. Hou, F. Lv, M. Tan, and Y. Wang, "Mobile robots' modular navigation controller using spiking neural networks," Neurocomputing 134, 230–238 (2014).
133. S. A. Neymotin, G. L. Chadderdon, C. C. Kerr, J. T. Francis, and W. W. Lytton, "Reinforcement learning of two-joint virtual arm reaching in a computer model of sensorimotor cortex," Neural Comput. 25, 3263–3293 (2013).
134. S. Dura-Bernal, X. Zhou, S. A. Neymotin, A. Przekwas, J. T. Francis, and W. W. Lytton, "Cortical spiking network interfaced with virtual musculoskeletal arm and robotic arm," Front. Neurorobotics 9, 13 (2015).
135. W. Ilg and K. Berns, "A learning architecture based on reinforcement learning for adaptive control of the walking machine LAURON," Robot. Auton. Syst. 15, 321–334 (1995).
136. A. J. Ijspeert, "Biorobotics: Using robots to emulate and investigate agile locomotion," Science 346, 196–203 (2014).
137. F. Faghihi, A. A. Moustafa, R. Heinrich, and F. Wörgötter, "A computational model of conditioning inspired by Drosophila olfactory system," Neural Networks 87, 96–108 (2017).
138. N. S. Szczecinski, C. Goldsmith, W. Nourse, and R. D. Quinn, "A perspective on the neuromorphic control of legged locomotion in past, present, and future insect-like robots," Neuromorphic Comput. Eng. 3, 023001 (2023).
139. M. Botvinick, J. X. Wang, W. Dabney, K. J. Miller, and Z. Kurth-Nelson, "Deep reinforcement learning and its neuroscientific implications," Neuron 107, 603–616 (2020).
140. K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "A brief survey of deep reinforcement learning," arXiv:1708.05866 (2017).
141. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-level control through deep reinforcement learning," Nature 518, 529–533 (2015).
142. M. Watabe-Uchida, N. Eshel, and N. Uchida, "Neural circuitry of reward prediction error," Annu. Rev. Neurosci. 40, 373–394 (2017).
143. L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," J. Artif. Intell. Res. 4, 237–285 (1996).
144. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (MIT Press, 2018).
145. X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne, "DeepMimic: Example-guided deep reinforcement learning of physics-based character skills," ACM Trans. Graphics 37, 1–14 (2018).
146. R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments," in Advances in Neural Information Processing Systems 30 (Curran Associates, 2017).
147. C. La Rosa, R. Parolisi, and L. Bonfanti, "Brain structural plasticity: From adult neurogenesis to immature neurons," Front. Neurosci. 14, 75 (2020).
148. T. Lesort, V. Lomonaco, A. Stoian, D. Maltoni, D. Filliat, and N. Díaz-Rodríguez, "Continual learning for robotics: Definition, framework, learning strategies, opportunities and challenges," Inf. Fusion 58, 52–68 (2020).
149. K. Shaheen, M. A. Hanif, O. Hasan, and M. Shafique, "Continual learning for real-world autonomous systems: Algorithms, challenges and frameworks," J. Intell. Robot. Syst. 105, 9 (2022).
150. A. Banino, C. Barry, B. Uria, C. Blundell, T. Lillicrap, P. Mirowski, A. Pritzel, M. J. Chadwick, T. Degris, J. Modayil et al., "Vector-based navigation using grid-like representations in artificial agents," Nature 557, 429–433 (2018).
151. C. J. Cueva and X.-X. Wei, "Emergence of grid-like representations by training recurrent neural networks to perform spatial localization," arXiv:1803.07770 (2018).
152. Y. Gao, "A computational model of learning flexible navigation in a maze by layout-conforming replay of place cells," Front. Comput. Neurosci. 17, 1053097 (2023).
153. M. Schrimpf, J. Kubilius, H. Hong, N. J. Majaj, R. Rajalingham, E. B. Issa, K. Kar, P. Bashivan, J. Prescott-Roy, F. Geiger et al., "Brain-Score: Which artificial neural network for object recognition is most brain-like?," bioRxiv:407007 (2018).
154. C. Zhuang, S. Yan, A. Nayebi, M. Schrimpf, M. C. Frank, J. J. DiCarlo, and D. L. K. Yamins, "Unsupervised neural network models of the ventral visual stream," Proc. Natl. Acad. Sci. U. S. A. 118, e2014196118 (2021).
155. G. Jacob, R. Pramod, H. Katti, and S. Arun, "Qualitative similarities and differences in visual object representations between brains and deep networks," Nat. Commun. 12, 1872 (2021).
156. A. Doerig, R. Sommers, K. Seeliger, B. Richards, J. Ismael, G. Lindsay, K. Kording, T. Konkle, M. A. Van Gerven, N. Kriegeskorte et al., "The neuroconnectionist research programme," arXiv:2209.03718 (2022).
157. D. Hassabis, D. Kumaran, C. Summerfield, and M. Botvinick, "Neuroscience-inspired artificial intelligence," Neuron 95, 245–258 (2017).