Neural systems are well known for their ability to learn and store information as memories. Even more impressive is their ability to abstract these memories to create complex internal representations, enabling advanced functions such as the spatial manipulation of mental representations. While recurrent neural networks (RNNs) are capable of representing complex information, the exact mechanisms of how dynamical neural systems perform abstraction are still not well-understood, thereby hindering the development of more advanced functions. Here, we train a 1000-neuron RNN—a reservoir computer (RC)—to abstract a continuous dynamical attractor memory from isolated examples of dynamical attractor memories. Furthermore, we explain the abstraction mechanism with a new theory. By training the RC on isolated and shifted examples of either stable limit cycles or chaotic Lorenz attractors, the RC learns a continuum of attractors as quantified by an extra Lyapunov exponent equal to zero. We propose a theoretical mechanism of this abstraction by combining ideas from differentiable generalized synchronization and feedback dynamics. Our results quantify abstraction in simple neural systems, enabling us to design artificial RNNs for abstraction and leading us toward a neural basis of abstraction.

Neural systems learn and store information as memories and can even create abstract representations from these memories, such as how the human brain can change the pitch of a song or predict different trajectories of a moving object. Because we do not know how neurons work together to generate abstractions, we are unable to optimize artificial neural networks for abstraction or directly measure abstraction in biological neural networks. Our goal is to provide a theory for how a simple neural network learns to abstract information from its inputs. We demonstrate that abstraction is possible using a simple neural network and that abstraction can be quantified and measured using existing tools. Furthermore, we provide a new mathematical mechanism for abstraction in artificial neural networks, allowing for future research applications in neuroscience and machine learning.

Biological and artificial neural networks have the ability to make generalizations from only a few examples.1–7 For instance, both types of networks demonstrate object invariance: the ability to recognize an object even after it has undergone translation or transformation.8,9 What is surprising about this invariance is not that neural systems can map a set of inputs to the same output. Rather, what is surprising is that they can first sustain internal representations of objects and then abstract these representations to include translations and transformations. Hence, beyond simply memorizing static, discrete examples,10 neural systems have the ability to abstract their memories along a continuum of information by observing isolated examples.11 However, the precise mechanisms of such abstraction remain unknown, limiting the principled design and training of neural systems.

To make matters worse, much of the information represented by neural networks is not static but dynamic. As a biological example, a songbird’s representation of song is inherently time-varying and can be continuously sped up and slowed down through external perturbations.12 In artificial networks, recurrent neural networks (RNNs) can store a history of temporal information such as language,13 dynamical trajectories,14,15 and climate16 to more accurately classify and predict future events. To harness the power of RNNs for processing temporal information, efforts have focused on developing powerful training algorithms, such as backpropagation through time (BPTT)17 and neural architectures such as long short-term memory (LSTM) networks,18 alongside physical realizations in neuromorphic computing chips.19 Unfortunately, the dramatic increase in computational capability is accompanied by a similarly dramatic increase in the difficulty of understanding such systems, severely limiting their designability and generalizability beyond specific datasets.

To better understand the mechanisms behind neural representations of temporal information, the field has turned to dynamical systems. Starting with theories of synchronization between coupled dynamical systems,20,21 theories of generalized synchronization22 and invertible generalized synchronization23 provide intuition and conditions for when a neural network uniquely represents the temporal trajectory of its inputs, and when this representation can recover the original inputs to recurrently store them as memories.24 These theories hinge on important ideas and tools such as delay embedding,25 Lyapunov exponents (LEs),26 and dimensionality,27,28 which quantify crucial properties of time-varying representations. However, it is not yet known precisely how neural systems abstract such time-varying representations. Accordingly, the field is limited in its understanding of abstraction and meta-learning in existing neural systems29–32 and restricted in its ability to design neural systems for abstraction.

Here, we address this knowledge gap by providing a mechanism for the abstraction of time-varying attractor memories in a reservoir computer (RC), which is a type of RNN.33 In this work, we define abstraction as the process of the RC using its internal nonlinear dynamics to map several low-dimensional inputs to a single higher-dimensional output. First, we demonstrate that a neural network can observe low-dimensional inputs and create higher-dimensional abstractions, thereby learning a continuum of representations from a few examples. Then, we develop a new theory to explain the mechanism of this abstraction by extending prior work:34 we explicitly write the differential response of the RC to a differential change in the input, thereby giving a quantitative form to ideas of differentiable generalized synchronization.35 We quantify this abstraction by demonstrating that successful abstraction is driven by the acquisition of an additional Lyapunov exponent equal to zero in the RC’s dynamics, and we study the role of the RC’s spectral radius and time constant in its ability to abstract dynamics. These results enable the development of more interpretable and designable methods in machine learning and provide a quantitative hypothesis and measure of abstraction from neural dynamics.

To study the ability of neural networks to process and represent time-varying information as memories, we use a simple nonlinear dynamical system from reservoir computing

$\dot{\mathbf{r}}(t) = \gamma \left[ -\mathbf{r}(t) + g\big( A\mathbf{r}(t) + B\mathbf{x}(t) + \mathbf{d} \big) \right].$   (1)

Here, r(t) ∈ ℝ^(N×1) is a vector that represents the state of the N reservoir neurons, x(t) ∈ ℝ^(k×1) is the vector of k inputs into the reservoir, d ∈ ℝ^(N×1) is a constant vector of bias terms, A ∈ ℝ^(N×N) is the matrix of connections between neurons, B ∈ ℝ^(N×k) is the matrix of weights mapping inputs to neurons, g is a sigmoidal function that we take to be the hyperbolic tangent tanh, and γ is a time constant.

Throughout the results, we use an N=1000-neuron network such that A ∈ ℝ^(1000×1000). We set A to be 2% dense, where each non-zero entry of A is a random number drawn uniformly from −1 to 1, and then scale A such that the absolute value of its largest eigenvalue equals ρ, the spectral radius of the network. The spectral radius controls the excitability of the reservoir; the rescaling is performed with the MATLAB command A = A/abs(eigs(A,1,'LM'))*ρ. In general, each entry of B was drawn randomly from −1 to 1 and multiplied by a scalar coefficient set to 0.1; the one exception is the parameter sweep in Subsection IV A, where the scalar coefficient was varied systematically. Each entry of the bias term d was drawn randomly from −1 to 1 and multiplied by a bias amplification constant, which was set to 10 in all cases except in the parameter sweep in Subsection IV A.
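For concreteness, the construction just described can be sketched in a few lines of Python/NumPy (the paper itself reports a MATLAB implementation); the random seed, the particular value of ρ, and the choice k = 3 (e.g., for a three-dimensional Lorenz input) are illustrative assumptions rather than values fixed by the text.

```python
# Minimal sketch of the reservoir construction described above.
import numpy as np

rng = np.random.default_rng(0)

N, k = 1000, 3                  # number of neurons, input dimension (illustrative k)
rho = 0.8                       # spectral radius (swept from 0.2 to 2.0 in the text)
b_scale, d_scale = 0.1, 10.0    # input-weight and bias amplification constants

# A: 2% dense, nonzero entries uniform in [-1, 1], rescaled so that the
# largest-magnitude eigenvalue has absolute value rho
A = rng.uniform(-1, 1, size=(N, N)) * (rng.random((N, N)) < 0.02)
A *= rho / np.abs(np.linalg.eigvals(A)).max()

# B and d: entries uniform in [-1, 1], scaled by their amplification constants
B = b_scale * rng.uniform(-1, 1, size=(N, k))
d = d_scale * rng.uniform(-1, 1, size=N)
```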

To study the ability of the reservoir to form representations and abstractions of temporal data, we must define the data to be learned. Following prior work in teaching reservoirs to represent temporal information,14,15,24,34 we will use dynamical attractors as the memories. The first memory that we use is a stable limit cycle that evolves according to

(2)

To test the reservoir’s ability to learn and abstract more complex memories, the second memory that we use is the chaotic Lorenz attractor36 that evolves according to

$\dot{x}_1 = 10\,(x_2 - x_1), \qquad \dot{x}_2 = x_1\,(28 - x_3) - x_2, \qquad \dot{x}_3 = x_1 x_2 - \tfrac{8}{3}\,x_3.$   (3)

By driving the reservoir in Eq. (1) with the time series generated from either the stable limit cycle in Eq. (2) or the chaotic Lorenz system in Eq. (3), the response of the reservoir neurons is given by r(t). In our experiments, we drive the reservoir and evolve the input memory for 50 time units to create a transient phase which we discard, allowing the RC and the input memory to evolve far enough away from the randomly chosen initial conditions. Then, we drive the reservoir and the input memory together for 100 time units to create a learning phase. Because we use a time step of dt=0.001, this process creates a learning phase time series of 100 000 points.
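The drive procedure can be sketched as follows, assuming the leaky-tanh form reconstructed in Eq. (1) and a forward Euler step of dt = 0.001; here x_series is an assumed, precomputed input trajectory of shape (T, k), the matrices A, B, d come from the construction sketch above, and γ = 25 is an illustrative value taken from the later parameter sweep.

```python
# Open-loop "drive" phase of Eq. (1), integrated with a simple Euler step.
def drive_reservoir(A, B, d, x_series, gamma=25.0, dt=0.001, r0=None):
    """Integrate dr/dt = gamma * (-r + tanh(A r + B x(t) + d)); return r(t)."""
    r = np.zeros(A.shape[0]) if r0 is None else r0.copy()
    R = np.empty((len(x_series), A.shape[0]))
    for t, x in enumerate(x_series):
        r = r + dt * gamma * (-r + np.tanh(A @ r + B @ x + d))
        R[t] = r
    return R

# Transient phase (50 time units, discarded) followed by the learning phase
# (100 time units, kept), mirroring the procedure in the text:
# R_trans = drive_reservoir(A, B, d, x_transient)              # discarded
# R_learn = drive_reservoir(A, B, d, x_learn, r0=R_trans[-1])  # 100 000 points
```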

To store the attractor time series as memories, prior work in reservoir computing has demonstrated that it is sufficient to first train an output matrix W that maps reservoir states r(t) to copy the input x(t) according to the least squares norm minimization

$W = \operatorname*{arg\,min}_{\tilde{W}} \sum_t \big\lVert \tilde{W}\mathbf{r}(t) - \mathbf{x}(t) \big\rVert_2^2$   (4)

and then perform feedback by replacing the inputs x(t) with the output of the reservoir, Wr(t). This feedback generates a new system that evolves autonomously according to

$\dot{\mathbf{r}}(t) = \gamma \left[ -\mathbf{r}(t) + g\big( A\mathbf{r}(t) + BW\mathbf{r}(t) + \mathbf{d} \big) \right].$   (5)

We evolve this new system for 500 time units to create a prediction phase.
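A minimal sketch of the readout training in Eq. (4) and of the closed-loop system in Eq. (5), under the same assumptions as the previous sketch; NumPy's least-squares solver is one convenient choice and may differ from the solver (or any regularization) used in the original work.

```python
# Train the readout W (Eq. (4)) and run the closed-loop "prediction" phase (Eq. (5)).
def train_readout(R, X):
    """Find W such that W r(t) ~ x(t) in the least-squares sense.
    R: reservoir states, shape (T, N); X: inputs, shape (T, k)."""
    W_T, *_ = np.linalg.lstsq(R, X, rcond=None)   # solves R @ W_T ~ X
    return W_T.T                                  # W has shape (k, N)

def run_autonomous(A, B, d, W, r0, n_steps, gamma=25.0, dt=0.001):
    """Evolve dr/dt = gamma * (-r + tanh(A r + B W r + d)); return the output W r(t)."""
    r = r0.copy()
    Y = np.empty((n_steps, W.shape[0]))
    for t in range(n_steps):
        r = r + dt * gamma * (-r + np.tanh(A @ r + B @ (W @ r) + d))
        Y[t] = W @ r
    return Y

# W = train_readout(R_learn, x_learn)
# y_pred = run_autonomous(A, B, d, W, R_learn[-1], n_steps=500_000)  # 500 time units
```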

As a demonstration of this process, we show a schematic of the reservoir being driven by the stable limit cycle input [Fig. 1(a), blue], thereby generating the reservoir time series [Fig. 1(a), gold], which is subsequently used to train a matrix W such that Wr(t) copies the input [Fig. 1(a), red]. The training input, x(t) [Fig. 1(b), blue], and the training output, Wr(t) [Fig. 1(b), red], are plotted together and are indistinguishable. After the training, we perform feedback by replacing the reservoir inputs, x(t), with the outputs, Wr(t) [Fig. 1(c)], and observe that the output of the autonomous reservoir remains as a limit cycle [Fig. 1(d)]. Can this simple process be used not only to store memories but also to abstract memories? If so, by what mechanism?

FIG. 1.

Schematic of a reservoir computer learning a limit cycle memory. (a) Time series of a limit cycle that drives the reservoir to the state of the limit cycle. Weighted sums of the reservoir states are trained to reproduce the original time series (b) by creating the W matrix. (c) The reservoir uses the weighted sums in W to evolve, closing the feedback loop in the RC. (d) The RC now evolves autonomously along a trajectory that closely follows the expected dynamics of the original limit cycle. Here, color represents time.

In what follows, we answer these questions by extending the framework to multiple isolated inputs. Specifically, rather than using only one attractor time series x0(t), we will use a finite number of translated attractor time series

$\mathbf{x}_c(t) = \mathbf{x}_0(t) + c\,\mathbf{a}$   (6)

for c ∈ ℤ, where a is a constant vector. Further discussion about the vector a can be found in the Appendix. We will use these time series to drive the reservoir to generate a finite number of neural responses rc(t). By concatenating all of the inputs and reservoir states along the time dimension into a single time series, x(t) and r(t), respectively, we train an output matrix W according to Eq. (4) that maps all of the reservoir states to all of the translated inputs. Finally, using W, we perform feedback according to Eq. (5).

To teach the reservoir to generate higher-dimensional representations of isolated inputs, we train it to copy translations of an attractor memory. First, we consider the time series of a stable limit cycle generated by Eq. (2), x0(t), and we create shifted time series, xc(t), for c ∈ {−2, −1, 0, 1, 2} according to Eq. (6) [Fig. 2(a)]. We then use these time series to drive the reservoir according to Eq. (1) to generate the reservoir time series rc(t) for c ∈ {−2, −1, 0, 1, 2}, concatenate the time series into x(t) and r(t), respectively, and train the output matrix W according to Eq. (4) to generate the autonomous feedback reservoir that evolves according to Eq. (5).
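Assuming the helpers from the previous sketches, the abstraction training set can be assembled as below; the direction and magnitude of the shift vector a shown here are illustrative (the values actually used are given in the Appendix), and x0_full stands for a precomputed base trajectory that includes its 50-time-unit transient.

```python
# Build the abstraction training set from shifted copies x_c(t) = x_0(t) + c*a (Eq. (6)).
shifts = [-2, -1, 0, 1, 2]
a = 0.01 * np.ones(k) / np.sqrt(k)     # shift vector with magnitude 0.01 (illustrative)
n_transient = 50_000                   # 50 time units at dt = 0.001, discarded

R_all, X_all = [], []
for c in shifts:
    x_c = x0_full + c * a                      # translated copy of the attractor
    R_c = drive_reservoir(A, B, d, x_c)        # reservoir response r_c(t)
    R_all.append(R_c[n_transient:])
    X_all.append(x_c[n_transient:])

# One readout is trained on the concatenated states and concatenated targets.
W = train_readout(np.vstack(R_all), np.vstack(X_all))
```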

To test whether the reservoir has learned a higher-dimensional continuum of limit cycles vs the five isolated examples, we evolve the autonomous reservoir at intermediate values of the translation variable c. Specifically, we first prepare the reservoir state by driving the non-autonomous reservoir in Eq. (1) with limit cycles at intermediate translations [i.e., xc(t) for c ∈ {−1.9, −1.8, −1.7, …, 1.7, 1.8, 1.9}] for 50 time units until any transient dynamics from the initial reservoir state have decayed, thereby generating a set of final reservoir states rc(t=50). We then use these final reservoir states as the initial state for the autonomous feedback reservoir in Eq. (5). Finally, we evolve the autonomous reservoir and plot the outputs in green in Fig. 2(b). As can be seen, the autonomous reservoir whose initial state has been prepared at intermediary translations in position continues to evolve about a stable limit cycle at that shift.
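The test at intermediate translations can then be sketched as follows, again reusing the assumed helpers and variables from the sketches above.

```python
# Prepare the reservoir at an untrained shift, then let the closed loop run on its own.
for c in np.arange(-1.9, 2.0, 0.1):
    x_c = x0_full + c * a
    r_prep = drive_reservoir(A, B, d, x_c[:50_000])[-1]   # 50 time units of preparation
    y_pred = run_autonomous(A, B, d, W, r_prep, n_steps=100_000)
    # y_pred should trace a limit cycle translated by (approximately) c * a
```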

FIG. 2.

Successful abstraction in learning a continuous limit cycle memory. (a) Five shifted limit cycles are learned by the reservoir as five isolated examples. (b) 2D plot of the predicted output of the autonomous reservoir whose initial state has been prepared between the five training examples. The shift magnitude, or the distance of each translation, of the initial state is colored from green to black. (c) To visualize the abstraction that occurred in an additional dimension along the direction of the translation, we show a 3D plot of the predicted reservoir time series projected onto the Δr vector in Eq. (11), and the first two principal components after removing the Δr projection. The cutout highlights the “height” of the continuous attractor, formed from the abstraction along the Δr axis.

Now that we have numerically demonstrated the higher-dimensional abstraction of lower-dimensional attractors, we will uncover the underlying theoretical mechanism first by studying the response of the reservoir to different inputs and then by studying the consequence of the training process.

First, we compute perturbations of the reservoir state, dr(t), in response to perturbations of the input, dx(t), by linearizing the dynamics about the trajectories r0(t) and x0(t) to yield

$\frac{\mathrm{d}}{\mathrm{d}t}\, d\mathbf{r}(t) = \gamma \left[ -d\mathbf{r}(t) + d\mathbf{g}(t) \circ \big( A\,d\mathbf{r}(t) + B\,d\mathbf{x}(t) \big) \right],$   (7)

where dg(t) is the derivative of tanh(·) evaluated at A r0(t) + B x0(t), and ∘ is the element-wise product of the ith element of dg(t) with the ith row of either matrix A or B. We are guaranteed by differentiable generalized synchronization35 that if dx(t) is infinitesimal and constant, then dr(t) is also infinitesimal and evolves according to Eq. (7). Fortuitously, the differential change dx(t) is precisely infinitesimal and constant and is given by the derivative of Eq. (6) to yield dx(t) = a dc. We substitute this derivative into Eq. (7) to yield

$\frac{\mathrm{d}}{\mathrm{d}t}\, d\mathbf{r}(t) = \gamma \left[ -d\mathbf{r}(t) + d\mathbf{g}(t) \circ \big( A\,d\mathbf{r}(t) + B\mathbf{a}\,dc \big) \right].$   (8)

Crucially, this system is linear such that if a shift of a dc yields a perturbed reservoir trajectory of dr(t), then a shift of 2a dc yields a perturbed reservoir trajectory of 2dr(t). Hence, we can already begin to see the mechanism of abstraction: any scalar multiple of the differential input, a dc, yields a scalar multiple of the trajectory dr(t) as a valid perturbed trajectory.

To complete the abstraction mechanism, we note that the trained output matrix precisely learns the inverse map. If Eq. (8) maps scalar multiples of a dc to scalar multiples of dr(t), then the trained output matrix W maps scalar multiples of dr(t) back to a dc. To learn this inverse map, notice that our five training examples are spaced closely together [Fig. 2(a)], which allows the trained output matrix W to map differential changes in r(t) to differential changes in x(t). Hence, not only does Wrc(t) ≈ xc(t) but W also learns

$W\,d\mathbf{r}(t) \approx d\mathbf{x}(t) = \mathbf{a}\,dc.$   (9)

The consequence of this differential learning is seen in the evolution of the perturbation dr(t) of the autonomous feedback reservoir by substituting Eq. (9) into Eq. (8) to obtain

$\frac{\mathrm{d}}{\mathrm{d}t}\, d\mathbf{r}(t) = \gamma \left[ -d\mathbf{r}(t) + d\mathbf{g}(t) \circ \big( A\,d\mathbf{r}(t) + BW\,d\mathbf{r}(t) \big) \right].$   (10)

If the training examples are close enough to learn the differential relation in Eq. (9), then any perturbed trajectory, dr(t), generated by Eq. (8) is a valid trajectory in the feedback system to linear order. Further, any scalar multiple of dr(t) is also a valid perturbed trajectory in the feedback system.
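Because Eq. (10), as reconstructed above, is linear and homogeneous in the perturbation, the scaling claim can be verified in one line for any scalar α:

$$
\frac{\mathrm{d}}{\mathrm{d}t}\bigl[\alpha\, d\mathbf{r}(t)\bigr]
= \alpha\,\gamma\Bigl[-\,d\mathbf{r}(t) + d\mathbf{g}(t)\circ\bigl(A\,d\mathbf{r}(t) + BW\,d\mathbf{r}(t)\bigr)\Bigr]
= \gamma\Bigl[-\,\alpha\,d\mathbf{r}(t) + d\mathbf{g}(t)\circ\bigl(A\,\alpha\,d\mathbf{r}(t) + BW\,\alpha\,d\mathbf{r}(t)\bigr)\Bigr],
$$

so α dr(t) obeys the same equation as dr(t), to linear order.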

Hence, by training the output matrix to copy nearby examples—thereby learning the differential relation between dr(t) and dx(t)—we encode scalar multiples of dr(t) as a linear subspace of valid perturbation trajectories. It is precisely this encoded subspace of valid perturbation trajectories that we call the higher-dimensional abstraction of the lower-dimensional input; in addition to the two-dimensional limit cycle input, the reservoir encodes the subspace comprising scalar multiples of the perturbation trajectory as a third dimension. To visually represent this third dimension, we take the average of the perturbation vector across time as

$\Delta\mathbf{r} = \frac{1}{T} \int_0^T d\mathbf{r}(t)\,\mathrm{d}t$   (11)

and project all of the autonomous reservoir trajectories along this vector. We then remove the projection of Δr from the autonomous reservoir time series, compute the first two principal components of this modified trajectory, and plot the projection against these two principal components, shown in Fig. 2(c). As can be seen, the shift in the limit cycle is encoded along the Δr direction. Graphically and numerically, we have confirmed our theoretical mechanism of abstraction using a continuous limit cycle memory.
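One way to produce such a plot is sketched below; here dr_series is an assumed numerical stand-in for the perturbation trajectory dr(t), taken as the difference between reservoir responses at two neighboring training shifts, and R_pred_list is assumed to hold the predicted closed-loop reservoir time series (each of shape (T, N)) at the tested shifts.

```python
# Project onto the abstraction axis Delta_r, remove that component, and take
# the first two principal components of the residual, as in Fig. 2(c).
dr_series = R_all[1] - R_all[0]                # response difference between adjacent shifts
delta_r = dr_series.mean(axis=0)               # Eq. (11): time average of dr(t)
delta_r /= np.linalg.norm(delta_r)

R_stack = np.vstack(R_pred_list)
proj = R_stack @ delta_r                       # coordinate along the abstraction axis
R_resid = R_stack - np.outer(proj, delta_r)    # remove the Delta_r component

R_centered = R_resid - R_resid.mean(axis=0)
_, _, Vt = np.linalg.svd(R_centered, full_matrices=False)
pcs = R_centered @ Vt[:2].T                    # first two principal components
# plotting proj against pcs[:, 0] and pcs[:, 1] gives a view like Fig. 2(c)
```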

Now that we have a mechanism of abstraction, we seek a simple method to quantify this abstraction in higher-dimensional systems that do not permit an intuitive graphical representation [Fig. 2(c)]. In the chaotic Lorenz system with a fractal orbit [Fig. 3(a)], it can be difficult to visually determine whether the prediction output is part of a given input example or in between two input examples. Hence, we would like some measure of the presence of perturbations along a trajectory, dr(t), that neither grow nor shrink along the direction of linearly scaled perturbations. If these perturbations neither grow nor shrink, they represent a stable trajectory that neither collapses into another trajectory nor devolves into chaos.

To measure this abstraction, we compute the Lyapunov spectrum of the RC. Conceptually, the Lyapunov spectrum measures the stability of different trajectories along an attractor. It is computed by first generating an orbit along the attractor of a k-dimensional dynamical system, x(t), then by evaluating the Jacobian at every point along the orbit, J(x(t)), and finally, by evolving orbits of infinitesimal perturbations, pi(t), along the time-varying Jacobian as

$\dot{\mathbf{p}}_i(t) = J(\mathbf{x}(t))\,\mathbf{p}_i(t).$   (12)

Along these orbits, the direction and magnitude of pi(t) will change based on the linearly stable and unstable directions of the Jacobian J(x(t)). To capture these changes along orthogonal directions, after each time step of evolution along Eq. (12), we order the perturbation vectors into a matrix [p1(t), p2(t), …, pk(t)] and perform a Gram–Schmidt orthonormalization to obtain an orthonormal basis of perturbation vectors p̃1(t), p̃2(t), …, p̃k(t). In this way, p̃1(t) eventually points along the least stable direction, p̃2(t) along the second least stable direction, and p̃k(t) along the most stable direction. The evolution of the three perturbation vectors of the Lorenz system is shown in Fig. 3(a).

FIG. 3.

Obtaining a Lyapunov spectrum from a Lorenz attractor. (a) A 3D plot of the Lyapunov perturbation orbits (colored), which check the stability of a trajectory, about the Lorenz attractor (black, gray), obtained by evolving the orbits about the Jacobian of the Lorenz system evaluated at each point, followed by an orthonormalization. (b) A plot of the three Lyapunov exponents over 30 time units for the Lorenz system, whose average over T=100 time units yields the estimated Lyapunov exponents.

To calculate the Lyapunov spectrum, we compute the projection of each normalized perturbation vector along the Jacobian as the Lyapunov exponent (LE) over time,37 

$\lambda_i(t) = \tilde{\mathbf{p}}_i(t)^{\top} J(\mathbf{x}(t))\,\tilde{\mathbf{p}}_i(t),$   (13)

and the final LE is given by the time average [Fig. 3(b)]. Every continuous-time dynamical system with bounded, non-fixed-point dynamics has at least one zero Lyapunov exponent corresponding to a perturbation that neither grows nor shrinks on average [Fig. 3(b), red]. In a chaotic system like the Lorenz, there is also a positive LE corresponding to an orthogonal perturbation that grows on average [Fig. 3(b), blue]. Finally, a negative LE corresponds to an orthogonal perturbation that decays on average [Fig. 3(b), yellow]. As can be seen in the plot of trajectories, the orbit of the negative LE is directed transverse to the plane that roughly defines the “wings” of the attractor such that any deviation from the plane of the wings quickly collapses back onto the wings [Fig. 3(a)].
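The procedure described above can be sketched for the standard Lorenz system as follows; the Euler step, the discarded transient, and the 100-time-unit average follow the text, while the QR-based re-orthonormalization (a numerically convenient Gram–Schmidt) and the specific step sizes are implementation choices rather than the paper's code.

```python
# Estimate the Lyapunov spectrum of the Lorenz system with the Benettin/QR method.
import numpy as np

def lorenz_rhs(x, s=10.0, r=28.0, b=8.0 / 3.0):
    return np.array([s * (x[1] - x[0]),
                     x[0] * (r - x[2]) - x[1],
                     x[0] * x[1] - b * x[2]])

def lorenz_jacobian(x, s=10.0, r=28.0, b=8.0 / 3.0):
    return np.array([[-s, s, 0.0],
                     [r - x[2], -1.0, -x[0]],
                     [x[1], x[0], -b]])

def lyapunov_spectrum(x0, dt=0.001, T=100.0, T_transient=50.0):
    x = np.asarray(x0, dtype=float)
    for _ in range(int(T_transient / dt)):     # let the orbit settle onto the attractor
        x = x + dt * lorenz_rhs(x)
    P = np.eye(3)                              # one perturbation vector per dimension
    sums = np.zeros(3)
    for _ in range(int(T / dt)):
        J = lorenz_jacobian(x)
        x = x + dt * lorenz_rhs(x)             # orbit on the attractor (Euler step)
        P = P + dt * J @ P                     # perturbations evolved along the Jacobian, Eq. (12)
        P, R = np.linalg.qr(P)                 # re-orthonormalize (Gram-Schmidt)
        sums += np.log(np.abs(np.diag(R)))     # accumulated stretching per direction
    return sums / T                            # time-averaged Lyapunov exponents

# lyapunov_spectrum([1.0, 1.0, 1.0])  # roughly (0.9, 0.0, -14.6) for the Lorenz system
```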

Using the Lyapunov spectrum, we hypothesize that the reservoir’s abstraction of an attractor memory will appear as an additional LE equal to zero. This is because through the training of nearby examples, the reservoir acquires the perturbation direction dr(t) that neither grows nor decays on average, as all scalar multiples of dr(t) are valid perturbation trajectories to linear order according to Eq. (10). Hence, the acquisition of such a perturbation direction that neither grows nor decays should present itself as an additional LE equal to zero.

With this mechanism of abstraction in RCs, we provide a concrete implementation of our theory and study its limits. Our RCs in Eqs. (1) and (5) depend on several parameters; the spectral radius ρ, the time constant γ, the bias term d, the weighting of the input matrix B, and the number and spacing of the training examples all affect whether abstraction can successfully occur. We quantify the effect of varying these parameters on the RC’s ability to abstract different inputs via the Lyapunov spectrum analysis.

We focus on the parameters that determine the internal dynamics of the RC: γ and ρ. The RC is a carefully balanced system whose internal speed is set by γ. If γ is too small, the system is too slow to react to the inputs. Conversely, if γ is too large, the system responds too quickly to retain a history of the input. Thus, we hypothesize that an intermediate γ will yield optimal abstraction and that the optimal range of γ will vary depending upon the time scale of the input. Similar to the time constant, the spectral radius is known to impact the success of the learning as it controls the excitability of the RC.15 For abstraction to succeed, the RC needs an intermediate γ and ρ to learn the input signals with an excitability and reaction speed suited for the input attractor memory.

To find the ideal parameter regime for abstracting a limit cycle attractor memory, we performed a parameter sweep on γ from 2.5 to 25.0 in increments of 2.5 and on ρ from 0.2 to 2.0 in increments of 0.2. All other parameters in the closed and open loop reservoir equations were held constant. To measure the success of the abstraction, we calculated the first four LEs of the RC, looking for values of λ1 and λ2 equal to 0, and values of λ3 and λ4 that are negative. For this continuous limit cycle memory, we found that the best parameter regime was 0.4<ρ<1.0 and 20<γ<25 (Fig. 4). Then, we tested the weighting of the input matrix, B, while holding ρ, γ, and all other parameters constant. We found that an optimal scaling of B is between 0.001 and 0.1. We performed a similar test for the bias term, d, resulting in an optimal scaling between 1.0 and 20.0. These parameter ranges demonstrate that a careful balance of ρ and γ, along with B and d, is necessary to successfully achieve abstraction. More generally, our approach to defining these parameter ranges provides a principled method for future RCs to learn different attractor memories.
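The sweep itself can be organized as in the sketch below; build_reservoir, train_on_shifted_examples, and closed_loop_les are assumed stand-ins for the steps sketched earlier (plus a reservoir version of the Lyapunov-spectrum estimate), not functions from the paper.

```python
# Sweep gamma and rho, retrain, and record the leading LEs of the closed-loop RC.
import itertools
import numpy as np

results = {}
for rho, gamma in itertools.product(np.arange(0.2, 2.01, 0.2),
                                    np.arange(2.5, 25.01, 2.5)):
    A, B, d = build_reservoir(rho=rho)
    W = train_on_shifted_examples(A, B, d, gamma=gamma)
    les = closed_loop_les(A, B, d, W, gamma=gamma, n_exponents=4)
    # success criterion from the text: lambda_1 ~ lambda_2 ~ 0 and lambda_3, lambda_4 < 0
    results[(round(rho, 1), round(gamma, 1))] = les
```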

FIG. 4.

LEs of an RC after learning a continuous limit cycle. Heat maps of the first four LEs of the RC with different spectral radii (ρ) on the x axis and time constants (γ) on the y axis.

While limit cycles provide an intuitive conceptual demonstration of abstraction, real neural networks such as the human brain learn more complex memories that involve a larger number of parameters, including natural and chaotic attractors such as weather phenomena36 or diffusion.38 Chaotic attractors pose a more complex memory for the reservoir to learn, so it is nontrivial to show that the reservoir is able to abstract from several chaotic attractors to learn one single continuous chaotic attractor. By again analyzing the Lyapunov spectrum, we can quantify successful abstraction. As seen in Fig. 3, a chaotic dynamical system is characterized by positive Lyapunov exponents. In the case of the Lorenz attractor, the first Lyapunov exponent is positive (0.9), the second is equal to zero, and the third is negative (−14.6). Hence, when the RC learns a single Lorenz attractor, the first LE is positive, the second LE is zero, and the rest of the spectrum is increasingly negative. In the case of the successful and continuous abstraction of the Lorenz attractor, we expect to see that the first LE is positive, followed by not one, but two LEs equal to zero, followed by increasingly negative LEs.

To test this acquisition of an LE equal to zero, we trained the RC to learn multiple chaotic attractor memories, focusing on the Lorenz attractor. To find the ideal parameter regime for learning a continuous Lorenz attractor memory from many discrete examples, we again performed a parameter sweep on γ from 2.5 to 25.0 in increments of 2.5 and on ρ from 0.2 to 2.0 in increments of 0.2. All other parameters in the closed and open loop reservoir equations were held constant. To measure the success of abstraction, we calculated the first five LEs of the RC, looking for a positive λ1, values of λ2 and λ3 equal to zero, and increasingly negative values of λ4 and λ5. For this continuous Lorenz attractor memory, we found that the best parameter regime was 0.6–1.2 for ρ with γ = 25, as seen in Fig. 5. Hence, we demonstrate that in addition to simple limit cycle attractors, RCs can successfully abstract much more complex and unstable chaotic attractor memories, demonstrating the generalizability of our theory.

FIG. 5.

LEs of a RC after learning a continuous Lorenz attractor. Heat maps of the first five LEs of the RC with different spectral radii (ρ) on the x axis and time constants (γ) on the y axis.

Reservoir computing has been gaining substantial traction, and significant advances have been made in many domains of application. These include numerical advances in adaptive rules for training reservoirs using evolutionary algorithms39,40 and neurobiologically inspired spike-timing-dependent plasticity.41 In tandem, physical implementations of reservoir computing in photonic,42–44 memristive,45 and neuromorphic46 systems provide low-power alternatives to traditional computing hardware. Each application is accompanied by its own unique set of theoretical considerations and limitations,47 thereby emphasizing the need for underlying analytical mechanisms that support meaningful generalizations across such a wide range of systems.

In this work, we provide such a mechanism for the abstraction of a continuum of attractor memories from discrete examples and put forth the acquisition of an additional zero Lyapunov exponent as a quantitative measure of success. Compared to prior work,34 we remove the external control parameter, thereby moving beyond the translation of a k-dimensional attractor to generating a higher, k+1-dimensional representation of the attractor from k-dimensional inputs. Moreover, the method can be applied to any learning of chaotic attractor memories due to the generality of the differential mechanism of learning we uncover. While our investigation simplifies the complexity of the network used and the memories learned, we show that the underlying mechanism of abstraction remains the same as we increase the complexity of the memory learned (e.g., discrete to continuous and non-chaotic to chaotic).

Our work motivates several new avenues of inquiry. First, it would be of interest to examine the theoretical and numerical mechanism for abstracting more complex transformations. Second, it would be of interest to embark on a systematic study of the spacing between the discrete examples that is necessary to learn a differential attractor vs discrete attractors and the phase transition of abstraction. Third and finally, ongoing and future efforts could seek to determine the role of noise in both the RC and input dynamics for abstracting high-dimensional continuous attractors from scattered low-dimensional and discrete attractors. Because different RNNs are better suited to learn and abstract different inputs, we expect that this work will shed light on studies that reveal how one can design specialized RNNs for better abstraction on particular dynamical attractors.

Here, we show that an RC can successfully learn time-varying attractor memories. We demonstrate this process with both limit cycle and Lorenz attractor inputs. We then show the RC several discrete examples of these attractors, translated from each other by a small distance. We find that the neural network is able to abstract to a higher dimension and learn a continuous attractor memory that connects all of the discrete examples together. This process of abstraction can be quantified by the acquisition of an additional exponent equal to zero in the Lyapunov spectrum of the RC’s dynamics. Our discovery has important implications for future improvements in the algorithms and methods used in machine learning, due specifically to the understanding gained from using this simpler model. More broadly, our findings provide new hypotheses regarding how humans construct abstractions from real-world inputs to their neural networks.

L.M.S. acknowledges support from the University Scholars Program at the University of Pennsylvania. J.Z.K. acknowledges support from the NIH (No. T32-EB020087), PD: Felix W. Wehrli, and the National Science Foundation Graduate Research Fellowship (No. DGE-1321851). D.S.B. acknowledges support from the NSF through the University of Pennsylvania Materials Research Science and Engineering Center (MRSEC) (No. DMR-1720530), as well as the Paul G. Allen Family Foundation, and a grant from the Army Research Office (No. W911NF-16-1-0474). The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

The authors have no conflicts of interest to disclose.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

We would like to include a citation diversity statement following a recent proposal.48 Recent work in several fields of science has identified a bias in citation practices such that papers from women and other minority scholars are under-cited relative to the number of such papers in the field.49–57 Here, we sought to proactively consider choosing references that reflect the diversity of the field in thought, form of contribution, gender, race, ethnicity, and other factors. First, we obtained the predicted gender of the first and last author of each reference by using databases that store the probability of a first name being carried by a woman.53,58 By this measure (and excluding self-citations to the first and last authors of our current paper), our references contain 6.67% woman(first)/woman(last), 16.4% man/woman, 13.33% woman/man, and 63.6% man/man. This method is limited in that (a) names, pronouns, and social media profiles used to construct the databases may not, in every case, be indicative of gender identity and (b) it cannot account for intersex, non-binary, or transgender people. Second, we obtained predicted racial/ethnic category of the first and last author of each reference by databases that store the probability of a first and last name being carried by an author of color.59,60 By this measure (and excluding self-citations), our references contain 15.94% author of color (first)/author of color (last), 20.06% white author/author of color, 20.17% author of color/white author, and 43.83% white author/white author. This method is limited in that (a) names and Florida Voter Data to make the predictions may not be indicative of racial/ethnic identity, and (b) it cannot account for Indigenous and mixed-race authors, or those who may face differential biases due to the ambiguous racialization or ethnicization of their names. We look forward to future work that could help us to better understand how to support equitable practices in science.

In this work, we used a shift vector a in Eq. (6) to control where the translated examples that were shown to the RC were located. We used a magnitude of 0.001 for the a vector in order to produce both continuous LC and Lorenz attractors for Figs. 4 and 5. For Fig. 2, we used a magnitude of 0.01 for the continuous LC attractor. For the continuous LC, the direction of a was along the x̂2 = x̂1 line, so each example shown to the reservoir was translated by the shift magnitude in both the x̂1 and x̂2 directions. For the continuous Lorenz attractor, a similar procedure was used but extended to the additional third dimension. So, the direction of a was along the x̂3 = x̂2 = x̂1 line such that each example shown to the reservoir was translated by the shift magnitude in the x̂1, x̂2, and x̂3 directions.

Figure 6 shows the relationship between the magnitude of a and the second LE, which for successful abstraction should be as close to 0 as possible. This figure was created by training 21 LC examples, each shifted by the indicated shift magnitude, with ρ=0.4 and γ=25; the LE spectrum was then calculated after the prediction phase. We also note that due to the reservoir’s need to learn the differential relationship between multiple training examples, as outlined in the section entitled “Differential mechanism of learning,” the magnitude of a must be small for best results.

FIG. 6.

A comparison of varying magnitudes of the shift vector, a, and the resulting LE 2 for a continuous LC using ρ=0.4 and γ=25.

Here, in Fig. 7, we show a comparison of different values of the spectral radius at the network sparsity used in this work (i.e., 2% dense) and its effect on the Lyapunov exponents of the reservoir. The time constant was set to γ=25, and the reservoir was first evolved for a 50-time-unit transient period so that it moved far enough away from the random initial condition to discard transient orbits. Then, the reservoir was evolved for 100 time units, and its Lyapunov exponents were calculated.

FIG. 7.

A comparison of different values of the spectral radius, ρ, on the Lyapunov spectrum of the RC, with the x axis denoting the LE number and the y axis denoting the value of the specific LE.
1. Z. Zhang, Y.-Y. Jiao, and Q.-Q. Sun, “Developmental maturation of excitation and inhibition balance in principal neurons across four layers of somatosensory cortex,” Neuroscience 174, 10–25 (2011).
2. R. L. Faulkner, M.-H. Jang, X.-B. Liu, X. Duan, K. A. Sailor, J. Y. Kim, S. Ge, E. G. Jones, G.-L. Ming, H. Song, and H.-J. Cheng, “Development of hippocampal mossy fiber synaptic outputs by new neurons in the adult brain,” Proc. Natl. Acad. Sci. U.S.A. 105, 14157–14162 (2008).
3. F. A. Dunn and R. O. L. Wong, “Diverse strategies engaged in establishing stereotypic wiring patterns among neurons sharing a common input at the visual system’s first synapse,” J. Neurosci. 32, 10306–10317 (2012).
4. F. I. Craik and E. Bialystok, “Cognition through the lifespan: Mechanisms of change,” Trends Cogn. Sci. 10, 131–138 (2006).
5. A. Tacchetti, L. Isik, and T. A. Poggio, “Invariant recognition shapes neural representations of visual input,” Annu. Rev. Vis. Sci. 4, 403–422 (2018).
6. E. I. Moser, E. Kropff, and M.-B. Moser, “Place cells, grid cells, and the brain’s spatial representation system,” Annu. Rev. Neurosci. 31, 69–89 (2008).
7. P. J. Ifft, S. Shokur, Z. Li, M. A. Lebedev, and M. A. L. Nicolelis, “A brain-machine interface enables bimanual arm movements in monkeys,” Sci. Transl. Med. 5, 210ra154 (2013).
8. R. Guyonneau, H. Kirchner, and S. J. Thorpe, “Animals roll around the clock: The rotation invariance of ultrarapid visual processing,” J. Vis. 6, 1 (2006).
9. X. Zou, Z. Ji, X. Liu, Y. Mi, K. M. Wong, and S. Wu, “Learning a continuous attractor neural network from real images,” in International Conference on Neural Information Processing (Springer, 2017), pp. 622–631.
10. J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Natl. Acad. Sci. U.S.A. 79, 2554–2558 (1982).
11. H. S. Seung, “Learning continuous attractors in recurrent networks,” in Advances in Neural Information Processing Systems (Citeseer, 1998), pp. 654–660.
12. M. S. Fee and C. Scharff, “The songbird as a model for the generation and learning of complex sequential behaviors,” ILAR J. 51, 362–377 (2010).
13. T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent neural network based language model,” in Eleventh Annual Conference of the International Speech Communication Association (International Speech Communication Association, 2010).
14. H. Jaeger, “The ‘echo state’ approach to analysing and training recurrent neural networks-with an erratum note,” GMD Report 148, GMD, German National Research Institute for Computer Science (2001).
15. D. Sussillo and L. Abbott, “Generating coherent patterns of activity from chaotic neural networks,” Neuron 63, 544–557 (2009).
16. B. T. Nadiga, “Reservoir computing as a tool for climate predictability studies,” J. Adv. Model. Earth Syst. 13, e2020MS002290 (2021).
17. T. P. Lillicrap and A. Santoro, “Backpropagation through time and the brain,” Curr. Opin. Neurobiol. 55, 82–89 (2019).
18. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput. 9, 1735–1780 (1997).
19. S. Furber, “Large-scale neuromorphic computing systems,” J. Neural Eng. 13, 051001 (2016).
20. E. N. Davison, B. Dey, and N. E. Leonard, “Synchronization bound for networks of nonlinear oscillators,” in 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (IEEE, 2016), pp. 1110–1115.
21. L. M. Pecora and T. L. Carroll, “Synchronization in chaotic systems,” Phys. Rev. Lett. 64, 821 (1990).
22. N. F. Rulkov, M. M. Sushchik, L. S. Tsimring, and H. D. I. Abarbanel, “Generalized synchronization of chaos in directionally coupled chaotic systems,” Phys. Rev. E 51, 980–994 (1995).
23. Z. Lu and D. S. Bassett, “Invertible generalized synchronization: A putative mechanism for implicit learning in neural systems,” Chaos 30, 063133 (2020).
24. Z. Lu, B. R. Hunt, and E. Ott, “Attractor reconstruction by machine learning,” Chaos 28, 061104 (2018).
25. S. P. Garcia and J. S. Almeida, “Multivariate phase space reconstruction by nearest neighbor embedding with different time delays,” Phys. Rev. E 72, 027205 (2005).
26. S. Dawson, C. Grebogi, T. Sauer, and J. A. Yorke, “Obstructions to shadowing when a Lyapunov exponent fluctuates about zero,” Phys. Rev. Lett. 73, 1927 (1994).
27. L.-S. Young, “Dimension, entropy and Lyapunov exponents,” Ergodic Theory Dyn. Syst. 2, 109–124 (1982).
28. P. Frederickson, J. L. Kaplan, E. D. Yorke, and J. A. Yorke, “The Lyapunov dimension of strange attractors,” J. Differ. Equ. 49, 185–207 (1983).
29. S. Kumar, I. Dasgupta, J. D. Cohen, N. D. Daw, and T. L. Griffiths, “Meta-learning of compositional task distributions in humans and machines,” arXiv:2010.02317 [cs.LG] (2020).
30. N. Schweighofer and K. Doya, “Meta-learning in reinforcement learning,” Neural Netw. 16, 5–9 (2003).
31. R. A. Santiago, “Context discerning multifunction networks: Reformulating fixed weight neural networks,” in 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541) (IEEE, 2004), Vol. 1, pp. 189–194.
32. L. A. Feldkamp, G. Puskorius, and P. Moore, “Adaptive behavior from fixed weight networks,” Inform. Sci. 98, 217–235 (1997).
33. M. Lukoševičius and H. Jaeger, “Reservoir computing approaches to recurrent neural network training,” Comput. Sci. Rev. 3, 127–149 (2009).
34. J. Z. Kim, Z. Lu, E. Nozari, G. J. Pappas, and D. S. Bassett, “Teaching recurrent neural networks to infer global temporal structure from local examples,” Nat. Mach. Intell. 3, 316–323 (2021).
35. B. R. Hunt, E. Ott, and J. A. Yorke, “Differentiable generalized synchronization of chaos,” Phys. Rev. E 55, 4029 (1997).
36. E. N. Lorenz, “Deterministic nonperiodic flow,” J. Atmos. Sci. 20, 130–141 (1963).
37. M. Balcerzak, D. Pikunov, and A. Dabrowski, “The fastest, simplified method of Lyapunov exponents spectrum estimation for continuous-time dynamical systems,” Nonlinear Dyn. 94, 3053–3065 (2018).
38. J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott, “Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data,” Chaos 27, 121102 (2017).
39. A. A. Ferreira and T. B. Ludermir, “Genetic algorithm for reservoir computing optimization,” in 2009 International Joint Conference on Neural Networks (IEEE, 2009), pp. 811–815.
40. A. A. Ferreira, T. B. Ludermir, and R. R. De Aquino, “An approach to reservoir computing design and training,” Expert Syst. Appl. 40, 4172–4182 (2013).
41. H. Paugam-Moisy, R. Martinez, and S. Bengio, “Delay learning and polychronization for reservoir computing,” Neurocomputing 71, 1143–1158 (2008).
42. A. Katumba, M. Freiberger, P. Bienstman, and J. Dambre, “A multiple-input strategy to efficient integrated photonic reservoir computing,” Cognit. Comput. 9, 307–314 (2017).
43. M. R. Salehi and L. Dehyadegari, “Optical signal processing using photonic reservoir computing,” J. Mod. Opt. 61, 1442–1451 (2014).
44. A. Röhm, L. Jaurigue, and K. Lüdge, “Reservoir computing using laser networks,” IEEE J. Sel. Top. Quantum Electron. 26, 1–8 (2019).
45. C. Merkel, Q. Saleh, C. Donahue, and D. Kudithipudi, “Memristive reservoir computing architecture for epileptic seizure detection,” Procedia Comput. Sci. 41, 249–254 (2014).
46. E. Donati, M. Payvand, N. Risi, R. Krause, K. Burelo, G. Indiveri, T. Dalgaty, and E. Vianello, “Processing EMG signals using reservoir computing on an event-based neuromorphic system,” in 2018 IEEE Biomedical Circuits and Systems Conference (BioCAS) (IEEE, 2018), pp. 1–4.
47. F. Köster, D. Ehlert, and K. Lüdge, “Limitations of the recall capabilities in delay-based reservoir computing systems,” Cognit. Comput. 1–8 (2020).
48. P. Zurn, D. S. Bassett, and N. C. Rust, “The citation diversity statement: A practice of transparency, a way of life,” Trends Cogn. Sci. 24, 669–672 (2020).
49. S. M. Mitchell, S. Lange, and H. Brus, “Gendered citation patterns in international relations journals,” Int. Stud. Perspect. 14, 485–492 (2013).
50. M. L. Dion, J. L. Sumner, and S. M. Mitchell, “Gendered citation patterns across political science and social science methodology fields,” Polit. Anal. 26, 312–327 (2018).
51. N. Caplar, S. Tacchella, and S. Birrer, “Quantitative evaluation of gender bias in astronomical publications from citation counts,” Nature Astron. 1, 0141 (2017).
52. D. Maliniak, R. Powers, and B. F. Walter, “The gender citation gap in international relations,” Int. Organ. 67, 889–922 (2013).
53. J. D. Dworkin, K. A. Linn, E. G. Teich, P. Zurn, R. T. Shinohara, and D. S. Bassett, “The extent and drivers of gender imbalance in neuroscience reference lists,” bioRxiv (2020).
54. M. A. Bertolero, J. D. Dworkin, S. U. David, C. L. Lloreda, P. Srivastava, J. Stiso, D. Zhou, K. Dzirasa, D. A. Fair, A. N. Kaczkurkin, B. J. Marlin, D. Shohamy, L. Q. Uddin, P. Zurn, and D. S. Bassett, “Racial and ethnic imbalance in neuroscience reference lists and intersections with gender,” bioRxiv (2020).
55. X. Wang, J. D. Dworkin, D. Zhou, J. Stiso, E. B. Falk, D. S. Bassett, P. Zurn, and D. M. Lydon-Staley, “Gendered citation practices in the field of communication,” Ann. Int. Commun. Assoc. 45, 134 (2021).
56. P. Chatterjee and R. M. Werner, “Gender disparity in citations in high-impact journal articles,” JAMA Netw. Open 4, e2114509 (2021).
57. J. M. Fulvio, I. Akinnola, and B. R. Postle, “Gender (im)balance in citation practices in cognitive neuroscience,” J. Cogn. Neurosci. 33, 3–7 (2021).
58. D. Zhou, E. J. Cornblath, J. Stiso, E. G. Teich, J. D. Dworkin, A. S. Blevins, and D. S. Bassett, “Gender diversity statement and code notebook v1.0” (2020), https://zenodo.org/record/3672110.
59. A. Ambekar, C. Ward, J. Mohammed, S. Male, and S. Skiena, “Name-ethnicity classification from open sources,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, 2009), pp. 49–58.
60. G. Sood and S. Laohaprapanon, “Predicting race and ethnicity from the sequence of characters in a name,” arXiv:1805.02109 (2018).
Sood
and
S.
Laohaprapanon
, “Predicting race and ethnicity from the sequence of characters in a name,” arXiv:1805.02109 (2018).