Recurrent neural networks excel at predicting and generating complex high-dimensional temporal patterns. Due to their inherent nonlinear dynamics and memory, they can learn unbounded temporal dependencies from data. In a machine learning setting, the network’s parameters are adapted during a training phase to match the requirements of a given task, increasing its computational capabilities. After the training, the network parameters are kept fixed to exploit the learned computations. The static parameters, therefore, render the network unadaptive to changing conditions, such as an external or internal perturbation. In this paper, we demonstrate how keeping parts of the network adaptive even after training enhances its functionality and robustness. Here, we utilize the conceptor framework and design an adaptive control loop that continuously analyzes the network’s behavior and adjusts its time-varying internal representation to follow a desired target. We demonstrate how the added adaptivity of the network supports its computational functionality in three distinct tasks: interpolation of temporal patterns, stabilization against partial network degradation, and robustness against input distortion. Our results highlight the potential of adaptive networks in machine learning beyond training, enabling them to not only learn complex patterns but also dynamically adjust to changing environments, ultimately broadening their applicability.

Recurrent neural networks (RNNs) can capture complex temporal dependencies, making them an attractive choice for storing, retrieving, and predicting time series data. This ability renders them attractive models for solving machine learning tasks that require nonlinear and memory-based processing. During a training phase, the parameters of an RNN are adapted based on a learning algorithm, such as reservoir computing or backpropagation through time. Since the parameters of the network change in that phase, RNNs can be seen as adaptive networks during training. However, after the training, in the so-called inference phase, the parameters of the network are usually kept fixed. While this allows one to exploit the learned dynamical behavior, it often renders the network unable to adapt to new and changing conditions. In our work, we propose a novel mechanism that keeps certain parameters of the network adaptive even during inference. In essence, conceptors control network dynamics by projecting them into a small subspace of the network’s state space that can be interpreted as an ellipsoid. Based on the conceptor framework, we design a control mechanism that regulates the RNN dynamics in an adaptive manner toward a predefined reference dynamic. To this end, we establish an adaptive control of the network dynamics, extending its functionality during inference and facilitating enhanced abilities of RNNs. In particular, RNNs augmented with the conceptor control loop show improved temporal pattern interpolation, stabilization against partial network degradation, and robustness against input distortion.

Adaptive networks feature a flexible structure that evolves over time and may depend on the state of their internal nodes.1 Such adaptive networks find various applications in studying complex dynamical phenomena,2 in modeling, e.g., power grids,3 in the learning of neural networks,4,5 and in the control of complex systems.6 Their adaptive nature allows such networks to dynamically adjust to evolving external conditions and internal dynamics. Such networks are particularly suited for applications requiring robustness against changing environments, with the added capacity to self-organize. In this context, adaptivity refers to the ability to modify structure and function in response to new information, disturbances, or changing objectives. Such adaptability is achieved through mechanisms that enable the network to reconfigure its connections, adjust its parameters, or even alter its overall architecture in real time or across different operational phases.

In the context of machine learning, artificial neural networks are applied to solve regression and classification tasks. In particular, recurrent neural networks (RNNs) have proven remarkably successful in predicting complex dynamics due to their inherent ability to capture temporal dependencies. Accordingly, RNNs have been applied to various tasks, such as time series prediction,7 language modeling,8 classification of dynamics,9 and control of complex systems.10 Beyond machine learning, RNNs can be seen as a general abstract model of network dynamics, which has provided numerous dynamical insights in the natural-science study of biological evolution, social networks, and brains.11 In this context, digital discrete-time RNNs12,13 are used, as well as analog continuous-time networks implemented in various hardware platforms.14–16 Relying on learning and optimization mechanisms, such as backpropagation through time17 and reservoir computing (RC),16,18 certain parts of the network are adapted during the training phase. This adaptation increases the network’s performance in the task at hand. Backpropagation through time employs gradient descent methods to update every parameter of the network. On the other hand, RC simplifies the training by only adapting a set of linear output weights. During RNN training, both methods make the network’s structure evolve over time to increase performance in a certain task, and hence, the network can be interpreted as adaptive. However, after the training, the parameters are usually kept fixed during inference to exploit the learned behavior. This prevents the RNN from continuing to adapt. Recently, various works introduced an extension to the reservoir computing framework that allows one to control the system during the inference phase.19–21 Independently, these works proved RNNs effective in learning to forecast complex unseen dynamics given several differently parameterized examples from a dynamical system. By parameterizing certain parameters of the network, they demonstrate enhanced capabilities of RNNs, allowing one to predict bifurcations, tipping points, and transient chaos. Furthermore, exploiting symmetries of delay-dynamical and spatiotemporal systems, adapted network structures make it possible to infer entire bifurcation diagrams unseen during training.22,23 However, similar to standard RC, the parameters of the RNNs are kept fixed, often rendering them unable to cope with changing conditions, such as changing external signals, changing objectives, or degrading parts of the network.

In this paper, we aim to overcome the unadaptive character of RNNs during inference by keeping parts of the inner network weights adaptive. We employ the conceptor framework and design an adaptive control loop that allows the RNN to deal with unforeseen situations. As will be shown later, conceptors characterize the input-driven dynamics of the network. A conceptor can be geometrically interpreted as an ellipsoid encapsulating a trajectory given by the network’s state evolving over time. Furthermore, using the conceptor, the network dynamics can be controlled by projecting them into a certain subspace, thus restricting the potential states.24 To make RNNs adaptive, we design a conceptor-based control loop that estimates the network’s currently occupied subspace. We then use this subspace as a target to regulate the behavior of the RNN in the presence of unforeseen perturbations. By doing so, we control the RNN dynamics in a flexible way and can increase the network’s computational capacity, resulting in enhanced interpolation, robustness against partial network failure, and stabilization of input-driven dynamics.

This paper is structured as follows: In Sec. II, we show how to control RNNs using the conceptor framework. Subsequently, in Sec. II A, we employ the conceptor framework to design a control loop that renders the RNN adaptive even in the inference phase. In Sec. III, we demonstrate the extended functionality of the adaptive RNN for different scenarios, including interpolation and robustness. In Sec. IV, we demonstrate an advanced control setting for input distortion. Section V concludes the paper.

In this section, we present a control scheme for the high-dimensional dynamics of RNNs either subject to an external input drive or evolving autonomously. The dynamical behavior of the system here employed, called leaky RNN,18 is governed by the following discrete-time equation:
x(k + 1) = (1 − α) x(k) + α tanh(W x(k) + W_in u(k + 1) + b),  (1)
where x(k) ∈ R^N is the state of the network, α < 1 is the leak rate, W ∈ R^(N×N) are the recurrent connections, W_in ∈ R^(N×M) are the input weights, u(k) ∈ R^M is the input signal, and b ∈ R^N is a bias vector. The tanh nonlinearity is applied element-wise.
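As an illustration, the following minimal sketch implements the update rule of Eq. (1) in Python; the standard leaky echo-state form of the leak term and the listed sizes and scalings are assumptions chosen for brevity, not the exact settings used in the experiments below.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 1                                  # number of neurons, input dimension (illustrative)
alpha = 0.75                                   # leak rate
W = rng.normal(0.0, 1.0, (N, N)) / np.sqrt(N)  # recurrent weights W (unscaled sketch)
W_in = rng.normal(0.0, 1.0, (N, M))            # input weights W_in
b = 0.2 * rng.normal(0.0, 1.0, N)              # bias vector b

def rnn_step(x, u):
    """One discrete-time update x(k) -> x(k+1), driven by the input sample u(k+1)."""
    return (1.0 - alpha) * x + alpha * np.tanh(W @ x + W_in @ u + b)
```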

In the following, we drive the leaky RNN with different input sequences u(k), and depending on the parameters of the network and the input signal u(k), the nodes of the network start to oscillate. These nonlinear oscillations in the state space x(k) can be seen as reverberations (similar to how an object dropped into water causes ripples to spread out and reverberate long after the initial impact25). In Fig. 1, we plot the reverberations of an RNN independently driven by three different sine waves, represented in a low-dimensional projection using principal component analysis (PCA). PCA on the network’s state space x(k) finds the axes that capture the most variance, ordered by decreasing variance. In this context, the principal components can be viewed as the axes of an ellipsoid that describes the geometry of the reverberation in the network’s state space. Different characteristics of the input signal are encoded into certain trajectories within the network’s state space that can be distinguished via their encapsulating ellipsoids (see the shaded regions in the right part of Fig. 1). Analyzing the eigenvalue spectrum of the principal components shows that most of the variance of the RNN’s dynamics lies in a low-dimensional subspace of the network’s high-dimensional state space.

FIG. 1.

Recurrent neural network subject to three different input time series with different frequencies and amplitudes (color-coded). On the right side, we show the affine projection of the network dynamics related to the three different input patterns onto a low-dimensional manifold using principal component analysis (here using the two largest principal components).

The so-called conceptor framework introduced by Jaeger24 exploits the fact that the dynamics of driven and autonomous RNNs often lie in a low-dimensional subspace. Additionally, when the dynamics are not low dimensional (the correlation matrix is full rank), filtering out the low-variance directions does not damage the rest of the system. The conceptor C of an RNN is defined via the following objective function, considering the state x as a random variable:
C = argmin_C E_x[ ||C x − x||^2 ] + γ^{−2} ||C||_fro^2,  (2)
where γ is a control parameter called the aperture and ||·||_fro denotes the Frobenius norm. Mathematically, the conceptor is a soft projection matrix C that minimizes the L2-distance between the projected state C x and the original state x, averaged over time. Acting as a regularization parameter, γ controls via the second term how many dimensions within the network’s state space are considered for the projection. The objective function given in Eq. (2) has a unique analytical solution given by
C = R ( R + γ^{−2} I )^{−1},  (3)
where R = E_x[ x x^T ] is the N × N correlation matrix of x and I is the N × N identity matrix.24 
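A conceptor can thus be computed directly from a recorded state sequence. The following sketch implements Eq. (3); the column-wise arrangement of the states and the use of a plain matrix inverse are simplifying assumptions.

```python
import numpy as np

def compute_conceptor(X, gamma):
    """Conceptor C = R (R + gamma^-2 I)^-1 for a state matrix X of shape (N, K), K time steps."""
    N, K = X.shape
    R = (X @ X.T) / K                               # correlation matrix R = E[x x^T]
    return R @ np.linalg.inv(R + gamma**-2 * np.eye(N))
```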
Once computed for a given input sequence, the conceptor C can be used to control the dynamics of the RNN by inserting it into Eq. (1) yielding
x(k + 1) = C [ (1 − α) x(k) + α tanh(W x(k) + W_in u(k + 1) + b) ],  (4)
where the symmetric positive definite conceptor matrix C softly projects the dynamics to preserve the relevant dynamical features of x. Accordingly, a conceptor can be seen as a geometrical characterization of a trajectory that arises within an evolving RNN; it captures where the trajectory lies. Furthermore, Eq. (4) shows that this characterization can also be used for controlling the RNN. Whereas the computation of the conceptor can be carried out after computing the network dynamics driven by an input time series, there also exists an online-computable version, referred to as the autoconceptor.26 Within the autoconceptor framework, the conceptor is recursively updated at every time step k by the following equation:
C(k + 1) = C(k) + η [ (x(k) − C(k) x(k)) x(k)^T − γ^{−2} C(k) ],  (5)
where γ is the aperture of the conceptor and η is the learning rate. While Eq. (4) uses a static (time-invariant) conceptor biasing the unfolding of the RNN dynamics, here, we have a dynamic conceptor that evolves in time and converges to the static conceptor. Note that the learning rate needs to be set sufficiently small so that the update averages over a sufficient number of time steps to converge to the static conceptor. The convergence is guaranteed by the fact that the autoconceptor dynamics [Eq. (5)] are derived from a stochastic gradient descent procedure on the conceptor loss24 [Eq. (2)].
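As a sketch, the recursive update of Eq. (5) amounts to a single stochastic-gradient step on the conceptor loss of Eq. (2), evaluated with the current state; the sign and scaling conventions below follow that derivation and are assumptions for illustration.

```python
import numpy as np

def autoconceptor_step(C, x, gamma, eta):
    """Online conceptor update C(k) -> C(k+1) using the current state x(k)."""
    err = x - C @ x                                  # projection error (x - C x)
    return C + eta * (np.outer(err, x) - gamma**-2 * C)
```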

In general, the conceptor framework can be used to control RNNs and perform a variety of information processing tasks, such as denoising, multi-tasking, and associative memories.24 Furthermore, one can define multiple operations on conceptor matrices, such as logical operations, morphing and interpolation operations, and learning rules for incrementally learning a sequence of tasks. The latter was recently used to improve the continual learning abilities of deep feed-forward networks.27 

The conceptor framework enhances the functionality of RNNs, given the network dynamics defined by Eq. (4). However, the conceptor does not yet render the network adaptive since it is defined as a static projection. Such nonadaptive control often renders systems unstable and not robust against external perturbations. Therefore, in the next section, we propose an extension of the conceptor framework, yielding an adaptive control mechanism for RNNs. Subsequently, we compare the computational capabilities of the nonadaptive conceptor-only framework and the proposed adaptive conceptor control loop in three challenging tasks, namely, temporal pattern interpolation, network degradation, and input perturbation.

In this section, we propose to use the system composed of the conceptor and the RNN given by Eq. (4) together with a specifically designed control loop. This conceptor control loop (CCL) aims to balance the control of the dynamics toward predefined target dynamics against the dynamical stability of the network. The CCL is schematically depicted in Fig. 2 and encompasses three main steps:

  1. estimating the current conceptor C(k) of the RNN in an online fashion from observing the sequence of states x,

  2. pushing the estimated conceptor into the direction of the predefined target conceptor C_target, and

  3. using the linearly pushed conceptor C_adapt to control the network.

In our approach, we do not directly enforce the target conceptor as in Eq. (3) but rather push the plugged-in conceptor step-by-step toward the target subspace in an adaptive way.

FIG. 2.

Conceptor control loop where the current conceptor of the network C(k) is measured via the autoconceptor framework and slightly linearly adapted toward the target conceptor C_target. The adapted conceptor C_adapt(k) is then applied to the network. Therefore, the dynamics of the network are pushed toward a linear subspace that is close to the targeted subspace.

Based on the autoconceptor framework given by Eq. (5), we derive a control loop that adapts the estimated conceptor toward a target conceptor C_target according to
C(k + 1) = C(k) + η [ (x(k) − C(k) x(k)) x(k)^T − γ^{−2} C(k) ],  (6)
C_adapt(k) = C(k) + β ( C_target − C(k) ),  (7)
where x(k) is the network state, C_adapt(k) is the adaptive conceptor, γ is the aperture of the conceptor, and β is the gain that determines the strength of the linear push. The target conceptor can be given as a previously defined conceptor or as a mixture of several other conceptors (see Sec. III A), depending on the task that needs to be addressed. Finally, the adaptive conceptor is applied to the network using the following equation:
x(k + 1) = C_adapt(k) [ (1 − α) x(k) + α tanh(W x(k) + W_in u(k + 1) + b) ].  (8)
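Putting Eqs. (6)-(8) together, one CCL iteration can be sketched as follows. The specific form of the linear push and the placement of the conceptor in the state update are assumptions based on the description above, and the network matrices W, W_in, b and the leak rate alpha are reused from the earlier RNN sketch.

```python
import numpy as np

def ccl_step(x, u, C, C_target, gamma, eta, beta):
    """One iteration of the conceptor control loop (sketch of Eqs. (6)-(8))."""
    # Eq. (6): online estimate of the currently occupied subspace.
    err = x - C @ x
    C = C + eta * (np.outer(err, x) - gamma**-2 * C)
    # Eq. (7): linear push of the estimate toward the target conceptor.
    C_adapt = C + beta * (C_target - C)
    # Eq. (8): controlled leaky RNN update, projected by the pushed conceptor.
    x_next = C_adapt @ ((1.0 - alpha) * x + alpha * np.tanh(W @ x + W_in @ u + b))
    return x_next, C
```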
In this section, we compare the computational capabilities of RNNs equipped either with the nonadaptive conceptor-only framework or with the adaptive CCL on two different tasks. For both tasks, the network is trained to reproduce a given time series/temporal pattern without an external input. First, we perform interpolation of temporal patterns, and in a second application example, we demonstrate the stabilization of the dynamics against partial network degradation. Here, we train the RNN using the RC paradigm to reproduce a temporal pattern u(k) by optimizing a linear set of output weights. To this end, we first randomly draw the input weights and the internal weights from normal distributions N(0, 1). Afterward, the spectral radius ρ of the internal weights W and the gain scaling ρ_in of the input weights W_in are optimized using Bayesian optimization.28 We report the optimized parameters in Table I. Finally, the output weights are optimized to predict the input u(k) of the RNN one step ahead by setting the prediction target to be u(k + 1). Since the output weights are linear, we can apply linear or ridge regression to find the optimal parameters W_out,12,16,18 giving the output y = W_out x. After the training, the RNN is set into the so-called autonomous mode, running solely based on its own predictions. The dynamics of the RNN in this autonomous mode are governed by the following equation:24 
x(k + 1) = C [ (1 − α) x(k) + α tanh(W x(k) + W_in W_out x(k) + b) ],  (9)
where the conceptor C can be either static and predefined or adaptive as proposed in the CCL given in Eq. (7). In this mode, there are no external input signals that could guide the network’s dynamics. Assuming that the training is accurate, feeding back the next-step prediction as input allows the system to regenerate the same trajectory as if it was input-driven.
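The readout training and the autonomous mode can be sketched as follows; the ridge-regression expression and the way the prediction is fed back through W_in are assumptions consistent with the description above, and the network matrices are again reused from the earlier sketch.

```python
import numpy as np

def train_readout(X, U_next, lam):
    """Ridge regression for W_out: X holds states (N x K), U_next the targets u(k+1) (M x K)."""
    N = X.shape[0]
    return U_next @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(N))

def autonomous_step(x, W_out, C):
    """Autonomous update of Eq. (9): the output y replaces the external input."""
    y = W_out @ x                                   # next-step prediction
    x_next = C @ ((1.0 - alpha) * x + alpha * np.tanh(W @ x + W_in @ y + b))
    return x_next, y
```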
TABLE I.

Hyperparameters used for the different experiments.

Parameter name Symbol Interpolation Network degradation Input distortion
Spectral radius  ρ  1.6  0.749  0.9 
Input scaling  ρin  1.149  0.9 
Bias scaling  ρb  1.5  0.2 
Leakage  α  0.75  0.988  1.0 
Ridge regularization  λ  0.0001  1000  0.01 
Aperture  γ  25  31.6 
Network size  N  256  1500  50 
Learning rate  η  0.2  0.001  0.8 
Control gain  β  2.5 × 10−5  0.7  4 × 10−3 
Random feature conceptor size  NRFC  …  …  200 
In the following, we evaluate the static conceptor and the adaptive CCL in an interpolation task as proposed, e.g., by Wyffels et al.29 The goal of this task is to generate an intermediate temporal pattern that continuously interpolates features, such as the frequency, between two time series observed during a training phase. Here, we train an RNN on two sine waves with different frequencies as input data during the training phase. The sine waves are drawn from the following model family:
u(k) = sin(2πk / T), with T ∈ {T_0, T_1},  (10)
where T_0 = 20 and T_1 > T_0. The two sine waves are used to train the network output weights W_out for one-step-ahead prediction using ridge regression. Additionally, for each of the two time series, a conceptor is computed, yielding C_0 and C_1. After training, we run the network in the autonomous mode given by Eq. (9).
We start by investigating the ability of the nonadaptive conceptor-RNN to generate intermediate dynamics that interpolate between two temporal patterns. In his work, Jaeger24 already outlines the ability to transition from one conceptor into another via linear interpolation of the conceptor matrices computed during the training,
C(λ) = (1 − λ) C_0 + λ C_1,  (11)
where λ is the interpolation parameter. By scanning the interpolation parameter λ while running the network in the autonomous mode, the network’s dynamics are projected toward the intermediate conceptor space. This projection allows the network to generate a set of new dynamics in a controlled manner. In Fig. 3, we show the results of three different interpolations between sine waves with increasing deviations from their initial periods. In Fig. 3(a), we present the output of the RNN while scanning the interpolation parameter λ in the range of 0–1 slowly, i.e., λ = 10^{−5} k. The output states indicate that the network can generate, for most of the conditions, an intermediary sine wave pattern with intermediate periods. However, with increasing distance of the trained periods T_0 and T_1, the amplitude of the intermediary sine wave decreases. For T_0 = 20 and T_1 = 35, the network converges into a fixed point in the range λ ∈ [0.35, 0.85]; therefore, no oscillations can be identified. In this range, the control of the RNN fails, and the interpolation breaks down. In Fig. 3(b), we analyze the period of the intermediary solutions and find that it varies along the interpolation parameter as expected, yielding increasing periods while transitioning from λ = 0 toward λ = 1. Accordingly, the static conceptor-only framework might allow for the interpolation of close-by temporal patterns. However, it is not sufficient to generate intermediary dynamics, especially if the periods of the two input patterns are far apart, due to the occurrence of fixed-point behavior.
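A sketch of the interpolation scan described above, reusing the autonomous_step helper from the earlier sketch; the linear schedule of λ over the run is an illustrative choice.

```python
import numpy as np

def interpolation_scan(x, C0, C1, W_out, n_steps=150_000):
    """Sweep the interpolation parameter lambda from 0 to 1 while running autonomously."""
    outputs = []
    for k in range(n_steps):
        lam = k / n_steps                           # slow sweep of the interpolation parameter
        C_lam = (1.0 - lam) * C0 + lam * C1         # Eq. (11): interpolated conceptor
        x, y = autonomous_step(x, W_out, C_lam)     # plug the interpolated conceptor into Eq. (9)
        outputs.append(y)
    return np.array(outputs)
```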
FIG. 3.

(a) Autonomously generated output y(k) of the conceptor-RNN trained on two sinusoidal time series with T_0 = 20 and different T_1 ∈ {25, 30, 35} (different blue shades). Along the x axis, the plugged-in conceptor is a linear interpolation between the conceptors C_0 and C_1 defined during training. (b) Period of the intermediary solutions while scanning the interpolation parameter λ in the range of 0–1. (c) Output of the conceptor-RNN with an applied conceptor control loop during interpolation between the two conceptors C_0 and C_1 generated for different periods of the training time series given by two sinusoidal time series with T_0 = 20 and different T_1 ∈ {25, 30, 35} (different red shades). Similar to (a) and (b), along the x axis, the interpolation parameter λ is scanned in the range of 0–1. (d) Varying period of the autonomously generated time series y(k) during the interpolation while applying the adaptive conceptor control loop. The interpolation parameter λ at the top x axis of the plots indicates the sweep from λ = 0 toward λ = 1 in 150 000 time steps.


The reasons for the breakdown of interpolation when using the static conceptor framework might be twofold. First, the interpolated conceptor enforces dynamics in a linear subspace of the network’s state space that was not observed during training. Hence, the dynamics within that subspace cannot directly be optimized to be stable. Furthermore, the output weights are trained only on the two initial time series, whereas we apply them to intermediate dynamics that can be far apart from the training dynamics. Second, the linear interpolation of the conceptor matrix introduced in Eq. (11) is known to shrink the eigenvalues of the resulting interpolated conceptor.30 This shrinkage of the eigenvalues, known for symmetric positive definite matrices, might lead to a loss of information and, in turn, might be one reason for the decreasing amplitudes of the intermediary solutions. As discussed by Feragen and Fuster,30 the shrinkage of the eigenvalues can be avoided by using another geodesic metric to derive the interpolation scheme, e.g., log-Euclidean, affine invariant, or the shape and orientation rotation metric. However, these metrics, due to their reliance on matrix exponentials, are computationally expensive and often numerically unstable.30 
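A small numerical illustration of this shrinkage effect (not taken from the paper): linearly mixing two projections onto different one-dimensional subspaces yields a matrix whose largest eigenvalue is smaller than that of either endpoint.

```python
import numpy as np

v0 = np.array([1.0, 0.0])                           # direction of the first subspace
v1 = np.array([1.0, 1.0]) / np.sqrt(2.0)            # direction of the second subspace
C0, C1 = np.outer(v0, v0), np.outer(v1, v1)         # both have eigenvalues {1, 0}
C_mid = 0.5 * C0 + 0.5 * C1                         # linear interpolation at lambda = 0.5
print(np.linalg.eigvalsh(C_mid))                    # approx [0.146, 0.854]: largest eigenvalue < 1
```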

We now proceed to test the ability of the adaptive CCL to interpolate between the two temporal patterns as carried out above. To this end, we use the same network parameters (see Table I) as used for Figs. 3(a) and 3(b) and train the network on the same set of sine waves as above. After training, we apply the CCL to the network and scan the interpolation parameter λ in the range of 0–1. In Figs. 3(c) and 3(d), we show the output of the network along this scan. We find that the network can generate an intermediary sine wave pattern for all three cases T_1 ∈ {25, 30, 35}. In contrast to the conceptor-only framework, the adaptive CCL achieves a much more stable oscillation amplitude throughout all three interpolations. Additionally, the period of the interpolated temporal pattern exhibits a smooth and steadily increasing transition from the starting period T_0 toward T_1, as shown in Fig. 3(d). Accordingly, the adaptive character of the CCL results in a much more stable and consistent interpolation than the static scheme.

Moreover, additional experiments show that the CCL can further boost the ability of the RNN to interpolate between temporal patterns with even more distant periods (Fig. 4). Whereas the conceptor-only framework fails to interpolate time series with periods T_0 = 20 and T_1 = 35, here, we demonstrate continuous interpolation with the CCL for periods from T_0 = 20 up to more than twice the initial period at T_1 = 47.5, avoiding fixed-point solutions in between. Accordingly, we argue that the proposed adaptive CCL improves the interpolation abilities of the RNN beyond the static conceptor-only framework by enhancing the generalization ability of the network without further training. The interpolation capabilities of the CCL eventually break down for a longer period of T_1 = 50. In this case, an additional input example in the training set with a period between T_0 and T_1 would extend the interpolation capabilities. We note that the CCL makes better use of the input examples than the conceptor-only framework.

FIG. 4.

Period of the output patterns generated by an RNN with an applied conceptor control loop to interpolate sine wave time series with different input periods T_0 = 20 and T_1 ∈ [37.5, 50].


From a machine learning perspective, where the objective is to generalize from a few samples, the CCL can be seen as a way to adaptively enforce a prior at inference time on how to generalize. The prior consists of assuming that intermediary samples are generated by dynamics with intermediate ellipsoids specified by the linear interpolation between the conceptors. With the CCL, we find that the interpolated dynamics generating the prediction lie in intermediary ellipsoids and bear a strong similarity to the ellipsoids elicited by the training data. In the case of the static conceptor, where the interpolation leads to a fixed-point dynamic, the ellipsoid of the dynamics is reduced to a point. In this case, the prior is violated.

In this section, we evaluate the capabilities of the static conceptor-only framework and the adaptive CCL to enhance the robustness of networks to partial degradation. Partial degradation refers to the process of removing neurons from the network during inference while evaluating how well the network is able to continue performing the task it learned during training. Historically, psychologists and neuroscientists have recognized that artificial and biological neural networks share a characteristic known as “graceful degradation.”31 This term describes how a system’s performance declines gradually as more neural units are compromised, without a singular, catastrophic failure point. Inspired by early interest in this phenomenon, numerous engineering strategies have been developed to leverage and augment the inherent fault tolerance of artificial neural networks (ANNs).32 Most of these strategies lack the ability to adapt dynamically. Examples of such strategies include the injection of noise into neuron activity during training, the addition of regularization terms to improve robustness, or the design of networks with built-in redundancy by duplicating important neurons and/or synapses. However, research on improving graceful degradation within RNNs, especially those handling dynamic inputs, remains sparse.32 

To study the influence of degradation within RNNs, here, we train the output weights of a network for one-step-ahead prediction. We then feed the prediction back to generate a pattern autonomously as defined by Eq. (9). As the target time series, we use human motion capture data from the MoCap data set.33 The corresponding time series provides the data of 94 sensors attached to a human body, sampling the movement of joints and limbs in three-dimensional space. For the training, we use 161 time steps that include two periods of a running behavior as shown in Fig. 5. After optimizing34 the weights and parameters of the RNN for the autonomous continuation (see Table I), we start to degrade the network by removing R randomly selected neurons from the network. More specifically, we clamp the corresponding state coordinates x of the selected neurons to zero while computing the network’s autonomous continuation. Finally, we analyze the ability of the network to continue its autonomous prediction despite the partial degradation by computing the mean square error (MSE) of its output with respect to the original time series. In our implementation, we use an RNN of N = 1500 neurons and first evaluate the robustness of the static conceptor-only framework by degrading R = 200 randomly selected neurons. In Fig. 5(a), we plot the first three principal components (PCs) of the autonomously evolving RNN in red. The projection of the evolution of the RNN without degradation is shown for comparison (green color). Similarly to the interpolation task described in Sec. III A, the applied degradation pushes the system outside the dynamical regime it was trained on. In analogy to the interpolation results, we find that the RNN with the applied static conceptor tends to fall into fixed-point dynamics (red curve), rendering the network unable to continue its learned dynamics. In this case, the stickman, whose movements mirror the time series data, becomes unnaturally still, as illustrated in Fig. 5(b). With the degradation, the network’s output does not produce a running motion, leaving the stickman frozen.
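The degradation protocol can be sketched as follows, reusing the autonomous_step helper defined earlier; clamping the selected state coordinates to zero after each update is our reading of the description above.

```python
import numpy as np

def degraded_run(x, W_out, C, n_removed, n_steps, seed=1):
    """Autonomous continuation with n_removed randomly chosen neurons clamped to zero."""
    rng = np.random.default_rng(seed)
    dead = rng.choice(x.shape[0], size=n_removed, replace=False)  # removed neurons
    outputs = []
    for _ in range(n_steps):
        x, y = autonomous_step(x, W_out, C)
        x[dead] = 0.0                               # clamp the degraded state coordinates
        outputs.append(y)
    return np.array(outputs)
```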

FIG. 5.

(a) Projection of the output of the RNN trained to predict a 94-dimensional motion capture time series of a human running behavior. The RNN is degraded by removing a random subset of 200 of its 1500 neurons. The trajectory is depicted for the RNN with the conceptor control loop (blue), compared with only the static conceptor (red), and with a baseline where the RNN is not degraded (green) along the first three principal components of the output. (b) The same output generation (the same color code) is represented with the geometry of the human body angles. The number of steps depicted is the same for all the conditions and is chosen to showcase one full cycle of the baseline (green).


We now proceed to evaluate the graceful degradation abilities of the RNN while applying the CCL, which we slightly adapted for this challenging task (see Appendix A). As shown in Fig. 5, adding the CCL stabilizes the system around a limit cycle after a short transient (blue curve). Across different experiments with a growing number K of degraded neurons, we observe that the CCL consistently stabilizes the system against partial degradation. This adaptation enables the system to maintain its dynamics and qualitatively preserve the limit cycle [Fig. 6(a)]. From visual inspection, we can also see that the CCL is able to preserve the characteristic properties of the running motion, as shown in Fig. 5(b). Note that the recovery from the degradation is only approximate, and the stickman from the CCL system is slightly slowed down compared to the baseline (green), which achieves one full cycle in a shorter time. Additional examples of the reconstructed dynamics with and without a control loop are shown in Appendix B.

FIG. 6.

(a) Qualitative evaluation of the RNN’s ability to preserve its memorized periodic cycle after the removal of up to K of its 1500 neurons. For each increment in K, 30 distinct trials were conducted with randomly selected neurons removed. The qualitative failure rate is determined via a threshold of 1.0 on the variance of the principal component of the RNN. The error bars indicate the standard deviation. (b) Quantitative assessment of the performance by computing the mean Normalized Root Mean Squared Error (NRMSE) in comparison with the baseline (non-degraded RNN output) after phase alignment over the trials that are non-failing for both conditions (CCL and C). The error bars indicate the standard deviation on the non-failing trials.


In our quantitative evaluation, we assessed the continuation capabilities of the non-failing trials by comparing the time series from the baseline (non-degraded RNN) with the phase-aligned output from the degraded RNN using a Normalized Root Mean Squared Error (NRMSE). The results show that the average NRMSE is consistently lower for the system incorporating the CCL [Fig. 6(b)]. Overall, both quantitative and qualitative results highlight that the adaptivity endowed by the CCL improves the robustness of the RNN against network degradation. Here, we show that the CCL is a straightforward, yet effective enhancement allowing for graceful degradation of internal neurons.

In Sec. III, we set up RNNs to autonomously continue certain temporal patterns and evaluated their performance in two tasks. To further investigate the robustness and adaptability of our approach, we now evaluate the capacity of an input-driven RNN to continue to perform its task when the input is subject to distortions. We aim for the network to adapt such that it equalizes an input signal whose amplitude is reduced compared to the training phase. Although the chosen distortion is a linear transformation of the input signal, the RNN’s response to this change is nonlinear. In signal processing terms, we are looking for a mechanism of adaptive blind nonlinear channel equalization.35 Our work aims to embed further adaptability properties within RNNs and, thus, get closer to the robustness found in biological neural networks.

Adding adaptation mechanisms to input-driven systems that are processing information is a challenging task. Changes to the input can push the system away from performing the desired task. If the distortion is sufficiently large, it might act as a signal from an unseen dynamical system coupled to the RNN. Our task is similar to the challenge of regulating coupled autonomous systems for which only a fraction of the entire system is accessible for observation and control. To address the adaptive control of input-driven RNNs, we extend the conceptor control loop (CCL) in two ways: stacking multiple RNNs and their associated CCLs into a hierarchical architecture and using a more computationally efficient version of the conceptor called the random feature conceptor.

1. Hierarchical architecture

We introduce a hierarchical architecture to handle distorted inputs. A single CCL cannot manage this task, as an iterative correction is required, which cannot be performed within a single RNN that is continuously driven by the input. The aim of the hierarchical architecture is that, by progressing through the hierarchy, each layer further corrects the signal distortion. This architecture consists of multiple identical layers of RNNs with their associated CCLs, as illustrated in Fig. 7. Each layer contains an identical RNN with the same parameters. The readout W_out is trained, as before, via linear regression for next-step prediction. The conceptor is measured on the training (undistorted) time series. In a hierarchical structure of L layers, the lowest layer is connected to the input, while the topmost layer provides the global system’s output.
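The layer-to-layer signal flow can be sketched as follows; here, layer_step is a placeholder for one combined RNN + CCL update of a single layer (as in the earlier sketches) and is an assumption of this illustration.

```python
def hierarchy_step(layer_states, u_distorted, layer_step):
    """One time step of the hierarchy: each layer's output drives the layer above it."""
    signal = u_distorted                            # distorted input enters the lowest layer
    for i, state in enumerate(layer_states):
        layer_states[i], y = layer_step(state, signal)  # RNN + CCL update of layer i
        signal = y                                  # output of layer i is the input of layer i + 1
    return layer_states, signal                     # topmost output is the global system output
```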

FIG. 7.

Hierarchical architecture of two layers of RNNs. Each layer is identical (the same RNN parameters and the same conceptor reference) and is composed of an RNN with its CCL. The CCL is implemented via random feature conceptors, a vector-based analog of a conceptor that is more computationally efficient. To keep the expressivity of matrix-based conceptors, the RNN state x is expanded into higher dimensions z, where the vector-based conceptor is element-wise multiplied with the expanded state. The distorted input is injected at the bottom of the hierarchy, and each layer progressively brings the distorted signal (gray) toward the original one (light red). For illustration, we show a two-layer hierarchy, while we use four layers in the main text.


2. Random feature conceptors

In our hierarchical structure, we use random feature conceptors,24 a modified version of conceptors that randomly and linearly expands the state of the RNN x(k) to higher dimensions z(k) (random features), where a vector-based conceptor c_adapt(k) is applied via an element-wise multiplication instead of a matrix multiplication. This results in the following equations:
(12)
(13)
where F is a random expansion into the z-space of higher dimension N_RFC and G corresponds to the matrix product of a random compression matrix, which maps back to the original dimension of x, and the weight matrix W of the RNN.

Random feature conceptors are approximations of matrix-based conceptors while offering various computational improvements.24 For our application, the main advantage is that the CCL converges much faster. We provide more details about random feature conceptors in Appendix C.
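A hedged sketch of the random-feature-conceptor state update of Eqs. (12) and (13): the state is expanded by a fixed random matrix F, gated element-wise by the vector conceptor, and mapped back through G. The exact placement of the gating and the scalings of F and G are assumptions, and the network quantities N, alpha, W_in, and b are reused from the earlier sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
N_rfc = 200                                          # dimension of the random feature space
F = rng.normal(0.0, 1.0, (N_rfc, N)) / np.sqrt(N)    # random expansion into z-space
G = rng.normal(0.0, 1.0, (N, N_rfc)) / np.sqrt(N_rfc)  # compression back to x-space (absorbing W)

def rfc_step(x, u, c_adapt):
    """Leaky RNN update with a vector conceptor applied element-wise in the expanded space."""
    z = c_adapt * (F @ x)                            # element-wise gating of the random features
    return (1.0 - alpha) * x + alpha * np.tanh(G @ z + W_in @ u + b)
```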

In our adaptation task toward input signal distortions, the system learns the prediction target from a single presentation of the undistorted input time series. In the inference phase, the system should counteract a distortion applied to the input series that was not present during training. The underlying idea is that the CCL of each layer pushes the dynamics toward its reference linear subspace via the unperturbed signal’s reference conceptor. This push makes each layer transform its input into an output closer to the unperturbed signal.

To demonstrate the validity of the suggested hierarchical architecture, we studied a relatively simple four-layer architecture subjected to a downscaling distortion of factor 0.3 applied to a periodic time series composed of two sine waves with periods of 7 and 21. As shown in Fig. 8, the hierarchically arranged system progressively eliminates the distortion, reducing the prediction error layer by layer. The reconstruction loss, characterized by the NRMSE, shows a consistent decrease across the hierarchy, as presented in Fig. 8(a). We observe a clear improvement of the accuracy in the time series reconstruction by the third layer in Fig. 8(d), compared to the input-connected initial layer in Fig. 8(b).

FIG. 8.

Robustness to strong scaling distortion. (a) NRMSE of the output for the different layers in the hierarchy of the architecture. (b) and (d) Time series output of the first and last layer of the hierarchy. (c) Evolution of the CCL in time. The CCL is initialized with β = 0 and C_adapt = I. At time k = 1000, the system has converged to a state corresponding to no feedback in (a). At this moment, the feedback is turned on with β = 0.7, and the C_adapt of each layer is pushed further toward its target C_target. Here, C_adapt(0) is initialized at the identity matrix.


We also verify the CCL’s causal role in the signal recovery by studying the action of the hierarchical system without the CCL. The inability of the hierarchical system without the CCL to correct distortions is illustrated by the orange curve in Fig. 8(a). This result further underlines the CCL’s computational capabilities in controlling the hierarchical architectures and handling novel data during inference.

To analyze the functioning of the CCL in more detail, we monitor the CCL dynamics in time, as depicted in Fig. 8(c). In the first part, for time k < 1000, we observe the CCL dynamics without feedback [β = 0 in Eq. (C6)]. At time k = 1000, the CCL has converged to a state with a large distance to C_target, as expected for a distorted input that significantly differs from the one used during training. Accordingly, the reconstruction of the input has a large error, corresponding to the orange curve in Fig. 8(a). After time k = 1000 (dotted vertical line), the feedback is activated with β = 0.7 in Eq. (C6). The conceptor C_adapt of each layer is then further pushed toward the target conceptor. The progressive reduction in reconstruction error, corresponding to the blue curve in Fig. 8(a), demonstrates that the higher layers bring their conceptor C_adapt closer to the target conceptor. Hence, the hierarchy uniquely allows the progressive compensation of the input attenuation.

Interestingly, the random feature conceptor reduces the convergence time. In Fig. 8(c), the CCL converges in less than 500 steps, while the matrix-based conceptor CCL typically requires more than 10 000 steps to converge. This is because random feature conceptors can be used with a larger learning rate; see Table I for a comparison of the parameters used for each task. More details on the properties of the random feature conceptor can be found in Jaeger.24 

In this paper, we introduced an adaptive control loop based on the conceptor framework to augment the learning capabilities of RNNs. We demonstrated that by controlling RNN dynamics during inference, we achieve several significant benefits. First, our method improves the generalization abilities of RNNs, enabling the interpolation of more distinct temporal patterns. Second, it enhances graceful degradation, making RNNs more robust to errors or unexpected conditions. Finally, hierarchical RNN structures with an adaptive control loop exhibit improved signal reconstruction capabilities.

Our findings suggest that maintaining adaptive elements within a network post-training can significantly expand the potential applications of RNNs. This approach has particular promise for enabling RNNs to operate reliably in challenging environments or adapt to novel tasks with minimal additional training (few-shot learning).

The increased robustness demonstrated by our adaptive RNNs has intriguing parallels across multiple disciplines. In cognitive science, studies of subjects using inverting goggles have revealed remarkable human adaptability to significant perceptual shifts, suggesting specialized mechanisms for managing perturbations in dynamical visuomotor coordination.36 Similarly, in cybernetics and signal processing, techniques for recovering intentionally hidden or distorted signals are required, e.g., in secure communications.37 These connections underscore the broader relevance of understanding and enhancing adaptive capabilities in complex systems.

While our study focused on recurrent networks, the conceptor framework has the potential to control and augment various dynamical systems, including feedforward and other artificial neural networks. Therefore, our work represents a starting point for broader exploration into the control and enhancement of nonlinear computing systems.

See the supplementary material for this study that includes videos that demonstrate the effects of the Conceptor Control Loop (CCL) on RNN behavior. Available videos are “running.gif” for the CMU 016 55 MoCap data series, “runningCcl7.gif” showing the seventh example from Fig. 11(c) with CCL, and “runningNoCcl7.gif” without CCL from Fig. 11(b). These visuals highlight the RNN’s adaptive performance under various conditions, supporting our findings on the benefits of incorporating adaptivity into RNNs.

The authors acknowledge the support of the Spanish State Research Agency, through the Severo Ochoa and María de Maeztu Program for Centers and Units of Excellence in R&D (No. CEX2021-001164-M), the INFOLANET projects (No. PID2022-139409NB-I00) funded by the MCIN/AEI/10.13039/501100011033, and through the QUARESC project (Nos. PID2019-109094GB-C21 and -C22/AEI/10.13039/501100011033) and the DECAPH project (Nos. PID2019-111537GB-C21 and -C22/AEI/10.13039/501100011033). All authors acknowledge financial support by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 860360 (POST DIGITAL).

The authors have no conflicts to disclose.

 G.P. and M.G. contributed equally to this paper.

Guillaume Pourcel: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Mirko Goldmann: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Ingo Fischer: Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal). Miguel C. Soriano: Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

In our simulations, we found that slightly adapting the CCL by mixing the computations of the estimated conceptor C and C_adapt improved the performance of the system. Hence, the simulations for Sec. III B were done with the following CCL:
(A1)
where now, there is a single conceptor that both estimates the linear subspace of the current dynamic and integrates the feedback from the target.
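A hedged sketch of this mixed update (M-CCL): a single conceptor receives both the autoconceptor gradient of Eq. (5) and the pull toward the target. The relative scaling of the two terms is an assumption based on the augmented objective discussed below.

```python
import numpy as np

def mccl_step(C, x, C_target, gamma, eta, beta):
    """Mixed conceptor-control-loop update: subspace estimation and target feedback in one step."""
    err = x - C @ x
    grad = np.outer(err, x) - gamma**-2 * C          # autoconceptor (subspace estimation) part
    return C + eta * grad + beta * (C_target - C)    # plus the linear pull toward C_target
```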

More precisely, we found that the CCL in the main text [Eq. (8)] that keeps C and C_adapt separated (S-CCL) is less robust than the mixed CCL (M-CCL) presented above. When using the same gain β = 0.7, the S-CCL does not yield any improvement with respect to the baseline. Only after reducing the gain to β = 0.3 were we able to match the performance of the M-CCL (Fig. 9). Moreover, the S-CCL is less robust to the initial conditions. When initializing the CCL with the identity matrix (as opposed to C_target in the main text), the S-CCL performance degrades to the level of the baseline (Fig. 10).

FIG. 9.

Same caption as Fig. 6 except that the S-CCL is added for comparison. After reducing the gain from β = 0.7 to β = 0.3, the S-CCL performs similarly to the M-CCL.

FIG. 10.

Same caption as Fig. 9 except that the state of the CCLs is initialized to the identity.

FIG. 11.

(a) The first six principal components of the RNN without degradation. The same figure but for different samples of random degradation of 200 of the 1500 neurons with (c) and without (b) the CCL.

We also provide an additional analysis of the two CCL mechanisms by looking at their long-term behavior. The autoconceptor dynamics for the S-CCL [Eq. (5)] are designed to minimize the conceptor objective function L = E_x ||C x − x||^2 + γ^{−2} ||C||_fro^2. More precisely, they are derived as a stochastic gradient descent on the conceptor objective function,
(A2)
At convergence, the conceptor C(k) is at a minimum of the conceptor objective function and, hence, captures the subspace of the current dynamics, which is then used for control by comparing it to C_target. In contrast, the M-CCL mixes the two operations of estimating the current subspace of the dynamics and the control. Notably, the C_adapt dynamics can then be understood as a stochastic gradient descent on an augmented objective function,
(A3)
where L_fb = E_x ||C_adapt x − x||^2 + γ^{−2} ||C_adapt||_fro^2 + β ||C_adapt − C_target||_fro^2. The comparison of the objective functions above provides an initial understanding of the differences between the two methods. However, it does not yet offer a theoretical basis for determining which method is most suitable. Furthermore, the analysis overlooks the fact that the CCL dynamics influence the state x of the RNN. The coupling between the conceptor dynamics and the RNN dynamics is notoriously difficult to analyze and remains poorly understood. This has only been achieved in simpler settings.24 

In Figs. 11(b) and 11(c), we present further results of the individual trials with different random perturbations of Fig. 5(b) and for reference the case where the RNN works without degradation [Fig. 11(a)]. We also provide videos in the supplementary material of the running time series CMU 016 55 of the MoCap data (running.gif), and the seventh example of Fig. 11 with (runningCcl7.gif) and without (runningNoCcl7.gif) the CCL. The time series were pre-processed as described by Jaeger.26 

In Sec. IV, we adapted the RNN and CCL based on the ideas of the random feature conceptor (RFC)24 to enhance the system’s computational efficiency. This was achieved by adding an expansion into a high-dimensional space z, where the matrix-based computation of conceptors can be replaced by a vector-based analog. Besides the computational efficiency on a digital computer, this method is more biologically plausible and can be more easily implemented in neuromorphic hardware.24 The equations of the RNN, which we repeat here for clarity, are then
(C1)
(C2)
where F is a random expansion into the z-space of higher dimension N_RFC and G corresponds to the matrix product of a random compression matrix, which maps back to the original dimension of x, and the weight matrix W of the RNN. Now, the conceptor is a vector of dimension N_RFC, which is introduced into the RNN dynamics through an element-wise multiplication. Theoretical arguments and experimental evidence show that this vector operation is very close to applying a full matrix conceptor on the non-expanded space.24 Hence, the RFC can still be interpreted as a computational trick with the same semantics of projecting the dynamics of x. The CCL now takes a vector form,
(C3)
(C4)
where the autoconceptor dynamics correspond to a stochastic gradient descent on a vector-based analog of the conceptor objective function of Eq. (2). For each element c_i of c,
(C5)
Similarly to Sec. III B, we found that mixing c and c_adapt improves performance; therefore, we used the following update for our simulations:
(C6)
We finally note that the autoconceptor dynamics have favorable properties. Notably, they exhibit fast convergence and can be used with a large learning rate24 (see Table I).
1. R. Berner, T. Gross, C. Kuehn, J. Kurths, and S. Yanchuk, “Adaptive dynamical networks,” Phys. Rep. 1031, 1–59 (2023).
2. T. Gross, C. J. D. D’Lima, and B. Blasius, “Epidemic dynamics on an adaptive network,” Phys. Rev. Lett. 96, 208701 (2006).
3. R. Berner, S. Yanchuk, and E. Schöll, “What adaptive neuronal networks teach us about power grids,” Phys. Rev. E 103, 042315 (2021).
4. C. Clopath, L. Büsing, E. Vasilaki, and W. Gerstner, “Connectivity reflects coding: A model of voltage-based STDP with homeostasis,” Nat. Neurosci. 13, 344–352 (2010).
5. G. B. Morales, C. R. Mirasso, and M. C. Soriano, “Unveiling the role of plasticity rules in reservoir computing,” Neurocomputing 461, 705–715 (2021).
6. Q. Xuan, F. Du, H. Dong, L. Yu, and G. Chen, “Structural control of reaction-diffusion networks,” Phys. Rev. E 84, 036101 (2011).
7. H. Hewamalage, C. Bergmeir, and K. Bandara, “Recurrent neural networks for time series forecasting: Current status and future directions,” Int. J. Forecast. 37, 388–427 (2021).
8. I. Sutskever, J. Martens, and G. E. Hinton, “Generating text with recurrent neural networks,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11) (Omnipress, 2011), pp. 1017–1024.
9. M. Ganaie, S. Ghosh, N. Mendola, M. Tanveer, and S. Jalan, “Identification of chimera using machine learning,” Chaos 30, 063128 (2020).
10. Y. Pan and J. Wang, “Model predictive control of unknown nonlinear dynamical systems based on recurrent neural networks,” IEEE Trans. Ind. Electron. 59, 3089–3101 (2011).
11. C. Hens, U. Harush, S. Haber, R. Cohen, and B. Barzel, “Spatiotemporal signal propagation in complex networks,” Nat. Phys. 15, 403–412 (2019).
12. H. Jaeger and H. Haas, “Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication,” Science 304, 78–80 (2004).
13. M. Lukoševičius and H. Jaeger, “Reservoir computing approaches to recurrent neural network training,” Comput. Sci. Rev. 3, 127–149 (2009).
14. L. Appeltant, M. C. Soriano, G. Van Der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, “Information processing using a single dynamical node as complex system,” Nat. Commun. 2, 468 (2011).
15. D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, “Parallel photonic information processing at gigabyte per second data rates using transient states,” Nat. Commun. 4, 1364 (2013).
16. Reservoir Computing: Theory, Physical Implementations, and Applications, Natural Computing Series, edited by K. Nakajima and I. Fischer (Springer, Singapore, 2021).
17. P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proc. IEEE 78(10), 1550–1560 (1990).
18. H. Jaeger, “The ‘echo state’ approach to analysing and training recurrent neural networks—With an erratum note,” German National Research Center for Information Technology (GMD), GMD Technical Report No. 148, 2010, pp. 1–47.
19. C. Klos, Y. F. K. Kossio, S. Goedeke, A. Gilra, and R.-M. Memmesheimer, “Dynamical learning of dynamics,” Phys. Rev. Lett. 125, 088103 (2020).
20. L.-W. Kong, H.-W. Fan, C. Grebogi, and Y.-C. Lai, “Machine learning prediction of critical transition and system collapse,” Phys. Rev. Res. 3, 13090 (2021).
21. J. Z. Kim, Z. Lu, E. Nozari, G. J. Pappas, and D. S. Bassett, “Teaching recurrent neural networks to infer global temporal structure from local examples,” Nat. Mach. Intell. 3, 316–323 (2021).
22. M. Goldmann, C. R. Mirasso, I. Fischer, and M. C. Soriano, “Learn one size to infer all: Exploiting translational symmetries in delay-dynamical and spatiotemporal systems using scalable neural networks,” Phys. Rev. E 106, 044211 (2022).
23. X. A. Ji and G. Orosz, “Learn from one and predict all: Single trajectory learning for time delay systems,” Nonlinear Dyn. 112, 3505–3518 (2024).
24. H. Jaeger, “Controlling recurrent neural networks by conceptors,” arXiv:1403.3369 (2014).
25. C. Fernando and S. Sojakka, “Pattern recognition in a bucket,” in Advances in Artificial Life, edited by W. Banzhaf, J. Ziegler, T. Christaller, P. Dittrich, and J. T. Kim (Springer, Berlin, 2003), pp. 588–597.
26. H. Jaeger, “Using conceptors to manage neural long-term memories for temporal patterns,” J. Mach. Learn. Res. 18, 1–43 (2017).
27. X. He and H. Jaeger, “Overcoming catastrophic interference using conceptor-aided backpropagation,” in International Conference on Learning Representations (Proceedings of Machine Learning Research, 2018).
28. J. Yperman and T. Becker, “Bayesian optimization of hyper-parameters in reservoir computing,” arXiv:1611.05193 (2016).
29. F. Wyffels, J. Li, T. Waegeman, B. Schrauwen, and H. Jaeger, “Frequency modulation of large oscillatory neural networks,” Biol. Cybern. 108, 145–157 (2014).
30. A. Feragen and A. Fuster, “Geometries and interpolations for symmetric positive definite matrices,” in Modeling, Analysis, and Visualization of Anisotropy, edited by T. Schultz, E. Özarslan, and I. Hotz (Springer International Publishing, Cham, 2017), pp. 85–113.
31. J. L. McClelland, D. E. Rumelhart, and the PDP Research Group, Parallel Distributed Processing, Volume 2: Explorations in the Microstructure of Cognition: Psychological and Biological Models (The MIT Press, 1986).
32. C. Torres-Huitzil and B. Girau, “Fault and error tolerance in neural networks: A review,” IEEE Access 5, 17322–17341 (2017).
33. Carnegie Mellon University Graphics Lab, “Motion capture database”; see http://mocap.cs.cmu.edu/; accessed 3 April 2023.
34. The optimization is done via linear regression on one-step-ahead prediction while being driven by the input. After optimization, when the system is driven by the input, the prediction error has an NRMSE of 0.042.
35. B. Farhang-Boroujeny, Adaptive Filters: Theory and Applications, 2nd ed. (Wiley, Chichester, West Sussex, UK, 2013).
36. I. Kohler, “The formation and transformation of the perceptual world,” Psychol. Issues 3, 1–173 (1963).
37. I. Fischer, Y. Liu, and P. Davis, “Synchronization of chaotic semiconductor laser dynamics on subnanosecond time scales and its potential for chaos communication,” Phys. Rev. A 62, 011801 (2000).