Recurrent neural networks excel at predicting and generating complex high-dimensional temporal patterns. Due to their inherent nonlinear dynamics and memory, they can learn unbounded temporal dependencies from data. In a machine learning setting, the network’s parameters are adapted during a training phase to match the requirements of a given task, increasing its computational capabilities. After the training, the network parameters are kept fixed to exploit the learned computations. The static parameters, therefore, render the network unadaptive to changing conditions, such as external or internal perturbations. In this paper, we demonstrate how keeping parts of the network adaptive even after training enhances its functionality and robustness. Here, we utilize the conceptor framework and design an adaptive control loop that continuously analyzes the network’s behavior and adjusts its time-varying internal representation to follow a desired target. We demonstrate how the added adaptivity supports the computational functionality in three distinct tasks: interpolation of temporal patterns, stabilization against partial network degradation, and robustness against input distortion. Our results highlight the potential of adaptive networks in machine learning beyond training, enabling them to not only learn complex patterns but also dynamically adjust to changing environments, ultimately broadening their applicability.

Recurrent neural networks (RNNs) can capture complex temporal dependencies, making them an attractive choice for storing, retrieving, and predicting time series data. This ability renders them attractive models for solving machine learning tasks that require nonlinear and memory-based processing. During a training phase, the parameters of an RNN are adapted based on a learning algorithm, such as reservoir computing or backpropagation through time. Since the parameters of the network change in that phase, RNNs can be seen as adaptive networks during training. However, after the training, in the so-called inference phase, the parameters of the network are usually kept fixed. While this allows one to exploit the learned dynamical behavior, it often renders the network unable to adapt to new and changing conditions. In our work, we propose a novel mechanism that keeps certain parameters of the network adaptive even during inference. In essence, conceptors control network dynamics by projecting them into a small subspace of the network’s state space that can be interpreted as an ellipsoid. Based on the conceptor framework, we design a control mechanism that regulates the RNN dynamics in an adaptive manner toward a predefined reference dynamic. To this end, we establish an adaptive control of the network dynamics, extending the network’s functionality during inference. In particular, RNNs augmented with the conceptor control loop show improved temporal pattern interpolation, stabilization against partial network degradation, and robustness against input distortion.

## I. INTRODUCTION

Adaptive networks feature a flexible structure that evolves over time and may depend on the state of their internal nodes.^{1} Such adaptive networks find various applications in studying complex dynamical phenomena,^{2} in modeling, e.g., power grids,^{3} in the learning of neural networks,^{4,5} and in the control of complex systems.^{6} Their adaptive nature allows such networks to dynamically adjust to evolving external conditions and internal dynamics. They are particularly suited for applications requiring robustness against changing environments, with the added capacity to self-organize. In this context, adaptivity refers to the ability to modify structure and function in response to new information, disturbances, or changing objectives. Such adaptability is achieved through mechanisms that enable the network to reconfigure its connections, adjust its parameters, or even alter its overall architecture in real time or across different operational phases.

In the context of machine learning, artificial neural networks are applied to solve regression and classification tasks. In particular, recurrent neural networks (RNNs) have proven remarkably successful in predicting complex dynamics due to their inherent ability to capture temporal dependencies. Accordingly, RNNs have been applied to various tasks, such as time series prediction,^{7} language modeling,^{8} classification of dynamics,^{9} and control of complex systems.^{10} Beyond machine learning, RNNs can be seen as a general abstract model of network dynamics, which has provided numerous dynamical insights in the study of biological evolution, social networks, and brains.^{11} In this context, digital and discrete-time RNNs^{12,13} can be employed, as well as analog continuous-time networks implemented in various hardware platforms.^{14–16} Relying on learning and optimization mechanisms, such as backpropagation through time^{17} and reservoir computing (RC),^{16,18} certain parts of the network are adapted during the training phase. This adaptation increases the network’s performance in the task at hand. Backpropagation through time employs gradient descent to update every parameter of the network. RC, on the other hand, simplifies the training by only adapting a set of linear output weights. During RNN training, both methods make the network’s structure evolve over time to increase performance in a certain task, and hence, the network can be interpreted as adaptive. However, after the training, the parameters are usually kept fixed during inference to exploit the learned behavior. This prevents the RNN from continuing to adapt.
Recently, various works introduced extensions to the reservoir computing framework that allow one to control the system during the inference phase.^{19–21} Independently, these works proved RNNs effective in learning to forecast complex unseen dynamics given several differently parameterized examples from a dynamical system. By parameterizing certain parts of the network, they demonstrate enhanced capabilities of RNNs, allowing one to predict bifurcations, tipping points, and transient chaos. Furthermore, exploiting symmetries of delay-dynamical and spatiotemporal systems, adapted network structures make it possible to infer entire bifurcation diagrams unseen during training.^{22,23} However, similar to standard RC, the parameters of these RNNs are kept fixed, often rendering them unable to cope with changing conditions, such as changing external signals, changing objectives, or degrading parts of the network.

In this paper, we aim to overcome the unadaptive character of RNNs during inference by keeping parts of the inner network weights adaptive. We employ the conceptor framework and design an adaptive control loop that allows RNNs to deal with unforeseen situations. As will be shown later, conceptors characterize the input-driven dynamics of the network. A conceptor can be geometrically interpreted as an ellipsoid encapsulating the trajectory of the network’s state evolving over time. Furthermore, using the conceptor, the network dynamics can be controlled by projecting them into a certain subspace, thus restricting the potential states.^{24} To make RNNs adaptive, we design a conceptor-based control loop that estimates the network’s currently occupied subspace. We then use this subspace as a target to regulate the behavior of the RNN in the presence of unforeseen perturbations. By doing so, we control the RNN dynamics in a flexible way and increase the network’s computational capacity, resulting in enhanced interpolation, robustness against partial network failure, and stabilization of input-driven dynamics.

This paper is structured as follows: In Sec. II, we show how to control RNNs using the conceptor framework. Subsequently, in Sec. II A, we employ the conceptor framework to design a control loop that renders the RNN adaptive even in the inference phase. In Sec. III, we demonstrate the extended functionality of the adaptive RNN for different scenarios, including interpolation and robustness. Finally, we demonstrate an advanced control setting for input distortion in Sec. IV. Section V concludes the results presented in this paper.

## II. CONTROLLING NETWORK DYNAMICS USING CONCEPTORS

The dynamics of a leaky RNN, as commonly employed in reservoir computing,^{18} is governed by the following discrete-time equation:

$$x(k+1) = (1-\alpha)\,x(k) + \alpha \tanh\!\left(W x(k) + W_{\mathrm{in}}\,u(k) + b\right), \tag{1}$$

where $x(k)$ denotes the network state, $W$ the recurrent weight matrix, $W_{\mathrm{in}}$ the input weights, $b$ a bias vector, and $\alpha$ the leakage rate. The recurrent weights are scaled to a spectral radius $\rho$, the input weights by $\rho_{in}$, and the bias by $\rho_b$ (see Table I).

In the following, we drive the leaky RNN with different input sequences $u(k)$; depending on the parameters of the network and the input signal $u(k)$, the nodes of the network start to oscillate. These nonlinear oscillations in the state space $x(k)$ can be seen as reverberations (similar to how an object dropped into water causes ripples that spread out and reverberate long after the initial impact^{25}). In Fig. 1, we plot the reverberations of an RNN independently driven by three different sine waves, represented in a low-dimensional projection using Principal Component Analysis (PCA). PCA on the network’s state space $x(k)$ finds the axes that capture the most variance, in decreasing order. In this context, the principal components can be viewed as the axes of an ellipsoid that describes the geometry of the reverberation in the network’s state space. Different characteristics of the input signal are encoded into distinct trajectories within the network’s state space that can be distinguished via their encapsulating ellipsoids (see the shaded regions in the right part of Fig. 1). Analyzing the eigenvalue spectrum of the principal components reveals that most of the variance of the RNN’s dynamics is concentrated in a low-dimensional subspace of the network’s high-dimensional state space.
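The reverberation analysis above can be sketched in a few lines of NumPy: we drive a small leaky RNN with a sine wave and inspect the PCA spectrum of its states. All sizes, scalings, and the sine period here are illustrative choices, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha = 100, 0.75
W = rng.normal(0.0, 1.0, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale to spectral radius 0.9
W_in = rng.normal(size=N)
b = 0.2 * rng.normal(size=N)

def run(u_seq):
    """Iterate x(k+1) = (1 - alpha) x(k) + alpha tanh(W x + W_in u + b)."""
    x, states = np.zeros(N), []
    for u in u_seq:
        x = (1 - alpha) * x + alpha * np.tanh(W @ x + W_in * u + b)
        states.append(x.copy())
    return np.array(states)

X = run(np.sin(2 * np.pi * np.arange(1000) / 30))[200:]   # discard transient
X = X - X.mean(axis=0)
variances = np.linalg.eigvalsh(X.T @ X / len(X))[::-1]    # PCA spectrum, descending
explained_5 = variances[:5].sum() / variances.sum()       # share of top 5 PCs
```

For a periodic drive, a handful of principal components typically captures most of the state variance, which is the low-dimensional ellipsoid picture described above.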

The conceptor framework^{24} exploits the fact that the dynamics of driven and autonomous RNNs often lie in a low-dimensional subspace. Additionally, when the dynamics are not low dimensional (the correlation matrix is full rank), filtering out the low-variance directions does not damage the rest of the system. The conceptor $C$ of an RNN is defined via the following objective function, considering the state $x$ as a random variable:

$$C = \arg\min_{C} \; \mathbb{E}_x\!\left[\Vert x - Cx\Vert^2\right] + \gamma^{-2}\,\Vert C\Vert_{\mathrm{fro}}^2, \tag{2}$$

where $\gamma$ is called the *aperture* and $\Vert\cdot\Vert_{\mathrm{fro}}$ denotes the Frobenius norm. Mathematically, the conceptor is a soft projection matrix $C$ that minimizes the L2-distance between the projected state $Cx$ and the original state $x$ averaged over time. Acting as a regularization parameter, $\gamma$ controls via the second term how many dimensions within the network’s state space are considered for the projection. The objective function given in Eq. (2) has a unique analytical solution given by^{24}

$$C = R\left(R + \gamma^{-2} I\right)^{-1}, \tag{3}$$

where $R = \mathbb{E}_x\!\left[x x^{\top}\right]$ is the state correlation matrix.
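The analytical solution of the conceptor objective [Eq. (3)] is straightforward to compute from a recorded state sequence. The following minimal sketch uses a toy state matrix confined to one axis; the function name and sizes are our own illustrative choices.

```python
import numpy as np

def conceptor(X, aperture):
    """C = R (R + aperture^-2 I)^-1 with R the state correlation matrix, Eq. (3)."""
    R = X.T @ X / X.shape[0]
    return R @ np.linalg.inv(R + aperture**-2 * np.eye(X.shape[1]))

# Toy check: states confined to the first axis yield a near rank-one conceptor
rng = np.random.default_rng(1)
X = np.outer(rng.normal(size=500), [1.0, 0.0, 0.0])
C = conceptor(X, aperture=10.0)
```

The singular values of $C$ lie in $[0, 1)$: directions with large state variance are mapped close to 1 (kept), while directions with no variance are mapped to 0 (suppressed), which is the "soft projection" interpretation above.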

The conceptor controls the network dynamics by projecting the state at every time step, yielding the conceptor-constrained update

$$x(k+1) = C\left[(1-\alpha)\,x(k) + \alpha \tanh\!\left(W x(k) + W_{\mathrm{in}}\,u(k) + b\right)\right]. \tag{4}$$

A conceptor that is estimated online while the network is running is called an *autoconceptor*.^{26} Within the autoconceptor framework, the conceptor is recursively updated at every time step $k$ by the following equation:

$$C(k+1) = C(k) + \eta\left[\left(x(k) - C(k)x(k)\right)x(k)^{\top} - \gamma^{-2}\,C(k)\right], \tag{5}$$

where $\eta$ is a learning rate. This stochastic gradient update converges to the unique solution of the conceptor objective^{24} [Eq. (3)].
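The convergence of the recursive update toward the analytical solution of Eq. (3) can be checked numerically. In this sketch, the states are drawn from a fixed diagonal covariance so that the exact conceptor is known; dimensions, learning rate, and aperture are illustrative.

```python
import numpy as np

def autoconceptor_step(C, x, eta, aperture):
    """One recursive conceptor update; a stochastic gradient step on Eq. (2)."""
    return C + eta * (np.outer(x - C @ x, x) - aperture**-2 * C)

rng = np.random.default_rng(2)
N, aperture, eta = 4, 8.0, 0.005
d = np.array([1.0, 0.5, 0.1, 0.0])            # per-axis state variances
C = np.zeros((N, N))
for _ in range(40000):
    x = rng.normal(size=N) * np.sqrt(d)       # surrogate network states
    C = autoconceptor_step(C, x, eta, aperture)

# Analytical solution of Eq. (3) for the same correlation matrix
C_exact = np.diag(d) @ np.linalg.inv(np.diag(d) + aperture**-2 * np.eye(N))
```

The diagonal of the online estimate approaches the exact solution, and the zero-variance direction is correctly suppressed.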

In general, the conceptor framework can be used to control RNNs and perform a variety of information processing tasks, such as denoising, multi-tasking, and associative memories.^{24} Furthermore, one can define multiple operations based on the conceptor matrices, such as logical operations, morphing and interpolation operations, and learning rules to incrementally learn a sequence of tasks. The latter was recently used to improve the continual learning abilities of deep feed-forward networks.^{27}

The conceptor framework enhances the functionality of RNNs, given the network dynamics defined by Eq. (4). However, the conceptor does not yet render the network adaptive since it is defined as a static projection. Such nonadaptive control often renders systems unstable and not robust against external perturbations. Therefore, in the next section, we propose an extension of the conceptor framework, yielding an adaptive control mechanism for RNNs. Subsequently, we compare the computational capabilities of the nonadaptive, conceptor-only framework and the proposed adaptive conceptor control loop in three challenging tasks, namely, temporal pattern interpolation, network degradation, and input perturbation.

### A. Adaptive control of network dynamics using a conceptor control loop

In this section, we propose to use the system composed of the conceptor and RNN given by Eq. (4) together with a specifically designed control loop. This conceptor control loop (CCL) aims to balance the control of the dynamics toward predefined target dynamics and the dynamical stability of the network. The CCL is schematically depicted in Fig. 2 encompassing three main steps:

1. estimating the current conceptor $C(k)$ of the RNN in an online fashion from observing the sequence of states $x$,

2. pushing the estimated conceptor into the direction of the predefined target conceptor $C_{\mathrm{target}}$, and

3. using the linearly pushed conceptor $C_{\mathrm{adapt}}$ to control the network.
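The three steps above can be sketched as a single loop iteration. The form of the linear push with control gain $\beta$ is our reading of the description (Table I lists $\beta$ as the control gain); all dimensions, gains, and the random network are illustrative, not the paper's implementation.

```python
import numpy as np

def ccl_step(x, C, C_target, W, W_in, b, u, alpha, eta, aperture, beta):
    """One iteration of the conceptor control loop (the three steps above)."""
    # 1) online estimate of the currently occupied subspace (autoconceptor update)
    C = C + eta * (np.outer(x - C @ x, x) - aperture**-2 * C)
    # 2) linear push of the estimate toward the target conceptor (gain beta)
    C_adapt = C + beta * (C_target - C)
    # 3) control: project the next network state with the pushed conceptor
    x = C_adapt @ ((1 - alpha) * x + alpha * np.tanh(W @ x + W_in * u + b))
    return x, C

# Illustrative closed-loop run on a randomly initialized network
rng = np.random.default_rng(3)
N = 20
W = rng.normal(0, 1 / np.sqrt(N), (N, N))
W_in, b = rng.normal(size=N), 0.1 * rng.normal(size=N)
x, C, C_target = 0.1 * rng.normal(size=N), np.zeros((N, N)), 0.5 * np.eye(N)
for k in range(200):
    x, C = ccl_step(x, C, C_target, W, W_in, b, u=np.sin(2 * np.pi * k / 30),
                    alpha=0.75, eta=0.02, aperture=8.0, beta=0.1)
```

Setting $\beta = 0$ recovers pure online estimation without control, while large $\beta$ forces the dynamics hard toward the target subspace; the balance between the two is the stability/control trade-off mentioned above.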

## III. ADAPTIVE CONTROL OF AN AUTONOMOUS RNN

We optimize the hyperparameters of the network for each task^{28} and report the optimized parameters in Table I. The output weights are trained to predict the input $u(k)$ of the RNN one step ahead by setting the prediction target to $u(k+1)$. Since the output weights are linear, we can apply linear or ridge regression^{12,16,18} to find the optimal parameters $W_{\mathrm{out}}$, giving the output $y = W_{\mathrm{out}}\,x$. After the training, the RNN is set into the so-called *autonomous mode*, running solely based on its own predictions. The dynamics of the RNN in the *autonomous mode* are governed by the following equation:^{24}

$$x(k+1) = C\left[(1-\alpha)\,x(k) + \alpha \tanh\!\left(W x(k) + W_{\mathrm{in}}\,W_{\mathrm{out}}\,x(k) + b\right)\right], \tag{9}$$

where the external input is replaced by the network’s own prediction $y(k) = W_{\mathrm{out}}\,x(k)$.
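The readout training and the switch into autonomous mode can be sketched as follows. For brevity, the loop below runs without the conceptor projection (i.e., $C = I$ in Eq. (9)); network size, signal, and ridge parameter are illustrative, not the task values of Table I.

```python
import numpy as np

rng = np.random.default_rng(4)
N, alpha, lam = 200, 0.75, 1e-4
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in, b = rng.normal(size=N), 0.2 * rng.normal(size=N)

def step(x, u):
    return (1 - alpha) * x + alpha * np.tanh(W @ x + W_in * u + b)

# Teacher-forced run on the training signal, collecting one-step-ahead pairs
u = np.sin(2 * np.pi * np.arange(1200) / 30)
x, states = np.zeros(N), []
for uk in u[:-1]:
    x = step(x, uk)
    states.append(x.copy())
X, Y = np.array(states[200:]), u[201:]          # state x(k) predicts u(k+1)

# Ridge regression for the linear readout W_out
W_out = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ Y)
train_nrmse = np.sqrt(np.mean((X @ W_out - Y) ** 2)) / np.std(Y)

# Autonomous mode: the prediction y(k) = W_out x(k) replaces the input
y_auto = []
for _ in range(300):
    x = step(x, W_out @ x)
    y_auto.append(W_out @ x)
y_auto = np.array(y_auto)
```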

Table I. Optimized parameters of the RNN and the conceptor control loop for the three tasks.

| Parameter name | Symbol | Interpolation | Network degradation | Input distortion |
|---|---|---|---|---|
| Spectral radius | ρ | 1.6 | 0.749 | 0.9 |
| Input scaling | ρ_{in} | 1 | 1.149 | 0.9 |
| Bias scaling | ρ_{b} | 1 | 1.5 | 0.2 |
| Leakage | α | 0.75 | 0.988 | 1.0 |
| Ridge regularization | λ | 0.0001 | 1000 | 0.01 |
| Aperture | γ | 25 | 31.6 | 8 |
| Network size | N | 256 | 1500 | 50 |
| Learning rate | η | 0.2 | 0.001 | 0.8 |
| Control gain | β | 2.5 × 10^{−5} | 0.7 | 4 × 10^{−3} |
| Random feature conceptor size | N_{RFC} | … | … | 200 |


### A. Interpolation between temporal patterns

The interpolation task follows an approach proposed previously.^{29} The goal of this task is to generate an intermediate temporal pattern that continuously interpolates features, such as the frequency, between two time series observed during a training phase. Here, we train an RNN on two sine waves with different frequencies as input data during the training phase. The sine waves are given from the following model family:

$$u(k) = \sin\!\left(\frac{2\pi k}{T}\right), \qquad T \in \{T_0, T_1\}.$$

The original conceptor framework^{24} already outlines the ability to transition from one conceptor into another via linear interpolation of the conceptor matrices computed during training,

$$C_{\lambda} = (1-\lambda)\,C_0 + \lambda\,C_1, \qquad \lambda \in [0,1]. \tag{11}$$

The reasons for the breakdown of interpolation when using the static conceptor framework might be twofold. First, the interpolated conceptor enforces dynamics in a linear subspace of the network’s state space that was not observed during training. Hence, the dynamics within that subspace cannot directly be optimized to be stable. Furthermore, the output weights are trained only on the two initial time series, whereas we apply them to intermediate dynamics that can be far apart from the training dynamics. Second, the linear interpolation of the conceptor matrix introduced in Eq. (11) is known to shrink the eigenvalues of the resulting interpolated conceptor.^{30} This shrinkage of the eigenvalues, known for symmetric positive definite matrices, might lead to a loss of information and, in turn, might be one reason for the decreasing amplitudes of the intermediary solutions. As discussed by Feragen and Fuster,^{30} the shrinkage of the eigenvalues can be avoided by using another geodesic metric to derive the interpolation scheme, e.g., log-Euclidean, affine invariant, or the shape and orientation rotation metric. However, these metrics, due to their reliance on matrix exponentials, are computationally expensive and often numerically unstable.^{30}
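The eigenvalue shrinkage of the linear interpolation [Eq. (11)] is easy to reproduce numerically. The toy conceptors below are built directly from two rotated correlation matrices rather than from trained networks; the rotation angle and aperture are illustrative.

```python
import numpy as np

def conceptor_from_R(R, aperture):
    """Eq. (3) applied to a given correlation matrix R."""
    return R @ np.linalg.inv(R + aperture**-2 * np.eye(len(R)))

theta = np.pi / 3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
# Two ellipsoids with identical shape but rotated dominant directions
C0 = conceptor_from_R(np.diag([1.0, 0.01]), aperture=10.0)
C1 = conceptor_from_R(Q @ np.diag([1.0, 0.01]) @ Q.T, aperture=10.0)

top = lambda C: np.linalg.eigvalsh(C)[-1]     # leading eigenvalue
C_mid = 0.5 * C0 + 0.5 * C1                   # midpoint of the linear path
shrinkage = min(top(C0), top(C1)) - top(C_mid)
```

Although both endpoints have the same leading eigenvalue, the midpoint's leading eigenvalue is strictly smaller, illustrating the information loss along the linear path discussed above.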

We now proceed to test the ability of the adaptive CCL to interpolate between the two temporal patterns as carried out above. To this end, we use the same network parameters (see Table I) as used for Figs. 3(a) and 3(b) and train the network on the same set of sine waves as above. After training, we apply the CCL to the network and scan the interpolation parameter $\lambda$ in the range of 0–1. In Figs. 3(c) and 3(d), we show the output of the network along this scan. We find that the network can generate an intermediary sine wave pattern for the three sine wave patterns $T_1 \in \{25, 30, 35\}$. In contrast to the conceptor-only framework, the adaptive CCL achieves a much more stable oscillation amplitude throughout all three interpolations. Additionally, the periods of the interpolated temporal pattern exhibit a smooth and steadily increasing transition from the starting period $T_0$ toward $T_1$, as shown in Fig. 3(d). Accordingly, the adaptive character of the CCL results in a much more stable and consistent interpolation than the static scheme.

Moreover, additional experiments show that the CCL can further boost the ability of the RNN to interpolate between temporal patterns of more distant periods (Fig. 4). Whereas the conceptor-only framework fails to interpolate time series with periods $T_0 = 20$ and $T_1 = 35$, here, we demonstrate continuous interpolation with the CCL for periods from $T_0 = 20$ up to more than twice the initial period at $T_1 = 47.5$, avoiding fixed-point solutions in between. Accordingly, we argue that the proposed adaptive CCL improves the interpolation abilities of the RNN beyond the static conceptor-only framework by enhancing the generalization ability of the network without further training. The interpolation capabilities of the CCL eventually break down for a longer period of $T_1 = 50$. In this case, an additional input example in the training set with a period between $T_0$ and $T_1$ would extend the interpolation capabilities. We note that the CCL makes better use of the input examples than the conceptor-only framework.

From a machine learning perspective, where the objective is to generalize from a few samples, the CCL can be seen as a way to adaptively enforce a prior at inference time on how to generalize. The prior consists of assuming that intermediary samples are generated by dynamics with intermediate ellipsoids specified by the linear interpolation between the conceptors. With the CCL, we find that the interpolated dynamics generating the prediction lie within intermediary ellipsoids and exhibit a strong similarity with the ellipsoids elicited by the training data. In the case of the static conceptor, where the interpolation leads to fixed-point dynamics, the ellipsoid of the dynamics is reduced to a point. In this case, the prior is violated.

### B. Stabilization against partial network degradation

In this section, we evaluate the capabilities of the static conceptor-only framework and the adaptive CCL to enhance the robustness of networks against partial degradation. Partial degradation refers to removing neurons from the network during inference while evaluating how well the network continues to perform the task it learned during training. Historically, psychologists and neuroscientists have recognized that artificial and biological neural networks share a characteristic known as “graceful degradation.”^{31} This term describes how a system’s performance declines gradually as more neural units are compromised, without a singular, catastrophic failure point. Inspired by early interest in this phenomenon, numerous engineering strategies have been developed to leverage and augment the inherent fault tolerance of ANNs.^{32} Most of these strategies lack the ability to adapt dynamically. Examples include injecting noise into neuron activity during training, adding regularization terms to improve robustness, or designing networks with built-in redundancy by duplicating important neurons and/or synapses. However, research on improving graceful degradation within RNNs, especially those handling dynamic inputs, remains sparse.^{32}

To study the influence of degradation within RNNs, we train the output weights of a network for one-step-ahead prediction. We then feed the prediction back to generate a pattern autonomously as defined by Eq. (9). As the target time series, we use human motion capture data from the MoCap data set.^{33} The corresponding time series provides the data of 94 sensors attached to a human body, sampling the movement of joints and limbs in three-dimensional space. For the training, we use 161 time steps that include two periods of a running motion as shown in Fig. 5. After optimizing^{34} the weights and parameters of the RNN for the autonomous continuation (see Table I), we degrade the network by removing $R$ randomly selected neurons. More specifically, we clamp the corresponding state coordinate $x$ of each selected neuron to zero while computing the network’s autonomous continuation. Finally, we analyze the ability of the network to continue its autonomous prediction despite the partial degradation by computing the mean square error (MSE) of its output with respect to the original time series. In our implementation, we use an RNN of $N=1500$ neurons and first evaluate the robustness of the static conceptor-only framework by degrading $R=200$ randomly selected neurons. In Fig. 5(a), we plot the first three principal components (PCs) of the autonomously evolving RNN in red. The projection of the evolution of the RNN without degradation is shown for comparison (green). Similar to the interpolation task described in Sec. III A, the applied degradation pushes the system outside the dynamical regime it was trained on. In analogy to the interpolation results, we find that the RNN with the applied static conceptor tends to fall into fixed-point dynamics (red curve), rendering the network unable to continue its learned dynamics. In this case, the stickman, whose movements mirror the time series data, becomes unnaturally still, as illustrated in Fig. 5(b). With the degradation, the network’s output does not produce a running motion, leaving the stickman frozen.
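The degradation protocol itself (clamping the state coordinates of $R$ randomly selected neurons to zero during the autonomous update) can be sketched as follows. The feedback weights `W_fb` and all sizes are illustrative stand-ins, not the trained MoCap network.

```python
import numpy as np

rng = np.random.default_rng(5)
N, R, alpha = 100, 20, 0.75
dead = rng.choice(N, size=R, replace=False)   # randomly selected neurons
mask = np.ones(N)
mask[dead] = 0.0                              # clamp these coordinates to zero

W = rng.normal(0, 1, (N, N))
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))
W_fb = rng.normal(size=N)                     # feedback of the scalar prediction
W_out = rng.normal(size=N) / N                # placeholder readout (untrained)
b = 0.1 * rng.normal(size=N)

def degraded_step(x):
    """Autonomous update with the selected state coordinates clamped to zero."""
    y = W_out @ x                              # network's own prediction
    x = (1 - alpha) * x + alpha * np.tanh(W @ x + W_fb * y + b)
    return x * mask                            # apply the degradation mask

x = 0.1 * rng.normal(size=N)
for _ in range(100):
    x = degraded_step(x)
```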

We now proceed to evaluate the graceful degradation abilities of the RNN while applying the CCL, which we slightly adapted for this challenging task (see Appendix A). As shown in Fig. 5, adding the CCL stabilizes the system around a limit cycle after a short transient (blue curve). Across different experiments with a growing number $R$ of degraded neurons, we observe that the CCL consistently stabilizes the system against partial degradation. This adaptation enables the system to maintain its dynamics and qualitatively preserve the limit cycle [Fig. 6(a)]. From visual inspection, we can also see that the CCL preserves the qualitative properties of the running motion, as shown in Fig. 5(b). Note that the recovery from the degradation is only approximate, and the stickman of the CCL-controlled system is slightly slowed down compared to the baseline (green), which completes one full cycle in a shorter time. Additional examples of the reconstructed dynamics with and without the control loop are shown in Appendix B.

In our quantitative evaluation, we assessed the continuation capabilities of the non-failing trials by comparing the time series from the baseline (non-degraded RNN) with the phase-aligned output from the degraded RNN using a Normalized Root Mean Squared Error (NRMSE). The results show that the average NRMSE is consistently lower for the system incorporating the CCL [Fig. 6(b)]. Overall, both quantitative and qualitative results highlight that the adaptivity endowed by the CCL improves the robustness of the RNN against network degradation. Here, we show that the CCL is a straightforward, yet effective enhancement allowing for graceful degradation of internal neurons.

## IV. ADAPTIVE CONTROL OF AN INPUT-DRIVEN RNN

In Sec. III, we set up RNNs to autonomously continue certain temporal patterns and evaluated their performance in two tasks. To investigate the robustness and adaptability of our approach further, we now evaluate the capacity of an input-driven RNN to continue performing its task when the input is subject to distortions. We aim for the network to adapt such that it equalizes an input signal whose amplitude is reduced compared to the training period. Although the chosen distortion is a linear transformation of the input signal, the RNN’s response to this change is nonlinear. In signal processing terms, we are looking for a mechanism of adaptive blind nonlinear channel equalization.^{35} Our work aims to embed further adaptability properties within RNNs and, thus, get closer to the robustness found in biological neural networks.

### A. Extending the conceptor control loop for non-autonomous input-driven systems

Adding adaptation mechanisms to input-driven systems that are processing information is a challenging task. Changes to the input can push the system away from performing the desired task. If the distortion is sufficiently large, it might act as a signal from an unseen dynamical system coupled to the RNN. Our task is similar to the challenge of regulating coupled autonomous systems for which only a fraction of the entire system is accessible for observation and control. To address the adaptive control of input-driven RNNs, we extend the conceptor control loop (CCL) in two ways: stacking multiple RNNs and their associated CCLs into a hierarchical architecture and using a more computationally efficient version of the conceptor called the *random feature conceptor*.

#### 1. Hierarchical architecture

We introduce a hierarchical architecture to handle distorted inputs. A single CCL cannot manage this task, as an iterative correction process is required, which cannot be carried out in a single RNN that is continuously driven by the input. The aim of the hierarchical architecture is that, by progressing through the hierarchy, each layer further corrects the signal distortion. The architecture consists of multiple identical layers of RNNs with their associated CCLs, as illustrated in Fig. 7. Each layer contains an identical RNN with the same parameters. The readout $W_{\mathrm{out}}$ is trained, as before, via linear regression for next-step prediction. The conceptor is measured on the training (undistorted) time series. In a hierarchical structure of $L$ layers, the lowest layer is connected to the input, while the topmost layer provides the global system’s output.
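The stacking of identical layers can be sketched as follows: each layer's scalar output feeds the next layer's input, and the topmost output is the global output. The CCL inside each layer is omitted here for brevity; the `Layer` class, seeds, and readout placeholder are our own illustrative construction.

```python
import numpy as np

class Layer:
    """One RNN layer; every layer is built from the same seed, i.e., identical."""
    def __init__(self, N, seed, alpha=0.9):
        rng = np.random.default_rng(seed)
        self.alpha = alpha
        self.W = rng.normal(0, 1, (N, N))
        self.W *= 0.9 / np.max(np.abs(np.linalg.eigvals(self.W)))
        self.W_in = rng.normal(size=N)
        self.b = 0.1 * rng.normal(size=N)
        self.W_out = rng.normal(size=N) / N    # placeholder; trained by regression
        self.x = np.zeros(N)

    def step(self, u):
        self.x = ((1 - self.alpha) * self.x
                  + self.alpha * np.tanh(self.W @ self.x + self.W_in * u + self.b))
        return self.W_out @ self.x             # layer output = next layer's input

layers = [Layer(N=50, seed=7) for _ in range(4)]   # L = 4 identical layers

def hierarchy_step(u):
    for layer in layers:
        u = layer.step(u)
    return u                                   # topmost layer: global output

outputs = [hierarchy_step(np.sin(2 * np.pi * k / 21)) for k in range(50)]
```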

#### 2. Random feature conceptors

To reduce the computational cost of the CCL, we employ *random feature conceptors*,^{24} a modified version of conceptors that randomly and linearly expands the state of the RNN $x(k)$ to a higher-dimensional feature vector $z(k)$ (random features), to which a vector-based conceptor $c_{\mathrm{adapt}}(k)$ is applied via an element-wise multiplication instead of a matrix multiplication. This results in the following equations:

$$z(k) = c_{\mathrm{adapt}}(k) \odot \left(F\,x(k)\right), \qquad x(k+1) = (1-\alpha)\,x(k) + \alpha \tanh\!\left(G\,z(k) + W_{\mathrm{in}}\,u(k) + b\right),$$

where $F$ and $G$ are fixed random expansion and projection matrices, and $\odot$ denotes the element-wise product.

*Random feature conceptors* approximate matrix-based conceptors while offering several computational advantages.^{24} For our application, the main advantage is that the CCL converges much faster. We provide more details about *random feature conceptors* in Appendix C.
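A heavily simplified sketch of the element-wise mechanism is given below. The random matrices `F` and `G`, the vector-conceptor update rule (an element-wise analog of the recursive matrix update), and all gains are our own assumptions following the description above, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(8)
N, M = 50, 200                                 # network and feature dimensions
F = rng.normal(0, 1 / np.sqrt(N), (M, N))      # random expansion x -> features
G = rng.normal(0, 1 / np.sqrt(M), (N, M))      # projection back into the network
W_in, b = rng.normal(size=N), 0.1 * rng.normal(size=N)

def rfc_step(x, c, u, alpha=1.0, eta=0.05, aperture=8.0):
    """Vector conceptor c applied element-wise in the random feature space."""
    z = c * (F @ x)                            # element-wise soft projection
    x = (1 - alpha) * x + alpha * np.tanh(G @ z + W_in * u + b)
    # element-wise analog of the recursive conceptor update
    c = c + eta * (z * z * (1.0 - c) - aperture**-2 * c)
    return x, c

x, c = np.zeros(N), np.ones(M)                 # conceptor initialized fully open
for k in range(300):
    x, c = rfc_step(x, c, np.sin(2 * np.pi * k / 7))
```

Because the conceptor is a vector rather than an $N \times N$ matrix, each update costs $O(M)$ instead of $O(N^2)$ on top of the state update, which is the computational advantage exploited here.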

### B. Increasing robustness against input distortions

In our adaptation task toward input signal distortions, the system learns the prediction target from a single presentation of the undistorted input time series. In the inference phase, the system should counteract a distortion applied to the input series not present during the training. The underlying idea is that the CCL of each layer pushes the dynamics toward its reference linear subspace via the unperturbed signal’s reference conceptor. This push makes each layer transform its input into an output closer to the unperturbed signal.

To demonstrate the validity of the suggested hierarchical architecture, we study a relatively simple four-layer system subjected to a downscaling distortion by a factor of 0.3 applied to a periodic time series composed of two sine waves with periods of 7 and 21. As shown in Fig. 8, the hierarchically arranged system progressively eliminates the distortion, reducing the prediction error layer by layer. The reconstruction loss, characterized by the NRMSE, shows a consistent decrease across the hierarchy as presented in Fig. 8(a). We observe a clear improvement of the accuracy in the time series reconstruction by the third layer in Fig. 8(d), compared to the input-connected initial layer in Fig. 8(b).

We also verify the CCL’s causal role in the signal recovery by studying the action of the hierarchical system without the CCL. The inability of the hierarchical system without the CCL to correct distortions is illustrated by the orange curve in Fig. 8(a). This result further underlines the CCL’s computational capabilities in controlling the hierarchical architectures and handling novel data during inference.

To analyze the functioning of the CCL in more detail, we monitor the CCL dynamics over time, as depicted in Fig. 8(c). In the first part, for time $k<1000$, we observe the CCL dynamics without feedback [$\beta = 0$ in Eq. (C6)]. At time $k=1000$, the CCL has converged to a state with a large distance to $C_{\mathrm{target}}$, as expected for a distorted input that significantly differs from the one used during training. Accordingly, the reconstruction of the input has a large error, corresponding to the orange curve in Fig. 8(a). After time $k=1000$ (dotted vertical line), the feedback is activated with $\beta = 0.7$ in Eq. (C6). The conceptor $C_{\mathrm{adapt}}$ of each layer is then further pushed toward the target conceptor. The progressive reduction in the reconstruction error, corresponding to the blue curve in Fig. 8(a), demonstrates that the higher layers bring their conceptor $C_{\mathrm{adapt}}$ closer to the target conceptor. Hence, the hierarchy uniquely allows the progressive compensation of the input attenuation.

Interestingly, the random feature conceptor reduces the convergence time. In Fig. 8(c), the CCL converges in less than 500 steps, while the standard matrix-conceptor CCL typically requires more than 10 000 steps to converge. This is because random feature conceptors can be used with a larger learning rate; see Table I for a comparison of the parameters used for each task. More details on the properties of the random feature conceptor can be found in Jaeger.^{24}

## V. CONCLUSION

In this paper, we introduced an adaptive control loop based on the conceptor framework to augment the learning capabilities of RNNs. We demonstrated that by controlling RNN dynamics during inference, we achieve several significant benefits. First, our method improves the generalization abilities of RNNs, enabling the interpolation of more distinct temporal patterns. Second, it enhances graceful degradation, making RNNs more robust to errors or unexpected conditions. Finally, hierarchical RNN structures with an adaptive control loop exhibit improved signal reconstruction capabilities.

Our findings suggest that maintaining adaptive elements within a network post-training can significantly expand the potential applications of RNNs. This approach has particular promise for enabling RNNs to operate reliably in challenging environments or adapt to novel tasks with minimal additional training (few-shot learning).

The increased robustness demonstrated by our adaptive RNNs has intriguing parallels across multiple disciplines. In cognitive science, studies of subjects using inverting goggles have revealed remarkable human adaptability to significant perceptual shifts, suggesting specialized mechanisms for managing perturbations in dynamical visuomotor coordination.^{36} Similarly, in cybernetics and signal processing, techniques for recovering intentionally hidden or distorted signals are required, e.g., in secure communications.^{37} These connections underscore the broader relevance of understanding and enhancing adaptive capabilities in complex systems.

While our study focused on recurrent networks, the conceptor framework has the potential to control and augment various dynamical systems, including feedforward and other artificial neural networks. Therefore, our work represents a starting point for broader exploration into the control and enhancement of nonlinear computing systems.

## SUPPLEMENTARY MATERIAL

The supplementary material for this study includes videos that demonstrate the effects of the conceptor control loop (CCL) on RNN behavior. The available videos are "running.gif" for the CMU 016 55 MoCap data series, "runningCcl7.gif" showing the seventh example from Fig. 11(c) with the CCL, and "runningNoCcl7.gif" showing the same example from Fig. 11(b) without the CCL. These visualizations highlight the RNN's adaptive performance under various conditions, supporting our findings on the benefits of incorporating adaptivity into RNNs.

## ACKNOWLEDGMENTS

The authors acknowledge the support of the Spanish State Research Agency, through the Severo Ochoa and María de Maeztu Program for Centers and Units of Excellence in R&D (No. CEX2021-001164-M), the INFOLANET projects (No. PID2022-139409NB-I00) funded by the MCIN/AEI/10.13039/501100011033, and through the QUARESC project (Nos. PID2019-109094GB-C21 and -C22/AEI/10.13039/501100011033) and the DECAPH project (Nos. PID2019-111537GB-C21 and -C22/AEI/10.13039/501100011033). All authors acknowledge financial support by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 860360 (POST DIGITAL).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

G.P. and M.G. contributed equally to this paper.

**Guillaume Pourcel:** Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Mirko Goldmann:** Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Ingo Fischer:** Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal). **Miguel C. Soriano:** Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX A: NETWORK DEGRADATION: CCL ADAPTATION AND FURTHER RESULTS

More precisely, we found that the CCL in the main text [Eq. (8)] that keeps $C$ and $C^{\mathrm{adapt}}$ separated (S-CCL) is less robust than the mixed CCL (M-CCL) presented above. When using the same gain $\beta = 0.7$, the S-CCL does not yield any improvement with respect to the baseline. Only after reducing the gain to $\beta = 0.3$ were we able to match the performance of the M-CCL (Fig. 9). Moreover, the S-CCL is less robust to the initial conditions. When initializing the CCL with the identity matrix (as opposed to $C^{\mathrm{target}}$ in the main text), the S-CCL performance degrades to the level of the baseline (Fig. 10).
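For reference, in the standard conceptor framework a conceptor matrix is obtained from the state correlation matrix as $C = R(R + \alpha^{-2}I)^{-1}$, with aperture $\alpha$. The following NumPy sketch illustrates this computation on a toy example; the dimensions and the aperture value are illustrative, not those of the experiments reported here:

```python
import numpy as np

def compute_conceptor(X, alpha=10.0):
    """Compute a conceptor matrix from collected network states.

    X: (N, T) array of N-dimensional states over T time steps.
    alpha: aperture, controlling how tightly the conceptor
           confines the dynamics (illustrative value).
    """
    N, T = X.shape
    R = X @ X.T / T                                   # state correlation matrix
    return R @ np.linalg.inv(R + alpha**-2 * np.eye(N))

# Toy example: states confined to a 2D subspace of a 5D state space
rng = np.random.default_rng(0)
basis = rng.standard_normal((5, 2))
X = basis @ rng.standard_normal((2, 1000))
C = compute_conceptor(X, alpha=10.0)

# Singular values are close to 1 along the occupied directions and
# close to 0 elsewhere: the "ellipsoid" picture of the conceptor.
s = np.linalg.svd(C, compute_uv=False)
print(np.round(s, 3))
```

Initializing the CCL with such a $C^{\mathrm{target}}$ rather than the identity thus starts the loop already inside the relevant subspace, which is consistent with the sensitivity to initial conditions observed for the S-CCL.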


### APPENDIX B: NETWORK DEGRADATION: ADDITIONAL EXAMPLES

In Figs. 11(b) and 11(c), we present further results of individual trials with different random perturbations from Fig. 5(b) and, for reference, the case where the RNN operates without degradation [Fig. 11(a)]. We also provide videos in the supplementary material of the running time series CMU 016 55 of the MoCap data (running.gif), and of the seventh example of Fig. 11 with (runningCcl7.gif) and without (runningNoCcl7.gif) the CCL. The time series were pre-processed as described by Jaeger.^{26}

### APPENDIX C: RANDOM FEATURE CONCEPTOR

The random feature conceptor (RFC) was introduced by Jaeger^{24} to enhance the system's computational efficiency. This is achieved by adding an expansion into a high-dimensional feature space $z$, in which the matrix-based computation of conceptors can be replaced by a vector-based analog. Beyond computational efficiency on a digital computer, this method is more biologically plausible and can be implemented more easily in neuromorphic hardware.
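The matrix-to-vector replacement can be sketched in a few lines of NumPy. The dimensions `N`, `M` and all numerical values below are illustrative, not the settings used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 500   # network size and expanded feature dimension (illustrative)

# Matrix-based conception: an N x N conceptor applied to the state x,
# costing O(N^2) operations per time step.
C = 0.1 * rng.standard_normal((N, N))
x = rng.standard_normal(N)
x_conc = C @ x

# Vector-based analog: one conception weight per random feature,
# applied elementwise to the expanded state z at O(M) cost per step.
c = rng.uniform(0.0, 1.0, M)
z = rng.standard_normal(M)
z_conc = c * z

# The online adaptation rule carries over elementwise as well
# (vector analog of the matrix update; illustrative single step):
alpha, lr = 10.0, 0.1
c += lr * ((z - c * z) * z - alpha**-2 * c)
```

Because each feature is scaled independently, the update decouples into $M$ scalar recursions, which is what makes the larger learning rates of Table I admissible and the scheme attractive for hardware implementations.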

The equation of the RNN, which we repeat here for clarity, is then

Hence, the RFC can still be interpreted as a computational trick with the same semantics of projecting the dynamics of $x$. The CCL is now applicable in a vector form,

(see Table I).

## REFERENCES

*Proceedings of the 28th International Conference on Machine Learning (ICML-11)*(Omnipress, 2011), pp. 1017–1024.

*Reservoir Computing: Theory, Physical Implementations, and Applications*, Natural Computing Series, edited by K. Nakajima and I. Fischer (Springer, Singapore, 2021).

*Advances in Artificial Life*, edited by W. Banzhaf, J. Ziegler, T. Christaller, P. Dittrich, and J. T. Kim (Springer, Berlin, 2003), pp. 588–597.

*International Conference on Learning Representations*(Proceedings of Machine Learning Research, 2018).

*Modeling, Analysis, and Visualization of Anisotropy*, edited by T. Schultz, E. Özarslan, and I. Hotz (Springer International Publishing, Cham, 2017), pp. 85–113.

*Parallel Distributed Processing, Volume 2: Explorations in the Microstructure of Cognition: Psychological and Biological Models*

*Adaptive Filters: Theory and Applications*