We demonstrate the utility of machine learning in the separation of superimposed chaotic signals using a technique called reservoir computing. We assume no knowledge of the dynamical equations that produce the signals and require only training data consisting of finite-time samples of the component signals. We test our method on signals that are formed as linear combinations of signals from two Lorenz systems with different parameters. Comparing our nonlinear method with the optimal linear solution to the separation problem, the Wiener filter, we find that our method significantly outperforms the Wiener filter in all the scenarios we study. Furthermore, this difference is particularly striking when the component signals have similar frequency spectra. Indeed, our method works well when the component frequency spectra are indistinguishable—a case where a Wiener filter performs essentially no separation.
Separation of a signal composed of a linear superposition of independent signals into component parts, without knowledge of the underlying systems that generate them, is an important problem with wide applications in signal processing and data analysis. Machine learning has been successful in various applications involving temporal signals through the use of recurrent neural networks, though these networks are generally more difficult to train than the feedforward networks used in deep learning. In this article, we present a method for separating signals using a machine learning paradigm called reservoir computing, in which the training is relatively simple. We test our method on chaotic signals, whose broad frequency spectrum presents a challenge for classical separation methods. Our results suggest that the method is effective and widely applicable to signal extraction in complex geological, biological, and economic systems that are difficult to model.
I. INTRODUCTION
The problem of extracting a signal from ambient noise has wide applications in fields such as signal processing, weather analysis,1 medical imaging,2 and cryptography.3 We consider the related (but potentially more difficult) problem of separating two or more similar signals that are linearly combined. This is a version of the cocktail party problem: how do people at a cocktail party pick out and focus on the voice of the person they want to listen to from the combination of several voices reaching their ears? In our version of the problem, the signals are generated by chaotic processes. If the equations governing these processes are known, the problem can be attacked, for example, by using chaos synchronization.4–7 We consider instead an approach that relies only on data. Our problem is similar to that of blind source separation for chaotic signals,8–10 but the latter problem typically assumes that multiple linear combinations of the signals are being measured, with the number of independent measurements at least as large as the number of signals. In contrast, our method requires only one linear combination of the signals to be measured after training is complete. For training, we assume that we have finite-time samples of the separate component signals, and our method learns from these samples to separate subsequently measured combination signals into their components. We also note that Wiener obtained an optimal linear solution for the signal separation problem (the “Wiener filter”). In contrast, the technique we use is fundamentally nonlinear and, as we will show, can significantly outperform the linear technique.
Machine learning techniques have been very successful in a variety of tasks such as image classification, video prediction, and voice detection.11–13 Recent work in speech separation includes supervised machine learning techniques such as deep learning14 and support vector machines,15 as well as unsupervised methods such as non-negative matrix factorization.16
Our approach is based on a recurrent machine learning architecture originally known as echo state networks (originally proposed in the field of machine learning)17 and liquid state machines (originally proposed in the field of computational neuroscience),18 but now commonly referred to as reservoir computing.19 Reservoir computing has been applied to several real-world problems such as prediction of chaotic signals,20 time-series analysis,21 similarity learning,22 electrocardiogram classification,23 short-term weather forecasting,24 etc. Although our demonstrations in this paper are for reservoir computing, we expect similar results could be obtained with other types of machine learning using recurrent neural network architectures, such as long short-term memory (LSTM) and gated recurrent units;25 however, based on the results in Ref. 25, we expect that these architectures will have a significantly higher computational cost for training than reservoir computing in the cases we consider.
We train a reservoir computer (RC) as follows. Training data consist of an input time series and a desired output time series. The RC consists of the input layer, the reservoir, and the output layer. The input time series is processed by the input layer as well as the reservoir; the resulting reservoir states are recorded as a vector time series. Then, linear regression is used to find an output weight matrix (the output layer) that fits the reservoir states to the desired output. The internal parameters of the input layer and the reservoir are not adjusted to the training data; only the output weights are trained.
In this article, we input a linear combination of two chaotic signals to the RC and train it to output an estimate of one of the signals. In the simplest case, which we describe in Sec. III, the ratio between the amplitudes of the signals is known in advance. In Sec. IV, we consider the case in which this ratio is unknown. In this case, we first train an RC to estimate the ratio, which we call the mixing fraction, and then train another RC to separate the chaotic signals given the estimated ratio. We demonstrate our results on signals from the Lorenz system in the chaotic regime.
We describe our implementation of reservoir computing in Sec. II A and review the Lorenz system in Sec. II B. We compare our results with an approximation to the Wiener filter, computed by estimating the power spectra of the signals using the same training data we use for the RC. If computed from the exact spectra, the Wiener filter is the optimal linear separator for uncorrelated signals (see the Appendix). Motivated in part by this comparison, we consider three scenarios in Sec. III: separating signals with different evolution speeds on the same attractor, signals with different Lorenz parameters, and signals with both the parameters and the evolution speed perturbed in such a way that the spectra of the signals almost match. We present our conclusions in Sec. V, a brief summary of which is as follows: (1) the RC is a robust and computationally inexpensive chaotic signal separation tool that outperforms the Wiener filter, and (2) we use a dual-reservoir-computer mechanism to enable separation even when the amplitude ratio of the component signals in the mixed signal is unknown. Here, the first RC estimates a constant parameter, the mixing fraction (or amplitude ratio), whereas the second RC separates the signals given an estimated mixing fraction.
II. METHODS
We consider the problem of estimating two scalar signals $s_1(t)$ and $s_2(t)$ from their weighted sum $u(t) = c_1 s_1(t) + c_2 s_2(t)$. We normalize $s_1$, $s_2$, and $u$ to have mean 0 and variance 1. We assume $s_1$ and $s_2$ to be uncorrelated, in which case our normalization implies $c_1^2 + c_2^2 = 1$. Let $\gamma = c_1^2$, so that

$$u(t) = \sqrt{\gamma}\, s_1(t) + \sqrt{1 - \gamma}\, s_2(t). \qquad (1)$$

We call $\gamma$ the mixing fraction; more precisely, it is the ratio of the variance of the first component of $u$ to the variance of $u$.
In Sec. III, we assume that the value of $\gamma$ is known. In Sec. IV, we consider the case where $\gamma$ is unknown. In both sections, we assume that limited-time samples of $s_1$ and $s_2$ are available, say, for $0 \le t \le T$, and we use these samples to train the reservoir. Our methods require no knowledge of the processes that generate $s_1$ and $s_2$, but we assume that these processes are stationary enough that the training samples are representative of the components of future instances of $u$. For our numerical experiments, we generate $s_1$ and $s_2$ from the Lorenz family of chaotic dynamical systems (see Sec. II B). More generally, the same method can be used if $u$ is a combination of multiple signals and/or noise.
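For concreteness, a minimal Python sketch of this setup follows; the function names are illustrative, and the normalization uses the sample mean and standard deviation of the available data.

```python
import numpy as np

def normalize(s):
    """Scale a scalar time series to mean 0 and variance 1."""
    return (s - s.mean()) / s.std()

def mix(s1, s2, gamma):
    """Form u = sqrt(gamma)*s1 + sqrt(1-gamma)*s2 [Eq. (1)].

    If s1 and s2 are uncorrelated, u has (approximately) unit variance,
    and gamma is the fraction of the variance of u contributed by s1.
    """
    return np.sqrt(gamma) * normalize(s1) + np.sqrt(1.0 - gamma) * normalize(s2)
```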
A. Reservoir computer
There are many variations in implementation; in this paper, we adopt the echo state network approach to reservoir computing proposed by Jaeger.17 The reservoir computer has three components, as shown in Fig. 1: a linear input layer with $M$ scalar inputs [one for each component of the $M$-dimensional input signal $u(t)$]; a recurrent, nonlinear reservoir network with $N$ dynamical reservoir nodes, driven both by the inputs and by delayed feedback from the other reservoir nodes; and a linear output layer with $P$ scalar outputs. We describe the method for general $M$ and $P$, but in our experiments, we always use $M = P = 1$, with the desired output equal to either $s_1$ or $s_2$ or (in Sec. IV) the constant $\gamma$.
1. Input layer
The input layer is described by an $N \times M$ matrix $W_{\mathrm{in}}$ whose elements are randomly drawn from a uniform distribution on $[-\chi, \chi]$, where the input strength $\chi$ is a hyper-parameter to be chosen later.
2. Reservoir layer
The reservoir can be thought of as a dynamical system whose state vector $r(t)$ at time $t$ evolves according to

$$r(t + \Delta t) = (1 - \lambda)\, r(t) + \lambda \tanh\!\big(A\, r(t) + W_{\mathrm{in}}\, u(t) + \xi \mathbf{1}\big), \qquad (3)$$

where the reservoir state $r(t)$ has size $N$ and $\mathbf{1}$ denotes the vector of all ones. The notation $\tanh(\cdot)$ with a vector argument is defined as the vector whose components are the hyperbolic tangents of the corresponding components of the argument vector. The leakage parameter $\lambda$, which is bounded between 0 and 1, determines the speed at which the input affects, or "leaks into," the reservoir. Both $\lambda$ and the bias magnitude $\xi$ are hyper-parameters. The recurrent connection weights $A$ are initialized randomly between $-1$ and $1$; then, $A$ is normalized by multiplying all of its components by a constant chosen so that the spectral radius (maximal absolute eigenvalue of $A$) takes a prescribed value, which is another hyper-parameter. A discussion of the effect of the spectral radius on performance is presented in Ref. 26. Typically, $N$ is much larger than $M$, so that the reservoir transforms the input from the $M$-dimensional input space into a much higher dimensional reservoir space. The sparsity of the reservoir layer is a measure of how sparsely connected the nodes are: a sparsity of zero means all-to-all coupling, and a sparsity of one means no connections at all.
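For illustration, a minimal sketch of the input and reservoir layers follows; the numerical values (reservoir size, spectral radius, input strength, sparsity, leakage, and bias) are placeholders rather than the tuned values selected in Sec. III.

```python
import numpy as np

def make_reservoir(N=500, M=1, spectral_radius=0.9, chi=0.1,
                   sparsity=0.98, seed=0):
    """Build random input weights W_in and recurrent weights A."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-chi, chi, size=(N, M))      # input layer
    A = rng.uniform(-1.0, 1.0, size=(N, N))         # recurrent weights
    A[rng.random((N, N)) < sparsity] = 0.0          # sparsify the coupling
    A *= spectral_radius / np.max(np.abs(np.linalg.eigvals(A)))
    return W_in, A

def reservoir_step(r, u, W_in, A, leak=0.3, bias=0.1):
    """One leaky-tanh update of the reservoir state, Eq. (3)."""
    return (1.0 - leak) * r + leak * np.tanh(A @ r + W_in @ u + bias)
```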
3. Output layer
After running the reservoir [Eq. (3)] through an initial transient time period, the reservoir state trajectory $\Omega$ is formed by concatenating the reservoir state vectors (the states of all reservoir nodes) at every time step corresponding to the input,

$$\Omega = \big[\, r(t_1) \;\; r(t_2) \;\cdots\; r(t_T) \,\big],$$

where $t_1$ is the first time step after the transient. Thus, $\Omega$ is a matrix of size $N \times T$, where $T$ is the training time, i.e., the number of time steps for which training data are available. During training, the output weights $W_{\mathrm{out}}$ are found by mapping the reservoir state trajectory to the desired output representation over the $T$ samples. Only the weight matrix $W_{\mathrm{out}}$ is optimized during training, such that the mean square error between the output of the reservoir and the target signal is minimized. The reservoir output $\hat{y}(t)$ is obtained through the output weights as

$$\hat{y}(t) = W_{\mathrm{out}}\, r(t), \qquad (4)$$

where $W_{\mathrm{out}}$ has dimensions $P \times N$ and $P$ is the dimensionality of the readout layer.
Ridge regression (or Tikhonov regularization)27 is used for the fitting, making the system robust to overfitting and noise. Ridge regression minimizes the squared error while regularizing the Euclidean norm of the weights,

$$W_{\mathrm{out}} = \arg\min_{W} \sum_{t} \big\| W\, r(t) - y(t) \big\|^2 + \beta\, \| W \|^2,$$

where $y(t)$ is the desired output and $\beta$ is a regularization constant, which is also a hyper-parameter and is held fixed except as noted below.
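In closed form, the ridge solution can be computed directly from $\Omega$ and the desired outputs, as in the following sketch; the value of the regularization constant here is an illustrative placeholder.

```python
import numpy as np

def train_output_weights(Omega, Y, beta=1e-6):
    """Ridge-regression solution for W_out: minimizes
    sum_t ||W r(t) - y(t)||^2 + beta * ||W||^2.

    Omega: N x T matrix of reservoir states (transient removed).
    Y:     P x T matrix of desired outputs over the same T steps.
    """
    N = Omega.shape[0]
    return Y @ Omega.T @ np.linalg.inv(Omega @ Omega.T + beta * np.eye(N))
```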
Once training is done, the RC produces predictions for times $t > T$ through Eqs. (3) and (4).
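Putting the pieces together, the testing phase drives the reservoir with the new input signal and applies the trained readout at every step, as in the following sketch (which reuses `reservoir_step` from the sketch above).

```python
import numpy as np

def run_rc(u_seq, W_in, A, W_out, leak=0.3, bias=0.1):
    """Drive the trained RC with input u_seq and read out predictions,
    iterating Eq. (3) and applying Eq. (4) at each time step."""
    r = np.zeros(A.shape[0])
    outputs = []
    for u in u_seq:
        r = reservoir_step(r, np.atleast_1d(u), W_in, A, leak, bias)
        outputs.append(W_out @ r)
    return np.array(outputs).squeeze()
```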
B. Data: Lorenz system
Our examples are based on the Lorenz equations,28

$$\dot{x} = \sigma (y - x), \qquad \dot{y} = x(\rho - z) - y, \qquad \dot{z} = xy - \beta z. \qquad (8)$$

The Lorenz attractor is a strange attractor that arises in this system of equations. The Lorenz system is known to be chaotic for the parameter values $\sigma = 10$, $\rho = 28$, and $\beta = 8/3$,29 and it appears to be chaotic for the other parameter values we use in this article. We generate trajectories using the fourth-order Runge–Kutta method with step size 0.01, and we sample them every five steps, i.e., every 0.05 time units.
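For completeness, a sketch of the trajectory generation follows; the `speed` factor multiplying the right-hand side anticipates the speed-changed systems of Sec. III B, and the initial condition is an arbitrary placeholder.

```python
import numpy as np

def lorenz_rhs(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0, speed=1.0):
    """Right-hand side of Eq. (8), optionally sped up by a constant factor."""
    x, y, z = v
    return speed * np.array([sigma * (y - x),
                             x * (rho - z) - y,
                             x * y - beta * z])

def generate_lorenz(n_samples, dt=0.01, sample_every=5,
                    v0=(1.0, 1.0, 1.0), **params):
    """Fourth-order Runge-Kutta with step 0.01, sampled every 5 steps."""
    v = np.array(v0, dtype=float)
    out = []
    for i in range(n_samples * sample_every):
        k1 = lorenz_rhs(v, **params)
        k2 = lorenz_rhs(v + 0.5 * dt * k1, **params)
        k3 = lorenz_rhs(v + 0.5 * dt * k2, **params)
        k4 = lorenz_rhs(v + dt * k3, **params)
        v = v + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        if (i + 1) % sample_every == 0:
            out.append(v.copy())
    return np.array(out)  # rows are (x, y, z) at intervals of 0.05 time units
```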
III. RESULTS: SEPARATION WITH KNOWN MIXING FRACTION
Recall that we assume that finite-time samples of the component signals $s_1$ and $s_2$ are available for training. In this section, we assume the mixing fraction $\gamma$ to be known a priori. We form the training input signal $u(t)$ according to Eq. (1) using the known value of $\gamma$. We set the desired output signal equal to $s_1(t)$.
We calculate the error between the trained reservoir output and the desired signal as follows. The mean square reservoir error is

$$\varepsilon_{\mathrm{res}}^2 = \frac{\big\langle \big(\hat{s}(t) - s(t)\big)^2 \big\rangle}{\big\langle \big(\gamma^{-1/2}\, u(t) - s(t)\big)^2 \big\rangle}, \qquad (9)$$

where $\hat{s}(t)$ is the reservoir output and $s = s_1$ or $s_2$, i.e., one of the components of the signal $u$ (typically, we use $s_1$). The factor $\gamma^{-1/2}$ scales the input so that the denominator of $\varepsilon_{\mathrm{res}}$ indicates the root mean square error in the absence of any processing by the reservoir.
The Wiener filter30 is known to be the best possible linear solution to our signal separation problem and is also commonly used in tasks such as signal denoising and image deblurring. A detailed explanation of the Wiener filter is given in the Appendix. In this article, we always compare the RC with a Wiener filter that uses a finite window of the training data for estimating the spectra. The mean square Wiener error measure is

$$\varepsilon_{\mathrm{W}}^2 = \frac{\big\langle \big(\hat{s}_{\mathrm{W}}(t) - s(t)\big)^2 \big\rangle}{\big\langle \big(\gamma^{-1/2}\, u(t) - s(t)\big)^2 \big\rangle}, \qquad (10)$$

where $\hat{s}_{\mathrm{W}}(t)$ is the output of the Wiener filter.
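Both error measures share the same normalization and can be computed directly, as in the following sketch; the denominator implements our reading of Eqs. (9) and (10), namely, the error of the naive estimate $\gamma^{-1/2} u(t)$ obtained with no processing at all.

```python
import numpy as np

def normalized_error(s_hat, s, u, gamma):
    """Normalized mean square error of Eqs. (9) and (10): the error of the
    estimate s_hat relative to the error of the unprocessed, rescaled
    input u / sqrt(gamma)."""
    numerator = np.mean((s_hat - s) ** 2)
    denominator = np.mean((u / np.sqrt(gamma) - s) ** 2)
    return numerator / denominator
```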
A. Separating Lorenz signals with different parameters
In this section, the reservoir input consists of a combination of the $x$-component signals from two Lorenz systems with different parameter values: the signals $s_1$ and $s_2$ are generated with two different sets of values of the parameters $(\sigma, \rho, \beta)$.
Several parameters related to the reservoir system must be chosen for running the experiments. These parameters include the reservoir size $N$, the spectral radius, the leakage parameter $\lambda$, and the length of the training signal. In Fig. 2, we vary each of these parameters individually while keeping the others constant in order to identify an appropriate set of values.
Figure 2 shows the performance of the reservoir in separating the $s_1$ component of the trajectory for varying parameters at a fixed mixing fraction. The reservoir errors are averaged over 10 random initializations of the reservoir. The error bars denote the standard error, i.e., the standard deviation across random initializations divided by $\sqrt{10}$. We observe a downward trend in error with increasing reservoir size, which saturates for sufficiently large reservoirs. Panel (a) is used to select a reservoir size $N$ that gives a reasonable trade-off between performance and computational efficiency. Panel (b) identifies the spectral radius that results in the best separation, with all other parameters constant. Panels (c) and (d) are used analogously to select the leakage parameter $\lambda$ and the training length. In addition, the input strength $\chi$ is normalized such that the input does not saturate the tanh curve. These parameter values are used for the rest of this paper unless mentioned otherwise.
Figure 3(a) presents the performance of the reservoir as a function of the mixing fraction $\gamma$. We observe that the reservoir computer consistently outperforms the Wiener filter. However, as $\gamma$ increases, the normalized error increases as well. This can be attributed to the denominator in $\varepsilon_{\mathrm{res}}$ [see Eq. (9)] tending to zero as $\gamma$ tends to one. The non-normalized error is plotted in Fig. 3(b), and here we can see that as $\gamma$ tends to one, the error tends to zero, as it should. Figure 3(c) shows the estimated separated trajectory $\hat{s}_1$ along with the actual $s_1$ in the testing phase, as well as the actual $s_1$ and $s_2$ trajectories. The top panel demonstrates that the RC prediction does indeed match the actual $s_1$ accurately. The middle panel demonstrates that the best linear filter, the Wiener filter, is comparatively worse than the RC at estimating the chaotic signal $s_1$.
B. Separating Lorenz signals with different speeds
In this section, the reservoir input consists of a combination of two signals obtained from the $x$ component of the Lorenz system with the same parameter values but with different speeds; i.e., the right-hand side of Eq. (8) for the Lorenz system corresponding to $s_2$ is multiplied by a speed fraction (ratio of speeds) $f$. The reservoir parameters remain the same as those found in Sec. III A.
Figures 4(a) and 4(b) present the performance of the reservoir computer as a function of the mixing fraction $\gamma$ for two different values of the speed fraction $f$ of the $s_2$ signal. The RC consistently outperforms the Wiener filter. As seen in Fig. 4(c), the Wiener prediction is much poorer than the reservoir prediction of the chaotic signal $s_1$.
We have also investigated the case of identical-parameter systems with identical speeds. We find that when the mixing fraction is $\gamma = 1/2$, the symmetry between the components $s_1$ and $s_2$ prevents separation. However, as shown in Fig. 5, as $\gamma$ deviates from $1/2$, the RC succeeds in reducing the normalized error significantly below unity.
C. Separating Lorenz signals with matched power spectra
Linear signal denoising/separation methods are based on differences in spectra. Here, we study the separation of the $x$ components of two Lorenz signals, $s_1$ and $s_2$, whose parameter values and evolution speeds are perturbed jointly in such a way that their power spectra almost match. Figure 6(a) shows a log plot of the power spectral densities (PSDs) of the two systems as a function of frequency. The PSD of the $z$ component of the Lorenz system has a distinct peak (unlike the $x$ and $y$ components), followed by a much smaller peak. For this reason, in Fig. 6(a), we plot the PSDs of the $z$ signals to demonstrate the spectral matching, even though we separate the corresponding $x$ signals (which may conceivably be a harder problem to solve, since the $x$ signal switches between positive and negative sides chaotically, and the RC has to learn when to switch correctly). We observe, in panel (a), that the peaks do indeed line up and the spectra match fairly well. Figure 6(b) plots the reservoir and Wiener errors as a function of $\gamma$ for the spectra-matched case. For a case where the spectra of the two individual signals are indistinguishable, a spectra-based filter such as the Wiener filter will be unable to separate the signals. Figures 6(c) and 6(d) show the reservoir predictions for the $x$ and $z$ components of the Lorenz system, respectively. We observe that the RC is indeed able to separate the signals, even when their spectra match, unlike spectra-based signal separation methods such as the Wiener filter (see the Appendix).
IV. GENERALIZATION: SEPARATION WITH UNKNOWN MIXING FRACTION
Often, as in the case of the cocktail party problem, the ratio of the amplitudes in a mixed signal may not be known. In this section, we describe a methodology for separating signals without knowledge of the mixing fraction $\gamma$. As before, we assume that we have access to training data for the individual signals $s_1$ and $s_2$. The simplest approach would be to form training data consisting of mixed signals [Eq. (1)] with a variety of $\gamma$ values between 0 and 1 and to train an RC to output $s_1$; the output weight matrix is then independent of $\gamma$, unlike in Sec. III. However, we found this approach to be unsuccessful.
Instead, we use two RCs in sequence as follows. The first RC is trained as described in the previous paragraph, except that the desired output is the constant $\gamma$ instead of $s_1$. Once trained, we use the first RC to estimate the value of $\gamma$ for a given mixed signal, as we describe in Sec. IV A. Then, we input the same mixed signal to a second RC, which is trained as in Sec. III to output $s_1$. The second RC can be trained after the first RC has run, using the estimated value of $\gamma$. Alternatively, the output weight matrix for the second RC can be interpolated from output weight matrices that have been pretrained on discrete values of $\gamma$, as we discuss in Sec. IV B.
A. Estimation of the mixing fraction parameter
In training, the first RC is given mixed signals over a range of discrete values of $\gamma$ (0–1 in intervals of 0.05) and trained to output the corresponding constant value. In testing, the constant predicted $\gamma$'s are averaged over the testing time. We find that the trained RC always has a tendency to predict $\gamma$ closer to the mean (0.5) when fit on both the training and the test data, i.e., to overestimate small actual $\gamma$ and underestimate large actual $\gamma$. To correct for this tendency, we use the training dataset to construct a mapping from the predicted $\gamma$ to the desired $\gamma$ via a fit to a third-order polynomial function. We then apply this same function to the reservoir-predicted value(s) for the test data in order to obtain corrected values.
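A sketch of this correction step follows; clipping the corrected estimate to $[0, 1]$ is our own precaution rather than a stated part of the procedure.

```python
import numpy as np

def fit_gamma_correction(gamma_pred, gamma_true, degree=3):
    """Fit the third-order polynomial map from the time-averaged RC
    prediction of gamma to the actual gamma on the training data."""
    return np.poly1d(np.polyfit(gamma_pred, gamma_true, degree))

# Usage: fit on training data, then correct a test prediction.
# correct = fit_gamma_correction(pred_train, true_train)
# gamma_hat = float(np.clip(correct(pred_test), 0.0, 1.0))
```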
Figure 7 illustrates the performance of our approach for a mixed signal of $x$ components of two Lorenz systems with parameter values that differ by 20%. Figure 7 (top) shows the reservoir-estimated $\gamma$ vs the actual $\gamma$, and Fig. 7 (bottom) shows the corrected estimate [calculated using a third-order polynomial fit to the points in Fig. 7 (top)]. After making the correction, our method accurately predicts $\gamma$ for both the training and test data. In order to separate signals with a certain value of the mixing fraction, a reservoir needs to be trained explicitly for separation with that mixing fraction (within the limits of interpolation discussed in Sec. IV B). Thus, obtaining an estimate of the mixing fraction from the first RC allows us to train a second RC for separation of signals with that mixing fraction.
We note also that, similar to this method for estimating the mixing fraction parameter, an RC along with output correction can be used for parameter estimation from data in other systems as well. It is also possible that other choices for the reservoir dynamics, such as a liquid state machine,18 would be fruitful for parameter estimation.
B. Interpolating between trained reservoir computers
As mentioned above, having estimated the value of $\gamma$, we can subsequently train the second RC using the estimated value. However, if many different values of $\gamma$ will be encountered, retraining for each value might be unsatisfactorily inefficient. In such cases, RCs can be pretrained on discrete values of $\gamma$. Any intermediate predicted estimate $\hat{\gamma}$ obtained can then be used for separation by interpolating between the two nearest trained RCs. Here, interpolating between RCs is equivalent to interpolating between their trained output weights. What spacing of $\gamma$ is appropriate for training? A mixed signal with predicted mixing fraction $\hat{\gamma}$ can be separated by using the output weight matrix

$$W_{\mathrm{out}}(\hat{\gamma}) = \frac{(\gamma_2 - \hat{\gamma})\, W_{\mathrm{out}}(\gamma_1) + (\hat{\gamma} - \gamma_1)\, W_{\mathrm{out}}(\gamma_2)}{\gamma_2 - \gamma_1} \qquad (11)$$

if $\hat{\gamma}$ lies between two discrete trained values $\gamma_1$ and $\gamma_2$ of the mixing fraction.
In Fig. 8(a), we plot the reservoir error for an RC trained directly on an intermediate value of $\gamma$ (orange) together with the error obtained by averaging the $W_{\mathrm{out}}$ matrices of reservoirs trained on the two neighboring discrete values of $\gamma$. We notice that interpolating between discrete RCs with a spacing of 0.1 is well within the range of negligible error compared to training on the individual predicted $\gamma$'s. Hence, one only needs to train on $\gamma$'s in intervals of 0.1 ($\gamma = 0, 0.1, \ldots, 1$). Figure 8(b) shows the reservoir error for RCs trained individually on each $\gamma$ compared with the reservoir error obtained by combining the $W_{\mathrm{out}}$ matrices of pretrained RCs as in Eq. (11). We observe that the results practically coincide and that the method is fairly robust to errors in the prediction of $\gamma$.
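A sketch of the interpolation of Eq. (11) between pretrained readouts follows; the clipping of out-of-range estimates is our own assumption.

```python
import numpy as np

def interpolate_W_out(gamma_hat, gammas, W_outs):
    """Linearly interpolate pretrained output weight matrices [Eq. (11)].

    gammas:  sorted 1D array of discrete training values, e.g., 0, 0.1, ..., 1.
    W_outs:  list of trained W_out matrices, one per entry of gammas.
    """
    gamma_hat = float(np.clip(gamma_hat, gammas[0], gammas[-1]))
    i = int(np.searchsorted(gammas, gamma_hat))
    if gammas[i] == gamma_hat:          # exactly on a trained value
        return W_outs[i]
    g1, g2 = gammas[i - 1], gammas[i]   # nearest trained values around gamma_hat
    w = (gamma_hat - g1) / (g2 - g1)
    return (1.0 - w) * W_outs[i - 1] + w * W_outs[i]
```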
Successful interpolation between RCs trained on discrete mixing fractions drastically reduces the training time and computation without compromising the quality of signal separation. In fact, we only need to train on 11 distinct mixing fractions to be able to separate Lorenz signals mixed in any proportion. Thus, we are able to generalize our chaotic signal separation technique to cases where the mixing fraction is unknown.
V. CONCLUSION AND FUTURE WORK
In this article, we used a type of machine learning called reservoir computing for the separation of chaotic signals. We demonstrated the ability to separate signals in several cases where the two signals are obtained from Lorenz systems with different parameters. We compared our results with the Wiener filter, which is the optimal linear filter and whose coefficients can be computed from the power spectra of the signals (see the Appendix). Spectra-based methods naturally perform poorly at signal separation if the spectra of the signals to be separated are indistinguishable. By contrast, the RC performs reasonably well even when the two signals that are mixed have very similar spectra. Our results were significantly better than those of the Wiener filter calibrated from the same training data in all the scenarios we considered.
Often, in signal separation applications such as the cocktail party problem, the mixing fraction (amplitude ratio) of the signals to be separated is unknown. We introduce an RC-based way to separate signals with an unknown parameter, in our case the mixing fraction. The first step of this generalized method is estimating the mixing fraction for a given signal. Estimation of a constant-valued parameter from a temporal signal is a problem of broad interest, with applications such as weather prediction, estimation of flow parameters, equation modeling, etc. We find that after time-averaging its trained output, the RC tends to skew the estimated parameter toward the mean of the parameter values used during training. By fitting a mapping function that corrects the averaged output for the training data, we are also able to approximately correct the test output. This method of introducing an additional nonlinear “correction” to the learned output weights may be useful for predicting constant outputs in other dynamic machine learning systems.
In some cases, training on a wide range of mixing fractions may not be possible due to the need for quick separation of signals and limited training capacity (e.g., in hardware applications). Hence, we study the robustness of the RC and the ability to use interpolated RCs pretrained at discrete mixing fractions. We demonstrate that the RCs need only be trained at a coarse grid of mixing fractions in order to accurately separate Lorenz signals with arbitrary mixing fractions. Our results are robust to errors in the prediction of mixing fraction. Hence, in situations where computational resources are constrained, RCs can be trained on discrete mixing fractions, and interpolation between these RCs can be used to accurately separate chaotic signals for intermediate values of mixing fraction.
Here, we have demonstrated the ability of reservoir computing to act as an efficient and robust method for separation of chaotic signals. The dynamical properties of the reservoir make it a prime candidate for further exploration of chaotic signal processing. An interesting future direction might be to study alternate RC architectures such as parallel RCs for more complex signal extraction problems.
ACKNOWLEDGMENTS
This research was supported through a DoD contract under the Laboratory of Telecommunication Sciences Partnership with the University of Maryland and, in part, by National Science Foundation (NSF) (Award No. DGE-1632976).
APPENDIX: WIENER FILTER
The Wiener filter30 was designed to separate a signal from noise, but it can be used to (imperfectly) separate any two signals with different power spectra. For simplicity, we formulate the filter here in continuous time, as a linear noncausal filter with infinite impulse response (IIR) for a scalar signal. Later, we describe how we compute the filter in practice, in discrete time with finite impulse response.
Let $u(t)$ be the combined signal, that is, the input to the filter, and let $s(t)$ be the component of $u$ that is the desired output of the filter.
A noncausal IIR filter can be written as a convolution

$$\hat{s}(t) = \int_{-\infty}^{\infty} k(\tau)\, u(t - \tau)\, d\tau, \qquad (A1)$$

where $k$ is the impulse response function of the filter [we assume that $k$ is absolutely integrable; then, if $u$ is bounded, so is $\hat{s}$]. Of all such filters, the Wiener filter is the one that minimizes the mean square error between the filter output $\hat{s}(t)$ and the desired output $s(t)$.
Similar to linear least squares in finite dimensions, the minimizing function $k$ can be related to the autocovariance and cross-covariance functions

$$C_{uu}(\tau) = \langle u(t)\, u(t + \tau) \rangle, \qquad C_{su}(\tau) = \langle s(t + \tau)\, u(t) \rangle. \qquad (A2)$$

Specifically, setting the first variation (i.e., the first derivative in the calculus of variations) of the mean square error equal to zero yields

$$\int_{-\infty}^{\infty} k(\tau')\, C_{uu}(\tau - \tau')\, d\tau' = C_{su}(\tau) \quad \text{for all } \tau. \qquad (A3)$$
Equation (A3) can be solved in the frequency domain, where convolution becomes multiplication. Let $K(\omega)$, $P_{uu}(\omega)$, and $P_{su}(\omega)$ be the Fourier transforms of $k$, $C_{uu}$, and $C_{su}$, respectively; then, $K(\omega)\, P_{uu}(\omega) = P_{su}(\omega)$, and

$$K(\omega) = \frac{P_{su}(\omega)}{P_{uu}(\omega)}. \qquad (A4)$$
We interpret Eq. (A4) based on the fact that $P_{uu}$ and $P_{su}$ are, respectively, the power spectral density of $u$ and the cross-spectral density of $s$ and $u$, by an appropriate version of the Wiener–Khinchin theorem. In both Wiener's application and our application, the component signals $s_1$ and $s_2$ are uncorrelated, in the sense that their cross-covariance is identically zero. In this case, $C_{su} = C_{ss}$ and $P_{su} = P_{ss}$. Thus, the transfer function $K(\omega) = P_{ss}(\omega)/P_{uu}(\omega)$ of the Wiener filter at frequency $\omega$ is the fraction of the power of $u$ at that frequency attributable to the component signal $s$.
In practice, we sample the signals $s$ and $u$ at discrete time intervals $\Delta t$. We estimate their spectral densities using the Welch method, averaging the estimates over overlapping segments of fixed length, using the discrete Fourier transform (DFT) and the Hann window on each segment. We then apply the inverse DFT to the resulting discrete estimate of $K(\omega)$, yielding an impulse response vector $k$ whose length equals the segment length. We convolve $k$ with the sampled signal $u$ to compute the Wiener filter output.
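A sketch of this procedure follows; the segment length `nperseg` is an assumed placeholder value, and the spectral estimates come from the Welch-method routines in scipy.signal.

```python
import numpy as np
from scipy.signal import csd, welch, fftconvolve

def wiener_separate(u_test, s_train, u_train, nperseg=256):
    """Discrete, noncausal Wiener filter estimated from training data."""
    # Two-sided Welch estimates of the cross and auto spectral densities,
    # with a Hann window on each overlapping segment.
    _, P_su = csd(u_train, s_train, window='hann', nperseg=nperseg,
                  return_onesided=False)
    _, P_uu = welch(u_train, window='hann', nperseg=nperseg,
                    return_onesided=False)
    K = P_su / P_uu                             # transfer function, Eq. (A4)
    k = np.fft.fftshift(np.fft.ifft(K).real)    # centered impulse response
    return fftconvolve(u_test, k, mode='same')  # apply the filter to u
```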