We demonstrate the utility of machine learning in the separation of superimposed chaotic signals using a technique called reservoir computing. We assume no knowledge of the dynamical equations that produce the signals and require only training data consisting of finite-time samples of the component signals. We test our method on signals that are formed as linear combinations of signals from two Lorenz systems with different parameters. Comparing our nonlinear method with the optimal linear solution to the separation problem, the Wiener filter, we find that our method significantly outperforms the Wiener filter in all the scenarios we study. Furthermore, this difference is particularly striking when the component signals have similar frequency spectra. Indeed, our method works well when the component frequency spectra are indistinguishable—a case where a Wiener filter performs essentially no separation.

Separation of a signal composed of a linear superposition of independent signals into component parts, without knowledge of the underlying systems that generate them, is an important problem with wide applications in signal processing and data analysis. Machine learning has been successful in various applications involving temporal signals through the use of recurrent neural networks, though these networks are generally more difficult to train than the feedforward networks used in deep learning. In this article, we present a method for separating signals using a machine learning paradigm called reservoir computing, in which the training is relatively simple. We test our method on chaotic signals, whose broad frequency spectrum presents a challenge for classical separation methods. Our results suggest that the method is effective and widely applicable to signal extraction in complex geological, biological, and economic systems that are difficult to model.

The problem of extracting a signal from ambient noise has had wide applications in various fields such as signal processing, weather analysis,1 medical imaging,2 and cryptography.3 We consider the related (but potentially more difficult) problem of separating two or more similar signals that are linearly combined. This is a version of the cocktail party problem, i.e., how do people at a cocktail party separate out and focus on the voice of the person they are interested in listening to from a combination of several voices that reach their ear. In our version of the problem, signals are generated by chaotic processes. If the equations governing these processes are known, the problem can be attacked, for example, by using chaos synchronization.4–7 We consider instead an approach that relies only on data. Our problem is similar to that of blind source separation for chaotic signals,8–10 but the latter problem typically assumes that multiple linear combinations of the signals are being measured, with the number of independent measurements at least as large as the number of signals. In contrast, our method requires only one linear combination of the signals to be measured after training is complete. For training, we assume that we have finite-time samples of the separate component signals, and our method learns from these samples to separate subsequently measured combination signals into their components. We also note that Wiener obtained an optimal linear solution for the signal separation problem (the “Wiener filter”). In contrast, the technique we use is fundamentally nonlinear and, as we will show, can significantly outperform the linear technique.

Machine learning techniques have been very successful in a variety of tasks such as image classification, video prediction, voice detection, etc.11–13 Recent work in speech separation includes supervised machine learning techniques such as deep learning,14 support vector machines,15 as well as unsupervised methods such as non-negative matrix factorization.16 

Our approach is based on a recurrent machine learning architecture originally known as echo state networks (originally proposed in the field of machine learning)17 and liquid state machines (originally proposed in the field of computational neuroscience),18 but now commonly referred to as reservoir computing.19 Reservoir computing has been applied to several real-world problems such as prediction of chaotic signals,20 time-series analysis,21 similarity learning,22 electrocardiogram classification,23 short-term weather forecasting,24 etc. Although our demonstrations in this paper are for reservoir computing, we expect similar results could be obtained with other types of machine learning using recurrent neural network architectures, such as long short-term memory (LSTM) and gated recurrent units;25 however, based on the results in Ref. 25, we expect that these architectures will have a significantly higher computational cost for training than reservoir computing in the cases we consider.

We train a Reservoir Computer (RC) as follows. Training data consist of an input time series and a desired output time series. The RC consists of the input layer, the reservoir, and the output layer. The input time series is processed by the input layer as well as the reservoir; the resulting reservoir states are recorded as a vector time series. Then, linear regression is used to find an output weight matrix (the output layer) that fits the reservoir state to the desired output. The internal parameters of the input layer and the reservoir are not adjusted to the training data, only the output weights are trained.

In this article, we input a linear combination of two chaotic signals to the RC and train it to output an estimate of one of the signals. In the simplest case, which we describe in Sec. III, the ratio between the amplitudes of the signals is known in advance. In Sec. IV, we consider the case in which this ratio is unknown. In this case, we first train an RC to estimate this ratio, which we call the mixing fraction, and then train another RC to separate the chaotic signals given the ratio. We demonstrate our results on signals from the Lorenz system in the chaotic regime.

We describe our implementation of reservoir computing in Sec. II A and review the Lorenz system in Sec. II B. We compare our results with an approximation to the Wiener filter, computed by estimating the power spectra of the signals using the same training data we use for the RC. If computed from the exact spectra, the Wiener filter is the optimal linear separator for uncorrelated signals (see the Appendix). Motivated in part by this comparison, we consider three scenarios in Sec. III: separating signals with different evolution speeds on the same attractor, signals with different Lorenz parameters, and signals with both the parameters and evolution speed perturbed in such a way that the spectra of the signals almost match. We present our conclusions in Sec. V, a brief summary of which is as follows: (1) the RC is a robust and computationally inexpensive chaotic signal separation tool that outperforms the Wiener filter, and (2) we use a dual-reservoir computer mechanism to enable separation even when the amplitude ratio of the component signals in the mixed signal is unknown. Here, the first RC estimates a constant parameter, the mixing fraction (or amplitude ratio), whereas the second RC separates signals given an estimated mixing fraction.

We consider the problem of estimating two scalar signals s1(t) and s2(t) from their weighted sum u(t) = β1 s1(t) + β2 s2(t). We normalize s1(t), s2(t), and u(t) to have mean 0 and variance 1. We assume s1(t) and s2(t) to be uncorrelated, in which case our normalization implies β1² + β2² = 1. Let α = β1², so that

u(t) = √α s1(t) + √(1 − α) s2(t).    (1)

We call α the mixing fraction; more precisely, it is the ratio of the variance of the first component of u(t) to the variance of u(t).

In Sec. III, we assume that the value of α is known. In Sec. IV, we consider the case where α is unknown. In both sections, we assume that limited-time samples of s1(t) and s2(t) are available, say, for 0 ≤ t ≤ T, and we use these samples to train the reservoir. Our methods require no knowledge of the processes that generate s1(t) and s2(t), but we assume that these processes are stationary enough that the training samples are representative of the components of future instances of u(t). For our numerical experiments, we generate s1(t) and s2(t) from the Lorenz family of chaotic dynamical systems (see Sec. II B). More generally, the same method can be used if s2(t) is a combination of multiple signals and/or noise.
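As an illustration of Eq. (1), the short Python sketch below assembles a mixed signal from two component time series; the function name mix_signals and the normalization details are ours, not from the original text.

```python
# Minimal sketch: form u(t) = sqrt(alpha)*s1(t) + sqrt(1-alpha)*s2(t)
# from two component time series, each standardized to zero mean and unit variance.
import numpy as np

def mix_signals(s1, s2, alpha):
    """Return the mixed signal for a given mixing fraction alpha in [0, 1]."""
    s1 = (s1 - s1.mean()) / s1.std()   # normalize each component
    s2 = (s2 - s2.mean()) / s2.std()
    return np.sqrt(alpha) * s1 + np.sqrt(1.0 - alpha) * s2
```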

There are many variations in implementation; in this paper, we adopt the echo state network approach of reservoir computing proposed by Jaeger.17 The reservoir computer has three components (Fig. 1): a linear input layer with Mi scalar inputs [one for each component of the Mi-dimensional input signal u(t)], a recurrent, nonlinear reservoir network with N dynamical reservoir nodes driven both by the inputs and by delayed feedbacks from the other reservoir nodes, and a linear output layer with Mo scalar outputs, as shown in Fig. 1. We describe the method for general Mi and Mo, but in our experiments, we always use Mi = Mo = 1, with s(t) equal to either s1(t) or s2(t), or (in Sec. IV) the constant α.

FIG. 1.

Reservoir architecture with input state at time t denoted by u(t), reservoir state by r(t), and output state by s^(t). The output layer is trained so that s^(t) approximates the desired output signal s(t). Adapted from Lu et al., Chaos 27(4), 041102 (2017). Copyright 2017 AIP Publishing LLC.31 


1. Input layer

The input layer is described by an N×Mi matrix Win whose elements are drawn independently from a uniform distribution on [−k, k], where k is a hyper-parameter to be chosen later.

2. Reservoir layer

The reservoir can be thought of as a dynamical system with state vector r(t) at time t given by

r(t + Δt) = (1 − a) r(t) + a tanh[Wres r(t) + Win u(t) + b].    (2)

The state of the reservoir at time t is given by r(t), a vector of size N. The notation tanh() with a vector argument denotes the vector whose components are the hyperbolic tangents of the corresponding components of the argument vector. The leakage parameter a, which lies in the interval [0, 1], determines the speed at which the input affects or leaks into the reservoir. Both a and the bias magnitude b are hyper-parameters. The recurrent connection weights Wres ∈ R^(N×N) are initialized randomly between −1 and 1; Wres is then normalized by multiplying all of its components by a constant chosen so that its spectral radius (the maximum absolute value of its eigenvalues) equals λ, which is another hyper-parameter. A discussion of the effect of the spectral radius on performance is presented in Ref. 26. Typically, N is much larger than Mi, so the reservoir transforms the input from the input space into a much higher dimensional reservoir space. The sparsity sp of the reservoir layer is a measure of how sparsely connected the nodes are: a sparsity of zero means all-to-all coupling, and a sparsity of one means no connections at all.
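A minimal Python sketch of the input and reservoir layers is given below, assuming the leaky echo-state update of Eq. (2); the helper names (make_reservoir, step) and details such as the random-number generator handling are our own, not from the paper.

```python
# Sketch of input-layer construction and one reservoir update step.
import numpy as np

def make_reservoir(N, M_in, k, spectral_radius, sparsity, rng):
    W_in = rng.uniform(-k, k, size=(N, M_in))            # input weights on [-k, k]
    W_res = rng.uniform(-1.0, 1.0, size=(N, N))          # recurrent weights on [-1, 1]
    W_res[rng.random((N, N)) < sparsity] = 0.0            # remove a fraction sp of connections
    eig_max = np.max(np.abs(np.linalg.eigvals(W_res)))
    W_res *= spectral_radius / eig_max                     # rescale to spectral radius lambda
    return W_in, W_res

def step(r, u, W_in, W_res, a=0.3, b=1.0):
    """Leaky-tanh update: r(t + dt) = (1 - a) r(t) + a tanh(W_res r + W_in u + b)."""
    return (1.0 - a) * r + a * np.tanh(W_res @ r + W_in @ u + b)
```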

3. Output layer

After running the reservoir [Eq. (2)] for a transient time period from −100 to 0, we form the reservoir state trajectory R by concatenating the reservoir state vectors (the states of all reservoir nodes) r(t) at every time step corresponding to the input u(t) as follows:

R = [r(1), r(2), …, r(T)].    (3)

Thus, R is an augmented matrix of size N×T where T is the training time, i.e., number of time steps for which training data is available. During training, the output weights are found by mapping the reservoir state trajectory to the desired output layer representation S=[s(1),s(2),,s(T)] over T samples. Only the weight matrix Wout is optimized during training such that the mean square error E(s^) between the output of the reservoir and the target signal s is minimized. The reservoir output s^(t) is obtained through the output weights as follows:

s^(t) = Wout r(t).    (4)

Here, Wout ∈ R^(Mo×N), where Mo is the dimensionality of the readout layer.

Ridge regression (or Tikhonov regularization)27 is used for fitting, thus making the system robust to overfitting and noise. Ridge regression minimizes the squared error while regularizing the Euclidean norm of the weights as follows:

Wout = argminW  Σt=1,…,T ‖W r(t) − s(t)‖² + δ ‖W‖²,    (5)

where δ is a regularization constant, which is also a hyper-parameter and is chosen to be 10⁻⁶ except as noted below.
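A compact sketch of the readout training is shown below, assuming the closed-form ridge solution Wout = S Rᵀ (R Rᵀ + δ I)⁻¹; the function names are illustrative and not from the paper.

```python
# Sketch of ridge-regression readout training and readout application.
import numpy as np

def train_readout(R, S, delta=1e-6):
    """R: (N, T) reservoir states; S: (M_out, T) desired outputs."""
    N = R.shape[0]
    # Closed-form ridge solution: W_out = S R^T (R R^T + delta I)^(-1)
    return np.linalg.solve(R @ R.T + delta * np.eye(N), R @ S.T).T

def readout(W_out, r):
    """Reservoir output s_hat(t) = W_out r(t)."""
    return W_out @ r
```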

Once training is done, the RC predicts state variables for times t > T through Wout and Eq. (4).

Our examples are based on the Lorenz equations,28 

dx/dt = σ(y − x),    (6)
dy/dt = x(ρ − z) − y,    (7)
dz/dt = xy − βz.    (8)

The Lorenz attractor is a strange attractor that arises in this system of equations. The Lorenz system is known to be chaotic for the parameter values σ = 10, ρ = 28, and β = 8/3 (Ref. 29) and appears to be chaotic for the other parameter values we use in this article. We generate trajectories using the fourth-order Runge-Kutta method with step size 0.01, and we sample them every five steps, i.e., every 0.05 time units.
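The sketch below indicates how such trajectories could be generated, assuming a straightforward fourth-order Runge-Kutta integrator with step 0.01 that keeps every fifth sample; the optional factor eta anticipates the rescaled-speed experiments described later, and all names are ours.

```python
# Sketch: integrate the Lorenz equations with RK4 and subsample every 5 steps.
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0, eta=1.0):
    x, y, z = state
    # eta rescales the right-hand side, changing the evolution speed of the signal
    return eta * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def generate_lorenz(n_samples, dt=0.01, keep_every=5, x0=(1.0, 1.0, 1.0), **params):
    state = np.array(x0, dtype=float)
    out = np.empty((n_samples, 3))
    for i in range(n_samples * keep_every):
        k1 = lorenz_rhs(state, **params)
        k2 = lorenz_rhs(state + 0.5 * dt * k1, **params)
        k3 = lorenz_rhs(state + 0.5 * dt * k2, **params)
        k4 = lorenz_rhs(state + dt * k3, **params)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        if (i + 1) % keep_every == 0:
            out[(i + 1) // keep_every - 1] = state
    return out  # columns are x, y, z sampled every 0.05 time units
```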

Recall that we assume finite-time samples of the component signals s1(t) and s2(t) are available for training. In this section, we assume the mixing fraction α to be known a priori. We form the training input signal u(t) according to Eq. (1) using the known value of α. We set the desired output signal s(t) equal to s1(t).

We calculate the error between the trained reservoir output s^(t) and the desired signal s1(t) as follows. The mean square reservoir error ER is

ER = ⟨[s^(t) − s1(t)]²⟩ / ⟨[ζ u(t) − s1(t)]²⟩,    (9)

where ⟨·⟩ denotes a time average over the testing period, s1 is one of the components x1, y1, or z1 of the first signal (typically, we use s1 = x1), and ζ scales the input so that the denominator gives the mean square error in the absence of any processing by the reservoir.
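As a sketch, the error measure of Eq. (9) might be computed as follows; here we take ζ to be the least-squares scale factor relating u(t) to s1(t), which is our reading of its role and may differ from the exact choice used for the figures.

```python
# Sketch of the normalized reservoir error E_R of Eq. (9).
import numpy as np

def reservoir_error(s_hat, s1, u):
    zeta = np.dot(u, s1) / np.dot(u, u)          # least-squares scale factor (assumed choice of zeta)
    num = np.mean((s_hat - s1) ** 2)             # error of the reservoir output
    den = np.mean((zeta * u - s1) ** 2)          # error with no reservoir processing
    return num / den
```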

The Wiener filter30 is known to be the best possible linear solution to our signal separation problem and is also commonly used in tasks such as signal denoising, image deblurring, etc. A detailed explanation of the Wiener filter is given in the Appendix. In this article, we always compare the RC with a Wiener filter that uses a window of 500 samples for estimating the spectrum. The mean square Wiener error measure EW is

EW = ⟨[s^w(t) − s1(t)]²⟩ / ⟨[ζ u(t) − s1(t)]²⟩,    (10)

where s^w is the output of the Wiener filter.

In this section, the reservoir input consists of a combination of two x component signals from Lorenz systems with different parameter values. The signal x1 has parameters p1={σ=10,ρ=28,β=8/3}. The signal x2 has parameters p2=1.20×p1.

Several parameters related to the reservoir system must be chosen for running experiments. These parameters include reservoir size N, spectral radius λ, leakage parameter a, and the length of training signal. In Fig. 2, we vary each of these parameters individually while keeping the others constant in order to identify an appropriate set of values.

FIG. 2.

Reservoir error ER and Wiener error EW in separating x1 over a test length of 5000 time steps for (a) varying reservoir size N, (b) spectral radius λ, (c) training length, and (d) leakage parameter a. All parameter values, except the one being varied, are: Δt=0.01, N=2000, λ=0.9, a=0.3, k=0.13, training length = 50000, α=0.5, and p2=1.2p1.


Figure 2 shows the performance of the reservoir in separating the x component of the s1 trajectory for varying parameters, given α=1/2. The reservoir errors are averaged over 10 random initializations of the reservoir. The error bars denote the standard error, i.e., the standard deviation across the l random initializations divided by √l. We observe a downward trend in error with increasing reservoir size, which appears to saturate at about 2000 nodes. Panel (a) shows that a reservoir size of N=2000 gives a reasonable trade-off between performance and computational efficiency. Panel (b) shows that λ=0.9 results in the best separation, with all other parameters held constant. As seen in panels (c) and (d), the optimal training length and leakage parameter a are 50000 and 0.3, respectively. We also choose the input strength normalization k=0.13 so that the input does not saturate the tanh function. These parameter values are used for the rest of this paper unless mentioned otherwise.

Figure 3(a) presents the performance of the reservoir as a function of the mixing fraction. We observe that the reservoir computer consistently outperforms the Wiener filter. However, as α increases, the normalized error ER increases as well. This can be attributed to the denominator in ER [see Eq. (9)] tending to zero as α tends to one. The non-normalized error is plotted in Fig. 3(b), and here we can see that as α tends to one, the error tends to zero, as it should. Figure 3(c) shows the estimated separated x1 trajectory along with the actual x1 in the testing phase, as well as the actual x1 and x2 trajectories. The top panel demonstrates that the RC prediction does indeed match the actual x1 accurately. The middle panel demonstrates that the best linear filter, the Wiener filter, is considerably worse than the RC at estimating the chaotic signal x1.

FIG. 3.

(a) Reservoir (blue) and Wiener (orange) error over a test length of 5000 time steps across mixing fraction α for two Lorenz signals with the same speed but parameters p2=1.2×p1. (b) shows the numerator only of the error measures ER and EW. (c) For a mixing fraction α=0.5, (top) time-series plot of actual x1 and reservoir-predicted output s^, (middle) time-series plot of actual x1 and Wiener-predicted output s^w, and (bottom) actual x1 and x2. ER=0.15 and EW=1.14. Δt=0.01, N=2000, λ=0.9, a=0.3, k=0.13, training length = 50000, and for (c), α=0.5.


In this section, the reservoir input consists of a combination of two signals from the x component of the Lorenz system with the same parameter values p2=p1, but with different speeds; i.e., the right-hand sides of Eqs. (6)–(8) for the Lorenz system corresponding to x1 are multiplied by a speed fraction (ratio of speeds) η. The reservoir parameters remain the same as those found in Sec. III A.

Figures 4(a) and 4(b) present the performance of the reservoir computer as a function of the mixing fraction for η1=1.2η2, where ηi is the speed of the ith signal. The RC consistently outperforms the Wiener filter. As seen in Fig. 4(c), the Wiener prediction is much poorer than the reservoir prediction of the chaotic signal x1.

FIG. 4.

(a) Reservoir (blue) and Wiener (orange) error in separating x1 over a test length of 5000 time steps across mixing fraction α for two Lorenz signals with the same parameters but speeds η1=1.2×η2. (b) shows the numerator only of the error measures ER and EW. (c) For a mixing fraction α=0.5, (top) time-series plot of actual x1 and reservoir-predicted output s^, (middle) time-series plot of actual x1 and Wiener-predicted output s^w, and (bottom) actual x1 and x2. ER=0.51,EW=1.02. Δt=0.01, N=2000,λ=0.9,a=0.3,k=0.13, training length=50000, and for (c), α=0.5.


We have also investigated the case of identical systems, p2=p1 with identical speeds. We find that when the mixing fraction is α=0.5, the symmetry between the components √α s1 and √(1 − α) s2 prevents separation. However, as shown in Fig. 5, as α deviates from 0.5, the RC succeeds in reducing the normalized error significantly below 1.

FIG. 5.

(a) Reservoir (blue) and Wiener (orange) error in separating x1 over a test length of 5000 time steps across mixing fraction α for two Lorenz signals coming from identical systems, i.e., with the same parameters and speeds. (b) shows the numerator only of the error measures ER and EW. (c) For a mixing fraction α=0.3, (top) time-series plot of actual x1 and reservoir-predicted output s^, (middle) time-series plot of actual x1 and Wiener-predicted output s^w, and (bottom) actual x1 and x2. In (a) and (b), for α=0.5 only, the regularization constant was changed to δ=10⁻³ to prevent overfitting. For all other α’s, δ=10⁻⁶ as usual. ER=0.50, EW=1.04. Δt=0.01, N=2000, λ=0.9, a=0.3, k=0.13, training length = 50000, and for (c), α=0.3.


Linear signal denoising/separation methods are based on differences in spectra. Here, we study the z components of two signals, z1 and z2, with p2=1.1×p1 and η2=0.9×η1, where p1={σ=10, ρ=28, β=8/3} and η1=1. Figure 6(a) shows a log plot of the power spectral density (PSD) ϕz1z1 and ϕz2z2 of the two signals z1 and z2, respectively, as a function of frequency Ω. The PSD of the z component of the Lorenz system has a distinct peak (unlike the x and y components), followed by a much smaller peak. For this reason, in Fig. 6(a), we plot the PSD of the z signals to demonstrate spectrum matching even though we separate the corresponding x signals (which may conceivably be a harder problem to solve, since x switches between positive and negative lobes chaotically, and the RC has to learn when to switch correctly). We observe, in panel (a), that the peaks do indeed line up and the spectra match fairly well. Figure 6(b) plots the reservoir and Wiener errors as a function of α for the spectra-matched case. For a case where the spectra of the two component signals are indistinguishable, a spectrum-based filter such as the Wiener filter will be unable to separate the signals. Figures 6(c) and 6(d) show the reservoir predictions for the x and z components of the Lorenz system, respectively, for α=0.5. We observe that the RC is indeed able to separate the signals, even when their spectra match, unlike spectrum-based signal separation methods such as the Wiener filter (see the Appendix).

FIG. 6.

(a) Power spectral density of z1 and z2 across frequency Ω, on a scaled log plot to demonstrate the close match in PSD. (b) Reservoir and Wiener error (ER and EW) for separation of the spectra-matched Lorenz x signals as a function of α. (c) and (d) For a mixing fraction α=0.5, (top) time-series plots of actual x1 and z1, respectively, and the reservoir-predicted outputs, and (bottom) actual inputs x1, x2 and z1, z2, respectively. For (c), ER=0.48 and EW=1.02, and for (d), ER=0.26 and EW=1.14. Δt=0.01, N=2000, λ=0.9, a=0.3, k=0.13, training length = 50000, α=0.5, η1=1.1×η2, and p2=1.1×p1.


Often, as in the case of the cocktail party problem, the ratio of amplitudes in a mixed signal may not be known. In this section, we describe a methodology to separate signals without knowledge of the mixing fraction α. As before, we assume that we have access to training data for the individual signals s1(t) and s2(t). The simplest approach would be to form training data consisting of mixed signals [Eq. (1)] with a variety of α values between 0 and 1, and train an RC to output s1(t); the output weight matrix Wout is then independent of α, unlike in Sec. III. However, we found this approach to be unsuccessful.

Instead, we use two RCs in sequence as follows. The first RC is trained as described in the previous paragraph, except that the desired output is α instead of s1(t). Once trained, we use the first RC to estimate the value of α for a given mixed signal, as we describe in Sec. IV A. Then, we input the same mixed signal to a second RC, which is trained as in Sec. III to output s1(t). The second RC can be trained after the first RC has run, using the estimated value of α. Or, the output weight matrix for the second RC can be interpolated from output weight matrices that have been pretrained on discrete values of α, as we discuss in Sec. IV B.

In training, the first RC is given mixed signals over a range of discrete values of α (0–1 in intervals of 0.05) and is trained to output a constant α value. In testing, the constant predicted α’s are averaged over the testing time. We find that the trained RC has a tendency to predict α closer to the mean α (0.5) on both the training and the test data; i.e., it overestimates small actual α and underestimates large actual α. To correct for this tendency, we use the training dataset to construct a mapping from the predicted α to the desired α via a fit to a third-order polynomial function. We then apply this same function to the reservoir-predicted α value(s) for the test data in order to obtain corrected values.
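A minimal sketch of this correction step, assuming an ordinary cubic polynomial fit (numpy.polyfit) between the time-averaged RC prediction and the true α on the training set, is:

```python
# Sketch: fit a cubic correction mapping predicted alpha -> true alpha on training data,
# then apply the same mapping to test predictions. Names are illustrative.
import numpy as np

def fit_alpha_correction(alpha_predicted_train, alpha_true_train, degree=3):
    coeffs = np.polyfit(alpha_predicted_train, alpha_true_train, degree)
    # The returned function clips to [0, 1] since alpha is a mixing fraction.
    return lambda alpha_pred: np.clip(np.polyval(coeffs, alpha_pred), 0.0, 1.0)
```

For example, correct = fit_alpha_correction(pred_train, true_train) would then be applied as correct(pred_test) to obtain corrected mixing-fraction estimates.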

Figure 7 illustrates the performance of our approach for separating a mixed signal of x components of two Lorenz systems with parameter values that differ by 20%. Figure 7 (top) shows the reservoir-estimated α vs the actual α, and Fig. 7 (bottom) shows the corrected α estimate [calculated using a third-order polynomial fit to the points in Fig. 7 (top)]. After making the correction, our method accurately predicts α for both the training and test data. In order to separate signals with a certain value of the mixing fraction, a reservoir needs to be trained explicitly for separation with that mixing fraction (within the limits of interpolation discussed in Sec. IV B). Thus, obtaining an estimate of the mixing fraction from the first RC allows us to train a second RC for separation of signals with that mixing fraction.

FIG. 7.

(Top) Plot of the actual α used in the mixed signal vs the reservoir-predicted α for both the training (orange) and test (blue) datasets. The green curve is the third-degree polynomial fitting function between 0 and 1. The black dashed line represents the diagonal for reference. (Bottom) The corrected test and training RC-estimated mixing fractions (after fitting to the third-degree polynomial). The Lorenz signals being separated have the following characteristics: p2=1.2p1 and η1=η2. Here, Δt=0.01, sparsity sp=0.99, N=1000, λ=0.9, a=0.3, and k=0.13 such that the input signal has unit standard deviation; training and testing length = 50000 for each training α = [0, 0.1, 0.2, …, 0.9, 1].


We note also that, similar to this method for estimating the mixing fraction, an RC combined with output correction can be used for parameter estimation from data in other systems as well. It is also possible that other choices for the reservoir dynamics, such as a liquid state machine,18 would be fruitful for parameter estimation.

As mentioned above, having estimated the value of α, we can subsequently train the second RC using the estimated α. However, if many different values of α will be encountered, retraining for each value might be unacceptably inefficient. In such cases, RCs can be pretrained on discrete values of α. Any intermediate predicted α estimate can then be used for separation by interpolating between the two nearest trained-α RCs. Here, interpolating between RCs is equivalent to interpolating between their trained output weights. What spacing of α is appropriate for training? A mixed signal with predicted α=q can be separated by using the output weight matrix Wqout, obtained by interpolation when q lies between two discrete trained values of α (q− and q+),

Wqout = [(q+ − q) Wq−out + (q − q−) Wq+out] / (q+ − q−).    (11)

In Fig. 8(a), we plot the reservoir error ER for α=q=0.5 for an RC trained at α=0.5 and for an RC whose Wout is the average of the Wout matrices of reservoirs trained at α=0.5±Δα/2. We find that interpolating between discrete RCs with a spacing of Δα=0.1 adds negligible error compared to training on the individual predicted α’s. Hence, one only needs to train on α’s in intervals of 0.1 (α = [0, 0.1, 0.2, …, 0.9, 1.0]). Figure 8(b) shows the reservoir error for RCs trained individually on α = [0.40, 0.41, 0.42, …, 0.49, 0.50] compared with the reservoir error obtained by combining the Wout matrices of RCs trained on α=0.4 and α=0.5 according to Eq. (11). We observe that the results practically coincide and that the method is fairly robust to errors in the α prediction.
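A sketch of the interpolation of Eq. (11), with illustrative names, is:

```python
# Sketch: linearly interpolate output weight matrices pretrained at the two nearest
# discrete mixing fractions q_minus < q < q_plus.
def interpolate_readout(q, q_minus, W_minus, q_plus, W_plus):
    w = (q - q_minus) / (q_plus - q_minus)       # interpolation weight in [0, 1]
    return (1.0 - w) * W_minus + w * W_plus
```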

FIG. 8.

Reservoir errors on separating Lorenz signals with α=0.5. (a) (Red) RC trained on α=0.5; (blue) average of RCs trained on α=0.5±Δα/2. (b) Errors in interpolation between predictions by independent RCs trained on α=[0.4, 0.5] (blue) and RCs trained on the exact 0.4<α<0.5 (orange); training length = 50000, testing length = 5000. p2=1.2p1, η2=η1, Δt=0.01, sparsity sp=0.95, N=1000, λ=0.9, a=0.3, and k=0.13.


Successful interpolation between RCs trained on discrete mixing fractions drastically reduces the training time and computation without compromising on quality of signal separation. In fact, we only need to train on 11 distinct mixing fractions to be able to separate Lorenz signals mixed in any proportion. Thus, we are able to generalize our chaotic signal separation technique to cases where the mixing fraction is unknown.

In this article, we used a type of machine learning called reservoir computing for separation of chaotic signals. We demonstrated the ability to separate signals for several cases where the two signals are obtained from Lorenz systems with different parameters. We compared our results with the Wiener filter, which is the optimal linear filter and whose coefficients can be computed from the power spectra of the signals (see the  Appendix). Spectra-based methods, naturally, perform poorly at signal separation if the spectra of the signals to be separated are indistinguishable. By contrast, the RC performs reasonably well even when the two signals that are mixed have very similar spectra. Our results were significantly better than the Wiener filter calibrated from the same training data for all the scenarios we considered.

Often, in signal separation applications such as the cocktail party problem, the mixing fraction (amplitude ratio) of the signals to be separated is unknown. We introduce an RC-based way to separate signals with an unknown parameter, in our case the mixing fraction. The first step of this generalized method is estimating a mixing fraction for a given signal. Estimation of a constant valued parameter from a temporal signal is a problem of broad interest, with applications such as weather prediction, predicting parameters of flow, equation modeling, etc. We find that after time-averaging its trained output, the RC tends to skew the estimated parameter toward the mean of the parameter values used during training. By fitting a mapping function that corrects the averaged output for the training data, we are also able to approximately correct the test output. This method of introducing an additional nonlinear “correction” to the learned output weights may be useful for predicting constant outputs in other dynamic machine learning systems.

In some cases, training on a wide range of mixing fractions may not be possible due to the need for quick separation of signals and limited training capacity (e.g., in hardware applications). Hence, we study the robustness of the RC and the ability to use interpolated RCs pretrained at discrete mixing fractions. We demonstrate that the RCs need only be trained at a coarse grid of mixing fractions in order to accurately separate Lorenz signals with arbitrary mixing fractions. Our results are robust to errors in the prediction of mixing fraction. Hence, in situations where computational resources are constrained, RCs can be trained on discrete mixing fractions, and interpolation between these RCs can be used to accurately separate chaotic signals for intermediate values of mixing fraction.

Here, we have demonstrated the ability of reservoir computing to act as an efficient and robust method for separation of chaotic signals. The dynamical properties of the reservoir make it a prime candidate for further exploration of chaotic signal processing. An interesting future direction might be to study alternate RC architectures such as parallel RCs for more complex signal extraction problems.

This research was supported through a DoD contract under the Laboratory of Telecommunication Sciences Partnership with the University of Maryland and, in part, by National Science Foundation (NSF) (Award No. DGE-1632976).

The Wiener filter30 was designed to separate a signal from noise, but it can be used to (imperfectly) separate any two signals with different power spectra. For simplicity, we formulate the filter here in continuous time, as a linear noncausal filter with infinite impulse response (IIR) for a scalar signal. Later, we describe how we compute the filter in practice, in discrete time with finite impulse response.

Let u(t) be the combined signal—the input to the filter—and let s(t) be the component signal, that is, the desired output of the filter.

A noncausal IIR filter can be written as a convolution

h∗u(t) = ∫−∞+∞ h(τ) u(t − τ) dτ,    (A1)

where h(τ) is the impulse response function of the filter [we assume that |h(τ)| has a finite integral; then, if u(t) is bounded, so is h∗u(t)]. Of all such filters, the Wiener filter is the one that minimizes the mean square error ⟨[h∗u(t) − s(t)]²⟩ between the filter output and the desired output.

Similar to linear least squares in finite dimensions, the minimizing function hW can be related to the autocovariance and cross-covariance functions

Cuu(τ) = ⟨u(t) u(t + τ)⟩,   Cus(τ) = ⟨u(t) s(t + τ)⟩.    (A2)

Specifically, setting the first variation (i.e., the first derivative in the calculus of variations) of the mean-square error equal to zero yields

∫−∞+∞ hW(τ′) Cuu(τ − τ′) dτ′ = Cus(τ) for all τ.    (A3)

Equation (A3) can be solved in the frequency domain, where convolution becomes multiplication. Let HW, Puu, and Pus be the Fourier transforms of hW, Cuu, and Cus, respectively; then, HW(ω)Puu(ω) = Pus(ω), and

HW(ω) = Pus(ω) / Puu(ω).    (A4)

We interpret Eq. (A4) based on the fact that Puu(ω) and Pus(ω) are, respectively, the power spectral density of u(t) and the cross spectral density of s(t) and u(t), by an appropriate version of the Wiener–Khinchin theorem. In both Wiener’s application and our application, the component signals s(t) and s′(t) = u(t) − s(t) are uncorrelated, in the sense that their cross-covariance is identically zero. In this case, Cus(τ) = Css(τ) and Pus(ω) = Pss(ω). Thus, the transfer function HW(ω) of the Wiener filter at frequency ω is the fraction of the power of u(t) at that frequency attributable to the component signal s(t).

In practice, we sample the signals s(t) and u(t) at discrete intervals of Δt=0.01. We estimate their spectral densities using the Welch method, averaging the estimates on overlapping segments of length 500 samples, using the discrete Fourier transform (DFT) and the Hann window on each segment. We then apply the inverse DFT to the resulting discrete estimate of HW, yielding an impulse response vector hw of length 500. We convolve hw with the sampled signal to compute the Wiener filter.
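A sketch of this procedure, assuming Welch-type estimates from scipy.signal on 500-sample Hann-windowed segments (segment overlap and other details are assumptions on our part), is:

```python
# Sketch of the discrete Wiener filter used for comparison:
# estimate P_uu and P_us by the Welch method, form H = P_us / P_uu,
# invert to an impulse response, and convolve with the test signal.
import numpy as np
from scipy.signal import welch, csd

def wiener_filter(u_train, s_train, u_test, nperseg=500):
    _, P_uu = welch(u_train, window="hann", nperseg=nperseg)
    _, P_us = csd(u_train, s_train, window="hann", nperseg=nperseg)
    H = P_us / P_uu                                   # transfer function on the DFT grid
    h = np.fft.fftshift(np.fft.irfft(H, n=nperseg))   # noncausal impulse response of length 500
    return np.convolve(u_test, h, mode="same")        # filtered estimate of s(t)
```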

1. J. Basak, A. Sudarshan, D. Trivedi, and M. S. Santhanam, “Weather data mining using independent component analysis,” J. Mach. Learn. Res. 5, 239–253 (2004).
2. M. Congedo, C. Gouy-Pailler, and C. Jutten, “On the blind source separation of human electroencephalogram by approximate joint diagonalization of second order statistics,” Clin. Neurophysiol. 119(12), 2677–2686 (2008).
3. N. Doukas and N. V. Karadimas, “A blind source separation based cryptography scheme for mobile military communication applications,” WSEAS Trans. Commun. 7(12), 1235–1245 (2008).
4. A. Buscarino, L. Fortuna, and M. Frasca, “Separation and synchronization of chaotic signals by optimization,” Phys. Rev. E 75, 016215 (2007).
5. L. S. Tsimring and M. M. Sushchik, “Multiplexing chaotic signals using synchronization,” Phys. Lett. A 213(3–4), 155–166 (1996).
6. T. L. Carroll and L. M. Pecora, “Using multiple attractor chaotic systems for communication,” Chaos 9(2), 445–451 (1999).
7. P. Arena, A. Buscarino, L. Fortuna, and M. Frasca, “Separation and synchronization of piecewise linear chaotic systems,” Phys. Rev. E 74, 026212 (2006).
8. Y. Jianning and F. Yi, “Blind separation of mixing chaotic signals based on ICA using kurtosis,” in 2012 International Conference on Computer Science and Service System (IEEE, 2012), pp. 903–905.
9. M. Kuraya, A. Uchida, S. Yoshimori, and K. Umeno, “Blind source separation of chaotic laser signals by independent component analysis,” Opt. Express 16(2), 725–730 (2008).
10. L. Shan-Xiang, W. Zhao-Shan, H. Zhi-Hui, and F. Jiu-Chao, “Gradient method for blind chaotic signal separation based on proliferation exponent,” Chin. Phys. B 23(1), 010506 (2013).
11. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS’12) (Curran Associates, Inc., 2012), Vol. 1, pp. 1097–1105.
12. S. Leglaive, R. Hennequin, and R. Badeau, “Singing voice detection with deep recurrent neural networks,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2015), pp. 121–125.
13. J. Kim, M. El-Khamy, and J. Lee, “Residual LSTM: Design of a deep recurrent architecture for distant speech recognition,” arXiv:1701.03360 (2017).
14. D. Wang and J. Chen, “Supervised speech separation based on deep learning: An overview,” IEEE/ACM Trans. Audio Speech Lang. Process. 26(10), 1702–1726 (2018).
15. K. Han and D. Wang, “A classification based approach to speech segregation,” J. Acoust. Soc. Am. 132(5), 3475–3483 (2012).
16. M. N. Schmidt and R. K. Olsson, “Single-channel speech separation using sparse non-negative matrix factorization,” in Ninth International Conference on Spoken Language Processing (ISCA, 2006).
17. H. Jaeger, “The echo state approach to analysing and training recurrent neural networks - with an erratum note,” German National Research Center for Information Technology GMD Technical Report No. 148 (Bonn, Germany, 2001), p. 13.
18. W. Maass, T. Natschläger, and H. Markram, “Real-time computing without stable states: A new framework for neural computation based on perturbations,” Neural Comput. 14(11), 2531 (2002).
19. M. Lukoševičius and H. Jaeger, “Reservoir computing approaches to recurrent neural network training,” Comput. Sci. Rev. 3(3), 127–149 (2009).
20. J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, “Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach,” Phys. Rev. Lett. 120(2), 024102 (2018).
21. J. B. Butcher, D. Verstraeten, B. Schrauwen, C. R. Day, and P. W. Haycock, “Reservoir computing and extreme learning machines for non-linear time-series data analysis,” Neural Netw. 38, 76–89 (2013).
22. S. Krishnagopal, Y. Aloimonos, and M. Girvan, “Similarity learning and generalization with limited data: A reservoir computing approach,” Complexity 2018, 1 (2018).
23. M. A. Escalona-Morán, M. C. Soriano, I. Fischer, and C. R. Mirasso, “Electrocardiogram classification using reservoir computing with logistic regression,” IEEE J. Biomed. Health Inform. 19(3), 892–898 (2014).
24. A. A. Ferreira, T. B. Ludermir, R. R. B. de Aquino, M. M. S. Lira, and O. N. Neto, “Investigating the use of reservoir computing for forecasting the hourly wind speed in short-term,” in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) (IEEE, 2008), pp. 1649–1656.
25. P. R. Vlachas, W. Byeon, Z. Y. Wan, T. P. Sapsis, and P. Koumoutsakos, “Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks,” Proc. R. Soc. A 474(2213), 20170844 (2018).
26. M. C. Ozturk, D. Xu, and J. C. Príncipe, “Analysis and design of echo state networks,” Neural Comput. 19(1), 111–138 (2007).
27. F. Wyffels, B. Schrauwen, and D. Stroobandt, “Stable output feedback in reservoir computing using ridge regression,” in Proceedings of the 18th International Conference on Artificial Neural Networks (ICANN’08) (Springer, 2008).
28. E. N. Lorenz, “Deterministic nonperiodic flow,” J. Atmos. Sci. 20(2), 130–141 (1963).
29. W. Tucker, “A rigorous ODE solver and Smale’s 14th problem,” Found. Comput. Math. 2(1), 53–117 (2002).
30. N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series (MIT Press and John Wiley & Sons, 1949).
31. Z. Lu et al., Chaos 27(4), 041102 (2017).