The recurrence plot and the recurrence quantification analysis (RQA) are well-established methods for the analysis of data from complex systems. They provide important insights into the nature of the dynamics, periodicity, regime changes, and many more. These methods are used in different fields of research, such as finance, engineering, life, and earth science. To use them, the data have usually to be uniformly sampled, posing difficulties in investigations that provide non-uniformly sampled data, as typical in medical data (e.g., heart-beat based measurements), paleoclimate archives (such as sediment cores or stalagmites), or astrophysics (supernova or pulsar observations). One frequently used solution is interpolation to generate uniform time series. However, this preprocessing step can introduce bias to the RQA measures, particularly those that rely on the diagonal or vertical line structure in the recurrence plot. Using prototypical model systems, we systematically analyze differences in the RQA measure average diagonal line length for data with different sampling and interpolation. For real data, we show that the course of this measure strongly depends on the choice of the sampling rate for interpolation. Furthermore, we suggest a correction scheme, which is capable of correcting the bias introduced by the prepossessing step if the interpolation ratio is an integer.
Almost all natural systems are non-linear systems, often with multiple dimensions. The analysis of the possibly rich dynamics of such systems requires advanced methods. One of these is the recurrence plot and the associated recurrence quantification analysis, which are, among other things, used to investigate the nature of the dynamics, periodicity, or to detect regime transitions. As with most other methods, this method was developed for uniformly sampled data. This, however, restricts the use of the method. Some data from, for example, astrophysics or paleoclimate cannot be analyzed straightforwardly. To circumvent this restriction, one used approach is to interpolate the data to a constant sampling rate. We show that the differences in the sampling rate together with the subsequent interpolation can lead to strong deviations, and it is not recommendable to take this approach without further consideration. We, thus, propose a correction scheme that can, with some limitations, correct these deviations.
I. INTRODUCTION
The analysis of data from complex real-world systems creates the basis for many different fields in science, such as finance, engineering, life, and earth science. For many kinds of data, standard measures, such as mean, standard deviation, or higher moments, are not sufficient to capture the details of their dynamics. For this purpose, methods from complex system science are more appropriate, such as Lyapunov exponents,1 complex networks,2 symbolic dynamics,3 or recurrence analysis.4 Recurrence analysis provides a set of tools, e.g., for studying synchronization, classifying different types of dynamics, or detecting regime transitions.5 The increasing popularity of this framework is reflected by the lively methodical development and the growing number of applications in many scientific disciplines.6
In several fields, data are only available on a non-equidistant time axis; for instance, measurements based on heartbeats are timed by the rhythm of the heart,7,8 which gives a natural varying timescale, and this leads to the necessity of interpolation when compared with other variables that form the cardiorespiratory system, such as the blood pressure.9 Time series of pulsars and rotating stars are obtained through non-equidistant observations,10,11 and paleoclimate data are hampered by the non-constant sedimentation rate, resulting in non-equidistant sampling.12 Such unevenly sampled data are a challenge for most time series analysis techniques, which usually require equidistant sampling points. One solution is to transform the time series to an equidistantly sampled one using interpolation or other techniques (e.g., transformation cost13). Other approaches try to modify the time series analysis tools to be directly applicable to uneven time series, such as Lomb–Scargle periodogram,14 kernel-based correlation,15,16 or edit distance-based recurrence analysis.17 However, interpolation is still a widely used technique, although it can cause serious bias in the results (overemphasizing the lower frequencies). Although this interpolation effect is known for several time series analysis techniques,18 it is not yet systematically considered for recurrence analysis and, thus, can lead to wrong interpretations and conclusions.
Our focus in this study is the effect of non-uniform sampling in paleoclimate time series on the results of recurrence analysis, as it has recently gained attention for its potential to address various research questions in geosciences.19–24 A prominent example is the investigation of transitions in the paleoclimate.25–29 Changes in the recurrence properties, mainly based on changes in the distribution of the diagonal line structures in recurrence plots (related to determinism or predictability of the underlying dynamics), can be used to identify regime changes. Paleoclimate data are usually retrieved from specific geological archives, e.g., marine and lake sediments, tree rings, ice cores, or stalagmites. The climate information is stored during the growth (deposition) of these archives. Since the growth or sedimentation rate can differ over time, but the sampling procedure of such archives usually uses an equidistant sampling scheme, and the final time series are usually unevenly sampled in time.12 Using interpolation without due consideration before conducting recurrence analysis can lead to bias in the distribution of diagonal line structures in the recurrence plots, finally resulting in erroneous conclusions.
In this work, we show the effects of different and changing sampling times and subsequent interpolation on the recurrence quantification analysis using different types of model systems as well as a real paleoclimate example.
We further suggest an approach to estimate quantitatively these effects and provide a correction scheme. Although we focus here on applications in paleoclimate research, the correction scheme can be analogously applied on other research questions, e.g., in astrophysics or physiology.
This work is structured as follows: In Sec. II, the recurrence plot and recurrence plot measures are introduced. In Sec. III, the used models and methods are given. Then, we present the measured effects for the simulated systems. Afterward, in Sec. IV, we present the derivation as well as the evaluation of our correction scheme. In Sec. V, a real-world example is considered. We conclude our findings in Sec. VI.
II. PRINCIPLES OF RECURRENCE PLOTS (RPs) AND RECURRENCE QUANTIFICATION ANALYSIS (RQA)
The recurrence matrix can be displayed as a plot, where all the ones ( ) are marked by points, and such an entry is, therefore, called a “recurrence point” or a “1-point.” In contrast, the points are called a “non-recurrence point” or a “0-point.” This plot is then called a recurrence plot (RP) and can be visually interpreted because it expresses rich patterns characteristic of specific dynamics.4,32
In many cases, the temporal variation of the RQA measures is of interest as they can reveal changes or transitions in the system’s dynamics, such as when the system is approaching a tipping point. To determine such changes, the sliding window approach is used. The time series is partitioned into smaller segments (windows) of a predetermined length, which may overlap with one another. Then, the RP and the RQA measures are calculated for every window separately. The distance between the start points of two consecutive windows is called the window step size, and the size of a window is the window size. It is important to determine which time point is assigned to each window. In this study, we just take the starting point of the window so that all points used to calculate the measure are in the interval after this time point.
III. INTERPOLATION EFFECT ON RECURRENCE ANALYSIS OF MODEL SYSTEMS
To demonstrate the interpolation effect, we use two model systems to generate prototypical data. We use an autoregressive model, where the course of the data is stochastically driven, and a Roessler38 system, which is a typical non-linear system, fully described by three non-linear ordinary differential equations.
To quantify the deviation in the measure due to the difference in the sampling rate and subsequent interpolation, we compare calculated from a reference series without interpolation with calculated from the interpolated series with the same temporal resolution as the reference series by taking the ratio . Ratios greater than 1 indicate an increase of the measure due to the interpolation and ratios smaller than 1 a decrease. This could then either lead to an over- or underestimation of an value when analyzing the interpolated series.
The effect depends on the considered system, the interpolation ratio, and the offset. The interpolation ratio is the ratio of the sampling time of the time series before and after interpolation. The offset is the time difference between the first value in the time series before and after interpolation (Fig. 1).
Schema illustrating the different series as well as the offset and interpolation.
Schema illustrating the different series as well as the offset and interpolation.
An interpolation ratio greater than 1 increases the total number of points. For integer interpolation ratios, every th point in the interpolated series lines up with the underlying series only if the offset is zero.
A. Systems
We use the two different model systems for our study and generate the data using the following equations. To remove transient behavior from the initial conditions to some kind of stable dynamics, the first part of every time series is discarded.
1. Autoregressive model
2. Roessler model
B. Method
Scheme illustrating the method used to investigate the change in the RQA measures due to the difference in the sampling time, offset, and interpolation.
Scheme illustrating the method used to investigate the change in the RQA measures due to the difference in the sampling time, offset, and interpolation.
Using the reference instead of the high-resolution time series as a comparison, it is possible to study non-integer interpolation ratios, offsets, and also interpolation ratios smaller than one.
The RP and the RQA measure are calculated for the reference and for every interpolated series using the same recurrence threshold . is the measure obtained for the reference series and for the interpolated series. The differences in the RQA measure are due to the different sampling times and the interpolation. We can compare the results from all interpolated series to the reference series by computing all ratios .
Using this procedure, we mimic the typical sampling bias, e.g., in paleoclimate studies, where sliding windows with different sampling are all interpolated to the same reference sampling and the hidden “true” dynamics is the continuous nature and is not accessible.
C. Results
To illustrate the deviations in the RP between an interpolated series and a reference series, we consider a short time series of an arbitrary autoregressive process and downsample it by a factor of four before interpolating it linearly back to the original size. This procedure leads to an interpolation ratio of and no offset. Here, the high-resolution time series corresponds to the reference series. Comparing the RP of the interpolated time series with the RP of the reference time series, we find some differences (Fig. 3). Points that are 1-points in both RPs are black, points that are only 1-points in the reference RP are red, and points that are only 1-points in the interpolated RP blue. The coarse structure is preserved under the downsampling and interpolation; however, on a smaller scale, many short 1-lines are missing in the interpolated RP, while other 1-lines are merged into greater recurrence structures. Both effects lead to an increased average 1-line length.
Difference of interpolated and reference recurrence plots of an arbitrary AR process. Black pixel mark 1-points in both RPs; red pixel mark 1-points in reference RP, which 0-points in interpolated RP; and blue pixel mark 0-points in reference RP, which are 1-points in interpolated RP. The interpolation ratio is and offset 0.
Difference of interpolated and reference recurrence plots of an arbitrary AR process. Black pixel mark 1-points in both RPs; red pixel mark 1-points in reference RP, which 0-points in interpolated RP; and blue pixel mark 0-points in reference RP, which are 1-points in interpolated RP. The interpolation ratio is and offset 0.
To quantitatively study the effects on the average 1-line length, we use the method described in Sec. III B. For every model system, the high-resolution series is created according to Sec. III A. From this one, reference and multiple test series are created by downsampling with different offsets and downsampling factors . We use downsampling factors between 1 and 50 and for every downsampling factor five different offsets, if possible. For series with a downsampling factor smaller than five, the number of different offsets is limited by the downsampling factor.
The reference series are obtained with a downsampling factor and no offset. The high-resolution series have a length of 2401. The reference series have, therefore, 481 data points. The recurrence thresholds for the systems are chosen so that the RP of the reference series has a recurrence rate of 10%. The recurrence thresholds as well as the measure are given for the Roessler and the three autoregressive systems in Table I. The number in the name of the autoregressive systems (AR-42, AR-43, and AR-44) states the used seed for the random module of the Python library NumPy. With the three different autoregressive systems, we can check whether the results are changed for other realizations of the noise. For every system, the ratio is calculated for every interpolation ratio, interpolation method, and offset separately. To simplify the data, the mean and the standard deviation are calculated over the different offsets, and for the autoregressive systems, also over the three different realizations. The result is, therefore, only dependent on the kind of the system, the interpolation ratio, and the interpolation method.
Properties of the four different reference series.
System name . | Recurrence threshold . | Reference Lref . |
---|---|---|
Roessler | 5.06 | 24.6 |
AR-42 | 1.13 | 1.44 |
AR-43 | 1.05 | 1.36 |
AR-44 | 1.26 | 1.46 |
System name . | Recurrence threshold . | Reference Lref . |
---|---|---|
Roessler | 5.06 | 24.6 |
AR-42 | 1.13 | 1.44 |
AR-43 | 1.05 | 1.36 |
AR-44 | 1.26 | 1.46 |
1. Roessler system
For the Roessler system, and values are almost equal for interpolation ratios smaller than 1.5 (Fig. 4 left). For a larger interpolation ratio, the values for the different interpolated series are smaller than and decrease with an increasing interpolation ratio. The quadratic and the cubic spline interpolation lead to very similar results, whereas the linear and the pchip interpolation create the strongest deviation. The differences caused by the offset values are negligible for the linear interpolation, except at the integer ratios , , and . An explanation for this behavior is given in Sec. IV. Additionally, the same analysis was done with the maximum norm instead of the Euler norm. The results are qualitatively similar (see Appendix B).
Relative deviation of between interpolated and reference series for the Roessler system (left) and three autoregressive systems (AR) (right). The data points represent the mean over the different systems (only for AR) and offsets, and the error bars give the standard deviation.
Relative deviation of between interpolated and reference series for the Roessler system (left) and three autoregressive systems (AR) (right). The data points represent the mean over the different systems (only for AR) and offsets, and the error bars give the standard deviation.
2. Autoregressive process
For the interpolated series from the autoregressive systems, increases with growing interpolation ratios for all methods (Fig. 4, right). calculated from the linearly interpolated series shows the strongest deviation from the reference series. The effect of different realizations and offsets is similar for all considered interpolation methods.
To investigate the deviation caused by different offsets, is calculated without offset and compared with the mean over all offsets. We find that the deviation is only different for integer interpolation ratios (Fig. 5). This is as expected because only for integer interpolation ratios, an offset determines whether all time points of the test series are aligned (without offset) or misaligned (with offset) to the interpolation time points (see Fig. 1). The value is smaller without an offset, which means that the deviation of from is smaller.
Relative deviation of between interpolated and reference series for three autoregressive systems (same as Fig. 4, right). The data points represent the mean over the different systems with (blue) and without (red) different offsets, and the error bars give the standard deviation.
Relative deviation of between interpolated and reference series for three autoregressive systems (same as Fig. 4, right). The data points represent the mean over the different systems with (blue) and without (red) different offsets, and the error bars give the standard deviation.
This result shows that interpolation can have a remarkable impact even if the interpolation ratio is 1 if there is some time offset between the points of the interpolated and test series. This situation commonly arises when the sampling time varies, causing misalignment between the interpolated series and the data. Even in portions of the data where the sampling time in the interpolated series is equal to the data’s sampling time, the exact timings of the different series can be misaligned.
IV. CORRECTION SCHEME
Knowing the specific effect of interpolation on RQA measure , it becomes apparent that a correction scheme aimed at mitigating this interpolation effect would be both desirable and feasible.
A. Method
To construct a correction scheme, which models the deviation in the measure between an interpolated series to the reference series, it is necessary to account for the influences of interpolation and the differences in the sampling rate.
The total number of points in the RP generated from the reference series and the interpolated one is equal. In all investigated examples, the recurrence rate is also very similar; therefore, the deviation in can be derived from the difference in the number of 1-lines . As we will show, the number can differ, because
separated 1-lines in can be connected in ,
1-lines in can be missing in , and
parts of 0-lines in can form 1-lines in .
To tackle the problem, we restrict ourselves to the following case:
There is an integer interpolation ratio between the test series and the interpolation series and no offset.
This means that the test series consists of every th data point from the reference series and is, therefore, a downsampled version of it. The downsampling results in a RP consisting of every th point in every th row of the reference RP (with being the interpolation ratio). The same is true for the diagonals: The th diagonal in consists of every th point from the th diagonal of . We call these points anchor points and these diagonals anchor diagonals (Fig. 6). To calculate the relative change in the number of 1-lines, we restrict ourselves to these diagonals.
Recurrence plots of some arbitrary reference signal (left), the test signal (middle), and the interpolated signal (right). The red boxes indicate the anchor points, which are identical in all plots. Blue boxes indicate all points belonging to the anchor diagonals. The interpolation ratio is 3.
Recurrence plots of some arbitrary reference signal (left), the test signal (middle), and the interpolated signal (right). The red boxes indicate the anchor points, which are identical in all plots. Blue boxes indicate all points belonging to the anchor diagonals. The interpolation ratio is 3.
Example intervals (a) between two 1-anchor points, showing one possible configuration with two additional 1-line starting points. (b) Between one 1-anchor points and one 0-anchor point, showing one possible configuration with one additional 1-line starting point.
Example intervals (a) between two 1-anchor points, showing one possible configuration with two additional 1-line starting points. (b) Between one 1-anchor points and one 0-anchor point, showing one possible configuration with one additional 1-line starting point.
The first two rows give the probability that if the first anchor point is a 1-point that there are 0-lines and 1-lines before the next anchor point, which is also a 1-point. The third and the fourth row calculate the probability that there are 0-lines and 1-lines before the next anchor point, which is a 0-point. In both cases, the total interval has more 1-lines than the interval downsampled to the two anchor points. This is explicitly given for all possible configurations for one example interval in Appendix C.
If we make the assumption that the anchor diagonals have the same statistics for the length of 0- and 1-diagonal lines compared to all diagonals in the interpolated RP , we only have to investigate the intervals between anchor points. For linear interpolation, there are four possibilities in . For the formal derivation, see Appendix B.
In intervals lying between two 1-anchor points, all points are 1; therefore, there is no 1-line starting point.
In intervals lying between two 0-anchor points, there is at most one 1-line in between; therefore, there is either one or zero 1-lines starting point.
In intervals lying between one 1- and one 0-anchor point, then there is one 1-0 transition and therefore, no start of a 1-line.
In intervals lying between one 0- and 1-anchor point, then there is one 0-1 transition and, therefore, one start of a 1-line.
This equation can be used to estimate the correction for the effects of the different sampling and the interpolation under the conditions that there is an integer interpolation ratio and no offset between, the reference series and the test series. Furthermore, anchor and not anchor diagonals in the interpolated RP must have similar probability distributions for the 1- and 0-lines ( and ), and the probabilities for consecutive lines have to be independent. In order to calculate our estimate from a measured from the interpolated RP, it is necessary to know the statistics for the 1- and 0-lines [ and ] from the underlying dynamics. This might for a real-world example be accessible from a period in the data with a high sampling rate. The recurrence rate as well as the jumping probability can be directly computed from the recurrence plot of the interpolated series. The recurrence rate is just the fraction of 1-points, and is calculated by counting the intervals that contain a jump and dividing the results by the number of intervals.
In Sec. IV B, the use of this correction scheme is demonstrated on a simulated data processing example, where everything is known about the true underlying dynamics. Afterward, the scheme is used to correct for sampling and interpolation-induced biases in real-world data.
B. Results
To evaluate the correction scheme, is calculated for the first autoregressive system and the Roessler system in the same way as before (Sec. III) and compared to the correction , which is calculated with the described scheme. being the interpolation ratio is varied, and while and are obtained from the interpolated series and the corresponding recurrence plot , and are obtained from the reference RP .
The estimated for the autoregressive process follows the increase in with increasing interpolation ratio in the real data very closely if the interpolation ratio is smaller than 5 [Fig. 8(b)]. For higher values, there are some deviations caused by the small size of the test series. It has less than 100 data points for an interpolation ratio greater than 5. For the Roessler system, the correction only qualitatively captures the decrease, but with an underestimation of the effect [Fig. 8(a)].
Estimation and measured relative deviations of the measure between the interpolated and reference series for the Roessler (top left) and autoregressive system (top right) (the same as in Fig. 4) for integer interpolation ratios. Equation (26) is used to calculate the correction. The bottom row shows the same systems, but the -measure for the calculation of the change as well as for the calculation of the correction is obtained only from the anchor diagonals (Fig. 6).
Estimation and measured relative deviations of the measure between the interpolated and reference series for the Roessler (top left) and autoregressive system (top right) (the same as in Fig. 4) for integer interpolation ratios. Equation (26) is used to calculate the correction. The bottom row shows the same systems, but the -measure for the calculation of the change as well as for the calculation of the correction is obtained only from the anchor diagonals (Fig. 6).
This deviation is caused by the fact that the following assumptions are not true for the Roessler system. On the one hand, the statistics of the diagonal lines are very different when looking at all diagonals compared to the anchor diagonals. If only these diagonals are considered, the correction for the Roessler system captures the effect also quite well [Fig. 8(c)]. On the other hand, the assumption that the length of consecutive 1- and 0-lines is independent of each other is not true for a deterministic system, such as the Roessler system.
Using consideration from Sec. IV A, it is possible to understand the reasons for the differences between the systems. For the autoregressive process, the dominating effect is that there are fewer short 1- and 0-lines in compared to ; therefore, the total number of lines is smaller and is greater than . The dominating effect for the Roessler system is that there are more short 1-lines between two anchor points in than in ; therefore, the total number of lines is greater and is smaller than .
V. APPLICATION TO PALEOCLIMATE DATA
In this section, we use the insights from the theoretical considerations and simulations above to investigate real-world data. We use the described correction scheme to reduce the influence of the interpolation and the different sampling times.
For this purpose, we use the data from a 290 m long composite core from Chew Bahir in southern Ethiopia. The core was created by combining the two 280 m long parallel lacustrine sediment cores CHB14-2A and 2B, which were drilled within the Chew Bahir Drilling Project (CBDP) in 2014.39 The aim of the CBDP was to establish a high-resolution environmental record spanning an important time interval of human evolution in eastern Africa. The potassium (K) concentration in the sediment is a proxy for the aridity in the region during the time of deposition.40 It was determined by micro x-ray fluorescence ( XRF) scanning, which was carried out with a spatial resolution of 5 mm. To attribute time points to the measured K concentrations, the age-depth model RRMarch202141 is used, which follows a direct-dating approach and uses multiple chronometers, including AMS C dating, optically stimulated luminescence (OSL) dating, argon-argon ( Ar/ Ar) dating, and tephrochronological data. Details about the age-depth model can be found in Roberts et al.41 In the following, we use this environmental record to show sampling and interpolation effects, which are not dependent on the accuracy of the age-depth model. A detailed RQA and corresponding interpretation has been performed in Trauth et al.42
The age-depth model reveals a non-constant sedimentation rate; i.e., the data points in this time series are not sampled equally in time. We investigate the temporal change of the mean diagonal line length using the sliding window approach with a window size of 3079 yr. Therefore, we get a measure of the temporal evolution of the predictability of climate. The mean sampling time changes between these windows. In the oldest part from 616 787 to 434 038 yrBP, the sampling time is around 15 yr, and from 434 038 to 148 149 yrBP, there is a sampling time of around 10 yr. In the newest part of the data from 148,149 yr BP until present, the sampling time changes a lot, from a minimum of around 5.36 to 47.27 yr [Fig. 9(a)]. The strong changes in the sampling time result from the increased number of age measurements available for more recent times. The long periods of uniform sampling times arise due to the limited availability of datable material within the cores during those time intervals.
(a) Mean sampling times for every window (red) and constant sampling times with which the interpolation function is evaluated (blue, green). The window step size is 3079 yr, and the window size is 6153 yr. (b) and (c) for sliding windows, with (blue) and 5.36 yr (green). With fixed . (d) Corrected data with a described scheme.
(a) Mean sampling times for every window (red) and constant sampling times with which the interpolation function is evaluated (blue, green). The window step size is 3079 yr, and the window size is 6153 yr. (b) and (c) for sliding windows, with (blue) and 5.36 yr (green). With fixed . (d) Corrected data with a described scheme.
To analyze the data, we use the measured potassium concentration together with the corresponding time points from the age-depth model as our time series. We interpolate this series linearly and evaluate the interpolation function at time points with a fixed sampling time. To see the effect of a different choice of this sampling time, two time series are created, one with a sampling time of and one with . These sampling times correspond to the lowest and highest sampling times in the data. To avoid effects from the interpolation over long time periods with missing data (hiatuses), we exclude all points of the interpolated time series, where the temporal distance between the two neighboring measured data points exceeds 50 yr. For windows where more than 40% of the points are excluded, we do not calculate the measure. This leads to some gaps in the -series (see Fig. 9).
When comparing the -curves calculated from the interpolated series with and , significant differences emerge. First, the first peak at approximately 10 000 yrBP is considerably stronger in the interpolated series with . Additionally, the amplitudes of the structures from 616 787 to 434 038 yrBP show an increase compared to the amplitudes of the structures between 434 038 to 148 149 yrBP [Figs. 9(b) and 9(c)]. The sampling time in the time intervals of increased amplitudes is lower than in the other parts, which leads to the hypothesis that the difference in the course of the data is due to the different interpolation. As we can see in Sec. III C, the deviations caused by an interpolation ratio smaller than 1 seem to be small compared to the effect of a larger interpolation ratio. This leads to the conjecture that the results from the interpolated series with are closer to the truth and the results of the series with are altered by the interpolation. To investigate this further, we apply our correction scheme to the values. To use this method, the original distributions and have to be known. We assume some stationarity; i.e., these distributions should not change too much within the course of the data. We, therefore, use the histograms of 0- and 1-line lengths from the interval where the original is 5.36 yr. This is the case between 36 132 and 45 501 yrBP. , , and are calculated for every window. The correction method can only be used for integer ratios; therefore, the interpolation ratio is calculated for every window by dividing the mean sampling time in the data inside the window by the interpolation sampling time of and rounded to the next integer to calculate the correction.
After applying the correction, the first peak is reduced in its height, and also, the changes of amplitudes in the older part of the data match the results of the interpolated series with [Fig. 9(d)]. This is an indication that the correction provides a close approximation of the true effect, which leads to the deviations in the measure and helps to avoid misinterpretation due to the interpolation.
VI. CONCLUSION
Recurrence plots and recurrence quantification analysis are useful tools for the analysis of data from non-linear systems. They can give valuable insights into the changes in the dynamics and indicate critical regime changes. Applying these methods to paleoclimate data give additional and complementary valuable insights, which are usually not accessible with standard linear methods.
When using RPs for paleoclimate data, we have to take into account that the data from paleoclimate archives are usually not uniformly sampled in time. One commonly used method to tackle this is to generate an interpolation function and evaluate it at equally spaced time points. This, however, can create some bias and lead to wrong conclusions.
In this work, we have demonstrated that, depending on the dynamics, the average diagonal recurrence line length calculated from the RP of an interpolated signal can be greater or smaller compared to the true derived from the RP of the raw signal with the same temporal resolution. We have shown this by comparing data with different sampling rates, which are interpolated to the time points of the reference time series.
For a Roessler system, calculated from the interpolated series is smaller than the one calculated from the reference series for all considered interpolation methods. For autoregressive processes, on the other hand, is bigger. Furthermore, we have explained that for an integer interpolation ratio, a possible offset can also change the result, and this is even true if the temporal resolution is not changed.
We identified three main reasons for the difference between the series with different sampling and interpolation: (1) distinct recurrence lines in the reference RP are merged in the interpolated recurrence plot, (2) short recurrence lines in the reference RP are missing in the interpolated one, and (3) there are short recurrence lines in the interpolated RP, which are not present in the reference one.
Using these insights, we developed a correction scheme and discussed its capabilities and limitations. For the autoregressive process, it can predict the difference very precisely, as long as the interpolation ratio is not too large and the downsampled data, therefore, not too short. For the Roessler system, on the other hand, it can capture the decrease but underestimates the effect up to a factor of two. Here, the main reason for this deviation seems to be that the correction is only true for the diagonal lines in the interpolated plot, which correspond to a diagonal in the downsampled one.
In the last part of our work, we have investigated paleoclimate data from a lake sediment core. First, we showed that the course of the measure, when calculated in sliding windows, strongly depends on the choice of sampling time when evaluating the interpolation function. We then applied our proposed correction scheme to produce an approximation of the actual course.
This study shows that there are potentially big biases, when interpolating data, before applying the RP method. It also shows that the effect is strongly system dependent, and a simple correction might not be possible. The proposed correction scheme provides an intuition on which system shows which effects and also provides an approximate correction. The correction scheme is only valid for integer interpolation values without offset, which is rarely the case for real data. Nevertheless, this can give an approximation of the real effect. Further research is required to enhance our understanding of the impact of offset and noninteger interpolation ratios. Another limitation of this study is that the series with different temporal resolutions are obtained by downsampling, which is only similar to the measurement process if the measure yields the value of some quantity at one time point. In reality, measurements often do some kind of averaging over a finite period of time. To understand to what extent this changes the result of this study and how to change the correction scheme, further research is required.
ACKNOWLEDGMENTS
We thank Dr. Tobias Braun and Vanessa Skiba for fruitful discussions.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Nils Antary: Software (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (equal). Martin H. Trauth: Writing – review & editing (equal). Norbert Marwan: Supervision (lead); Writing – review & editing (equal).
DATA AVAILABILITY
Code and data used for the research in this study are available in Zenodo at https://doi.org/10.5281/zenodo.8123086.
APPENDIX A: START LENGTH
APPENDIX B: RECURRENCE OF LINEAR INTERPOLATED POINTS
Here, we show the mathematical proof for the statements made about the 1-points on anchor diagonals.
In intervals lying between two 1-anchor points, all points are 1.
- The distance matrix for an interpolated time series is given bywhere is the linear interpolation function. When looking at interpolation with integer and without offset, we have anchor diagonals. At the anchor diagonals, the following equation holds:with being an integer and and are the sampling times of the reference and test series. The interpolation function is a linear function for , where are the sampling points of the test series. Therefore, is the difference between two linear functions for and . The second equation can be rewritten as , which is the same as , which is the same as the first conditions. This shows that the difference itself is a linear function between two anchor points.For the linear function, which goes through the points and , we can show that if and are smaller than , then all points in between have a norm smaller than . It follows that if our interpolation function is a linear function between two anchor points and both anchor points are recurrent and therefore, their norm is smaller than , all points on the interpolation function in between have a norm smaller than and are, therefore, also recurrent.
In intervals lying between two 0-anchor points, there is at most one 1-line in between.
Two 1-lines would violate the first point because there would be 0-points between 1-points.
In intervals lying between one 1- and one 0-anchor point, then there is one 1-0 transition.
There cannot be an additional 0-line before the last 1-point, as shown in the first point.
In intervals lying between one 0- and 1-anchor point, then there is one 0-1 transition.
There cannot be an additional 0-line after the first 1-point, as shown in the first point.
APPENDIX C: EXAMPLE OF AN INTERVAL ON AN ANCHOR DIAGONAL
To illustrate changes to different intervals on an anchor diagonal, we show in Table II the effect of downsampling and interpolation. Furthermore, we give the probability of finding such an interval in the original recurrence plot.
Example of different intervals on the anchor diagonal of a recurrence plot and how they are changed when the underlying data are downsampled and linearly interpolated with an interpolation ratio of r = 4. The first and the last point are always the anchor points, as described in Sec. IV A.
Original . | Downsampled . | Interpolated . | 1-line-starts . | Probability . |
---|---|---|---|---|
11111 | 11 | 11111 | 0 | P(ls > r) |
11000 | 10 | 11100 | 0 | P(ls < r) ⋅ P(ls + d1 > r | ls < r) |
10011 | 11 | 11111 | 1 | P(ls + d1 < r) ⋅ P(ls + d1 + l1 > r | ls + d1 < r) |
10110 | 10 | 11100 | 1 | P(ls + d1 + l1 < r) ⋅ P(ls + d1 + l1 + d2 > r | ls + d1 + l1 < r) |
10101 | 11 | 11111 | 2 | P(ls + d1 + l1 + d2 < r) ⋅ P(ls + d1 + l1 + d2 + l2 > r | ls + d1 + l1 + d2 < r) |
Original . | Downsampled . | Interpolated . | 1-line-starts . | Probability . |
---|---|---|---|---|
11111 | 11 | 11111 | 0 | P(ls > r) |
11000 | 10 | 11100 | 0 | P(ls < r) ⋅ P(ls + d1 > r | ls < r) |
10011 | 11 | 11111 | 1 | P(ls + d1 < r) ⋅ P(ls + d1 + l1 > r | ls + d1 < r) |
10110 | 10 | 11100 | 1 | P(ls + d1 + l1 < r) ⋅ P(ls + d1 + l1 + d2 > r | ls + d1 + l1 < r) |
10101 | 11 | 11111 | 2 | P(ls + d1 + l1 + d2 < r) ⋅ P(ls + d1 + l1 + d2 + l2 > r | ls + d1 + l1 + d2 < r) |
APPENDIX D: INTERPOLATION EFFECT ON ROESSLER WITH A MAXIMUM NORM
To show that the effect seen for the Euler norm is also present when using the maximum norm, we performed the analysis from Fig. 4 (left) again, but using the maximum norm instead. Here, we also see that the -measure obtained from the interpolated recurrence plot is decreased compared to the reference recurrence plot and the deviation is present for all interpolation methods and increases with the interpolation ratio. The linear and pchip interpolation leads to a similar deviation, which is qualitatively different from the deviation after using one of the spline interpolations (Fig. 10).
Relative difference of between interpolated and reference series. For a Roessler system, , , and the maximum norm is used.
Relative difference of between interpolated and reference series. For a Roessler system, , , and the maximum norm is used.