Paleoclimate records are rich sources of information about the history of the Earth system. Information theory provides a new means for studying these records. We demonstrate that weighted permutation entropy of water-isotope data from the West Antarctic Ice Sheet (WAIS) Divide ice core reveals meaningful climate signals in this record. We find that this measure correlates with accumulation (meters of ice equivalent per year) and may record the influence of geothermal heating effects in the deepest parts of the core. Dansgaard-Oeschger and Antarctic Isotope Maxima events, however, do not appear to leave strong signatures in the information record, suggesting that these abrupt warming events may actually be predictable features of the climate’s dynamics. While the potential power of information theory in paleoclimatology is significant, the associated methods require well-dated and high-resolution data. The WAIS Divide core is the first paleoclimate record that can support this kind of analysis. As more high-resolution records become available, information theory could become a powerful forensic tool in paleoclimate science.

Paleorecords like ice and sediment cores capture important information about the past dynamics of the climate system. While a great deal of sophisticated and creative work has been done on these records, very little of that work has leveraged the power of information theory. Recently, it has been shown that the Shannon entropy rate—a measure of the average rate at which new information, unrelated to anything in the past, is produced by the system that generated a time series—can be useful in flagging anomalies introduced by the data-analysis pipeline.^{1} Our goal in this paper is to move beyond anomaly detection and into science. To that end, we use permutation entropy techniques to estimate the Shannon entropy rate of the longest continuous and highest-resolution water-isotope record yet recovered from Antarctica: the 3.3 km West Antarctic Ice Sheet (WAIS) Divide core, which covers more than 60 000 years of the Earth’s history. We show that the weighted permutation entropy of these data correlates with snow accumulation at the drilling site. These calculations also reveal possible disparities between the two isotopes that were measured at WAIS near the base of the core, which may have climatologically interesting implications. Surprisingly, we find that the signatures of Dansgaard-Oeschger and Antarctic Isotope Maxima events in the information record are small, suggesting that these large, abrupt warming events may not represent significant changes in the climate system’s information dynamics.

While permutation entropy techniques elucidate forward information transfer in a signal, they do not provide a mechanistic understanding of the underlying process; even so, they are useful in doing science: flagging regions for expert scientists to examine, for instance, and providing some useful information to scaffold their reasoning. Moreover, since WPE has implications for predictability,^{2,3} these results lend support to the potential utility of predictive models and offer an alternative mathematical perspective for those in the geoscience community who believe that these events are largely noise-driven and unpredictable.

## I. INTRODUCTION

The Earth contains a vast archive of geochemical information about the past and present states of the climate system. The data in these records—samples from corals, marine and lake sediments, tree rings, cave formations, the ice sheets, etc.—capture a rich spatiotemporal picture of this complex system. Ice cores, for example, provide high-resolution proxies for hydrologic cycle variability, greenhouse gases, temperature, and dust distribution, among other things. While a great deal of sophisticated and creative work has been done on these records, very little of that work has leveraged the power of information theory. The Shannon entropy rate,^{4,5} for instance, measures the average rate at which new information—unrelated to anything in the past—is produced by the system that generated a time series. If that rate is very low, the current observation contains a significant amount of information about the past. Conversely, if the Shannon entropy rate is very high, most of the information in each observation is completely new: i.e., the past tells you little or nothing about the future.

As we will show, these kinds of calculations can bring out valuable information from paleoclimate data records. Here, we use permutation entropy techniques^{6,7} to estimate the Shannon entropy rate of the longest continuous and highest-resolution water-isotope record yet recovered from Antarctica: the West Antarctic Ice Sheet Divide core (WDC). It has recently been shown that these techniques can reveal laboratory and data-processing effects that are difficult to see in the raw data.^{1} In this paper, we focus on science, showing that the weighted permutation entropy of these data correlates with accumulation (meters of ice equivalent per year) at the drilling site and reveals possible signatures of geothermal heating at the base of the core; that is, these information-theoretic calculations are not only useful for revealing hidden problems with the data, but also potentially powerful in suggesting new and sometimes surprising geoscience. They also pave the way toward more-advanced interhemispheric entropy comparisons that could elucidate some deeper questions about the larger climate system. We show that the signatures of Dansgaard-Oeschger events in the information record are small, for instance, suggesting that these large, abrupt events may not represent significant changes in the climate system’s “information” dynamics. In situations like this, it has been shown^{2,3,8} that prediction may be possible, given the right model. In view of the debate in the climate community about forcings and triggers for these events, our results lend support to the potential utility of predictive models and offer an alternative mathematical perspective for those who believe that these events are largely noise-driven and unpredictable.

Interest in the use of information theory in the study of Earth’s climate system has been growing over the past few years.^{9,10} Saco *et al.*^{11} used the Shannon entropy rate to explore different climate-change events captured in El Niño/Southern Oscillation proxy records derived from the Laguna Pallcacocha sedimentary data, for instance. Information-theoretic techniques have also been applied to time-series data about Earth’s “current” climate. For example, in Ref. 12, Meng *et al.* use an entropy-based technique to accurately forecast the magnitude of El Niño events even beyond the “spring prediction barrier.” See Ref. 13 for a recent review. In addition, several information-theoretic measures have been used to quantify causal coupling strengths within the Earth’s climate system (e.g., Refs. 14 and 15). Because these techniques require high-resolution, well-dated data, however, they have been largely inappropriate for the paleoclimate community until quite recently. Recent advances in experimental and laboratory techniques are beginning to remove this barrier. A 2016 pilot study^{16} showed that high-resolution records like the WDC dataset are indeed appropriate for information-theoretic analyses and suggested some preliminary hints as to the power of that approach, which we explore in depth in this paper. Permutation-entropy techniques can even aid in the laboratory work involved in these records, flagging anomalies introduced during the collection and processing of the data, and even effectively guiding the repair of the record.^{1}

The goal of this paper is to move beyond anomaly detection and into science. In particular, this paper reports the first systematic study of the use of information theory to extract meaningful scientific insights and conjectures about the past climate from an ice-core record. By elucidating how information is created in and propagated through the paleoclimate system, information-theoretic studies of ice-core data could help us identify and understand triggers, amplifiers, sources of persistence, and globalizers of climate change.^{17,18}

## II. MATERIALS AND METHODS

### A. Ice core data collection and description

For the analysis reported here, we used the ratios of $^{2}\mathrm{H}/^{1}\mathrm{H}$ and $^{18}\mathrm{O}/^{16}\mathrm{O}$ from the West Antarctic Ice Sheet Divide core (WDC), abbreviated $\delta D$ and $\delta^{18}\mathrm{O}$, respectively. Collection and processing of this record, which is available from the website of the U.S. Antarctic Program Data Center,^{19} are described in depth in Refs. 20 and 21. A brief summary of the processing follows.

The record was analyzed using a Picarro Inc. cavity ringdown spectroscopy (CRDS) instrument, coupled to a continuous flow analysis (CFA) system.^{22} The data are reported in delta ($\delta$) notation relative to the baseline Vienna Standard Mean Ocean Water (VSMOW) and normalized to the Standard Light Antarctic Precipitation (SLAP, $\delta^{18}\mathrm{O} = -55.5$‰, $\delta D = -428.0$‰) scale. The $\delta$ values were determined by $\delta = 1000\,(R_{\mathrm{sample}}/R_{\mathrm{VSMOW}} - 1)$, where $R$ is the isotopic ratio $^{18}\mathrm{O}/^{16}\mathrm{O}$ or D/H (i.e., $^{2}\mathrm{H}/^{1}\mathrm{H}$). The CRDS-CFA system introduces a noise level of $\sigma_{\mathrm{noise}} = 0.55$‰ for $\delta D$ and $\sigma_{\mathrm{noise}} = 0.09$‰ for $\delta^{18}\mathrm{O}$ throughout the signal. $\delta^{18}\mathrm{O}$ and $\delta D$ are proxies for local temperature and regional atmospheric circulation resulting from variability in the hydrologic cycle.
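To make the delta notation concrete, here is a worked example. The VSMOW $^{18}\mathrm{O}/^{16}\mathrm{O}$ ratio below is the accepted reference value; the sample ratio is hypothetical, chosen to represent an isotopically depleted Antarctic sample.

```python
# Worked example of the delta notation above. The VSMOW ratio is the
# accepted reference value; the sample ratio is hypothetical.
R_VSMOW = 2005.2e-6        # accepted 18O/16O ratio of VSMOW
R_sample = 1924.0e-6       # a hypothetical isotopically depleted sample

delta_18O = 1000.0 * (R_sample / R_VSMOW - 1.0)
print(round(delta_18O, 1))  # → -40.5 (permil)
```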

Water-isotope traces measured on the Picarro instrument were recorded at a rate of 1.18 Hz (0.85 s intervals). Ice samples were moved through the CRDS-CFA system at a rate of 2.5 cm/min, yielding millimeter resolution. The data were then averaged over nonoverlapping 0.5 cm bins. For each of these data points, an age was determined using the WDC depth-age scale, providing climate data from 0 to 68 kybp (thousands of years before present).^{20} Annual dating of this record extends to 31 kybp,^{23} with the remainder relying on tie points to the Hulu Cave time scale.^{24}

The final step before publication on the U.S. Antarctic Data Center website involved linear interpolation to an even temporal spacing of 1/20th year. This decision on the part of the analysis team that created this data set has important implications for any kind of time-series analysis—including permutation entropy, as described below.

### B. Weighted permutation entropy

Information-theoretic measures like the Shannon entropy rate^{5} are typically calculated from “categorical” data: sequences of symbols, like heads and tails for a coin-flip experiment. To perform these calculations on continuum data like $\delta D$ and $\delta^{18}\mathrm{O}$, one must first convert those data into symbols. A typical approach to this—binning—introduces bias and is fragile in the face of noise.^{25,26} The permutation entropy of Bandt and Pompe^{6} solves that problem, for the purposes of estimating the Shannon entropy rate, using ordinal analysis. This involves mapping time-delayed elements of a scalar time series to value-ordered permutations of the same size. For example, if successive values of a time series are $(x_1, x_2, x_3) = (6, 1, 4)$, then the ordinal pattern, $\varphi(x_1, x_2, x_3)$, of this three-letter “word”—formally, a “permutation”—is 231 since $x_2 \le x_3 \le x_1$. The ordinal pattern of the permutation $(x_1, x_2, x_3) = (60.1, 15.8, 4.0)$ is 321. By calculating statistics on the appearance of these permutations in a backwards-looking sliding window across a signal, one can assess its predictability—that is, how much new information appears at each time step, on the average, in that segment of the time series.^{2,3}
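The ordinal-pattern mapping can be written in a few lines. Here is a minimal sketch in Python (not the authors' code), using the paper's 1-indexed convention in which the pattern lists the positions of the word's elements in ascending order:

```python
import numpy as np

def ordinal_pattern(word):
    """Return the ordinal pattern of a word, 1-indexed as in the text:
    the pattern lists the positions of the elements in ascending order,
    so (6, 1, 4) -> (2, 3, 1), written "231" in the paper's notation."""
    return tuple(int(i) + 1 for i in np.argsort(word, kind="stable"))

print(ordinal_pattern([6, 1, 4]))          # → (2, 3, 1)
print(ordinal_pattern([60.1, 15.8, 4.0]))  # → (3, 2, 1)
```

The stable sort matters for words with repeated values: ties are broken by position, so every word maps to exactly one pattern.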

Formally, let $S_m$ be the set of all $m!$ permutations $\pi$ of order $m$. Then, given a real-valued time series $\{x_i\}_{i=1,\ldots,N}$, one constructs a set of so-called “delay vectors,”

$$X_i^{m,\tau} = (x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}).$$

Here, $m$ is known as the word length and $\tau$ as the time delay. Each delay vector is then mapped by a function $\varphi$ to the corresponding permutation $\pi \in S_m$. For each $\pi \in S_m$, one calculates the relative frequency of that permutation occurring in $\{x_i\}_{i=1,\ldots,N}$,

$$p(\pi) = \frac{\left| \{ i : i \le N - (m-1)\tau,\; \varphi(X_i^{m,\tau}) = \pi \} \right|}{N - (m-1)\tau},$$

where $p(\pi)$ quantifies the probability of an ordinal and $|\cdot|$ is set cardinality. The permutation entropy of order $m \ge 2$ is

$$PE(m) = -\sum_{\pi \in S_m} p(\pi) \, \log_2 p(\pi).$$

Since $0 \le PE(m) \le \log_2(m!)$,^{6} it is common in the literature to normalize by $\log_2(m!)$, producing PE values that range from 0 to 1.
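A compact sketch of the normalized permutation entropy under the definitions above (this is an illustrative implementation, not the authors' analysis code):

```python
import numpy as np
from collections import Counter
from math import factorial, log2

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy of a 1-D series: 0 means one ordinal
    pattern dominates (fully predictable); 1 means all m! patterns are
    equally likely (all new information)."""
    x = np.asarray(x)
    n_words = len(x) - (m - 1) * tau
    # count the ordinal pattern of each length-m delay vector
    counts = Counter(
        tuple(np.argsort(x[i:i + m * tau:tau], kind="stable"))
        for i in range(n_words)
    )
    probs = np.array(list(counts.values()), dtype=float) / n_words
    pe = -np.sum(probs * np.log2(probs))
    return float(pe) / log2(factorial(m))

# a monotone ramp exhibits a single ordinal pattern, so its PE is 0
pe_ramp = permutation_entropy(np.arange(100.0), m=3)
```

A long white-noise series, by contrast, visits all $m!$ patterns with roughly equal frequency, so its normalized PE approaches 1.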

Note that permutation entropy, as defined above, does not distinguish between $(x_i, x_{i+\tau}, x_{i+2\tau}) = (6, 1, 4)$ and $(x_i, x_{i+\tau}, x_{i+2\tau}) = (1000, 1, 4)$, and so, it can fail if the observational noise is larger than the trends in the data but smaller than its large-scale features.^{7} This can be addressed by introducing a term into the calculation that weights each permutation by its variance,

$$w_i = \frac{1}{m} \sum_{j=1}^{m} \left( x_{i+(j-1)\tau} - \overline{X_i^{m,\tau}} \right)^2,$$

where $\overline{X_i^{m,\tau}}$ is the arithmetic mean of the values in $X_i^{m,\tau}$. The weighted probability of a permutation is defined as

$$p_w(\pi) = \frac{\sum_{i \le N - (m-1)\tau} w_i \, \delta(\varphi(X_i^{m,\tau}), \pi)}{\sum_{i \le N - (m-1)\tau} w_i},$$

where $\delta(x, y)$ is 1 if $x = y$ and 0 otherwise. The final calculation of WPE is similar to that of $PE(m)$ above, but with the weighted probabilities,

$$WPE(m) = -\sum_{\pi \in S_m} p_w(\pi) \, \log_2 p_w(\pi).$$

This variant of permutation entropy—weighted permutation entropy or WPE^{7}—is used for all calculations in this paper, again with a normalization that causes the resulting values to run from zero (no new information; fully predictable) to 1 (all new information; completely unpredictable).
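A minimal implementation of WPE under the definitions above (a sketch, not the authors' pipeline) differs from plain PE only in accumulating each word's variance rather than a unit count:

```python
import numpy as np
from math import factorial, log2

def weighted_permutation_entropy(x, m=4, tau=1):
    """Normalized WPE: like PE, but each word's ordinal pattern is weighted
    by the word's variance, suppressing low-amplitude, noise-dominated
    patterns. A sketch of the published definition, not the authors' code."""
    x = np.asarray(x, dtype=float)
    n_words = len(x) - (m - 1) * tau
    weight_sums = {}
    for i in range(n_words):
        word = x[i:i + m * tau:tau]
        pattern = tuple(np.argsort(word, kind="stable"))
        w = float(np.mean((word - word.mean()) ** 2))  # variance weight
        weight_sums[pattern] = weight_sums.get(pattern, 0.0) + w
    total = sum(weight_sums.values())
    probs = np.array([v / total for v in weight_sums.values() if v > 0.0])
    return float(-np.sum(probs * np.log2(probs))) / log2(factorial(m))
```

A sliding-window version of the calculation simply applies this function to successive windows of the series, producing one WPE value per window.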

Successful use of the WPE method requires good choices for its three free parameters: the delay $\tau$ between samples, the word length $m$, and the size $W$ of the sliding window over which the statistics are calculated for each WPE value. (In the examples above, $m = 3$ and $\tau = 1$.) Very little mathematical guidance is available for these choices, and their effects are not independent. The $\tau$ parameter controls the spacing of the permutation elements. For low $\tau$ values, permutations are strongly affected by high-frequency deviations; for larger $\tau$, those deviations—and, eventually, all of the structure in the data—are filtered out. Here, we use $\tau = 7$ because it mitigates high-frequency noise effects without being so large that all features are lost;^{27} this is discussed at more length at the end of Sec. III. The window size $W$ controls the resolution of the analysis. The word length $m$ must be long enough to allow the discovery of forbidden ordinals,^{28} yet small enough that reasonable statistics over the ordinals can be gathered in a window of that size. Choices for the window size and the word length are thus in some tension, since one generally wants the best possible temporal resolution. In the literature, $3 \le m \le 6$ is a standard choice, generally without any formal justification. In theory, the permutation entropy should converge to the Shannon entropy rate as $m \to \infty$, but that requires an infinitely long time series.^{29,30} In practice, the right thing to do is to calculate the “persistent” permutation entropy by increasing $m$ until the large-scale features of the resulting curve converge. That approach was used to choose $m = 4$ for the calculations in this study. This value represents a good balance between accurate ordinal statistics and finite-data effects.
That $m$ value, in turn, dictated a minimum window size of 2400 points if one considers 100 counts per ordinal as sufficient.^{2} This translates to 120 years’ worth of ice in the 1/20th-year spaced WDC traces used here.
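The window-size arithmetic can be made explicit; a minimal sketch (the 100-counts-per-ordinal rule of thumb is from Ref. 2):

```python
from math import factorial

m = 4                      # word length chosen via persistence analysis
counts_per_ordinal = 100   # rule-of-thumb statistics requirement (Ref. 2)
samples_per_year = 20      # 1/20th-year spacing of the interpolated WDC data

window_points = factorial(m) * counts_per_ordinal  # 24 * 100 = 2400 points
window_years = window_points / samples_per_year    # 120 years of ice
print(window_points, window_years)  # → 2400 120.0
```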

The WAIS Divide ice core is the first such record that is suitable for this type of information-theoretic analysis. With a shorter data set, or one with lower resolution, there would be no way to balance the free-parameter trade-offs outlined in the previous paragraph. There are other considerations as well: the mathematics of WPE requires that each window sample a weakly stationary system.^{6}

To calculate WPE as a function of time, one must have evenly sampled data. This is a major issue in many paleorecords because of their uneven temporal spacing. In the WDC, for instance, the 0.5 cm spacing of the samples, combined with the nonlinear age-depth relationship of the core, produces a raw data series whose temporal spacing increases systematically with depth. (One could certainly calculate WPE directly on these data, but the timeline of the result would be deformed.) Our use of the 1/20th-year spaced data from the U.S. Antarctic Program Data Center avoids this issue—but not completely, since linear interpolation introduces ramps in the signal: repeating patterns in the permutations that can skew their distribution and thereby lower the WPE. Note that this effect will generally worsen with depth because the percentage of interpolated points in the 1/20th-year spaced versions of the $\delta D$ and $\delta 18O$ traces grows nonlinearly with the depth of the core. The specific form of this effect will also depend on the shape of the climate signal. The mathematics of information theory currently offers no way to approach any kind of closed-form derivation of these complicated effects.
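To see the ramp effect concretely, here is a small synthetic illustration (not WDC data) of how linear interpolation across a gap collapses the ordinal-pattern distribution:

```python
import numpy as np

# Hypothetical illustration: linear interpolation across a gap creates a
# monotone ramp, which collapses the ordinal-pattern distribution in that
# stretch and drives WPE down.
rng = np.random.default_rng(42)
t = np.arange(200, dtype=float)
signal = rng.normal(size=200)

# remove a 50-point gap, then re-interpolate onto the original even grid
kept = np.concatenate([np.arange(75), np.arange(125, 200)])
filled = np.interp(t, t[kept], signal[kept])

# within the interpolated gap, every length-3 word has the same ordinal
# pattern, because a linear ramp is monotone
gap = filled[76:124]
patterns = {tuple(np.argsort(gap[i:i + 3])) for i in range(len(gap) - 2)}
print(len(patterns))  # → 1
```

The raw (ungapped) noise would visit all six length-3 patterns; the interpolated stretch visits exactly one, which is why the WPE of heavily interpolated sections is biased low.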

In the face of these issues, it is important to be mindful of interpolation-induced effects in WPE calculations. Among other things, one should not compare WPE values of a single trace from an ice core across wide temporal ranges if the data have undergone depth-dependent interpolation, especially when one is working deep in the core. One should also perform sensitivity analyses on all of the free parameters of the method, obviously, since the intertwined effects of the sample spacing, interpolation, event frequencies, etc., will play a role in good choices for those values. More broadly, sampling issues in geoscience data have been the focus of increased attention in the past few years,^{31,32} and sophisticated methods have been proposed for assessing and working around the associated problems.^{33–35} A careful study of the relative effects of all of these influences, issues, and methods—including the linear-interpolation approach used here, which is the standard practice in paleoclimate data analysis—is beyond the scope of this paper.

In addition to interpolation effects, seasonal trends could potentially affect WPE. To determine whether this was an issue in our results, we used traditional seasonality-removal methods (e.g., finite differencing and model fitting) on the isotope data and repeated the WPE calculations. This had no qualitative impact on the results.

## III. RESULTS

Figure 1 shows the $\delta D$ and $\delta^{18}\mathrm{O}$ traces from the WAIS Divide Core, together with the WPE calculated from each trace following the procedure described in Sec. II B. A number of features stand out here. The WPE values of both isotopes are much lower during the glacial period (before 20 kybp),^{20} for instance, than in the last 5000 years, indicating a stronger dependence of each isotope value on its previous values. During the transition from the glacial to the interglacial between 11 kybp and 20 kybp,^{20} both WPE traces rise sharply at first, beginning around 17 kybp, then fall during the Antarctic Cold Reversal (ACR) period from $\approx 12.9$ to $14.5$ kybp before peaking at the time of the transition from the Younger Dryas ($\approx 11.5$–$12.8$ kybp) to the current Holocene period from 11 kybp to present,^{20} coincident with the time of a known spike in accumulation.^{36} This alignment of changes in WPE with known shifts in the climate system gives us confidence that this technique is extracting meaningful information from the paleorecord. As we will show, there are other features in WPE that correlate with known climate information—most strongly, accumulation—as well as distinct *differences* between the two WPE traces. Some of these correlations and disparities, we will argue, may be scientifically meaningful.

One such difference is the divergence between the $\delta D$ and $\delta^{18}\mathrm{O}$ WPE traces near the base of the ice sheet: the WPEs of the two isotopes are highly correlated for the first $\approx 40$ kybp, but then the $\delta^{18}\mathrm{O}$ WPE rises and the $\delta D$ WPE falls. From an information-theoretic perspective, this suggests that $\delta^{18}\mathrm{O}$ is harder to predict^{2,3} than $\delta D$ for data deep in the core. The cause of this divergence is challenging to pinpoint with WPE alone. This technique simply reveals forward information transfer in a signal but does *not* provide a mechanistic understanding of the underlying process; even so, it is useful in flagging regions for expert scientists to examine—and in providing some useful information for their deliberations.

In this case, those deliberations examined two potential nonexclusive classes of issues: (1) this finding is related to something fundamental about $\delta 18O$ and $\delta D$ at that depth or (2) the divergence is related to a combination of natural and artificial data issues. In the first class, one potential explanation is the different molecular masses of the two isotopes, which would cause them to be affected differentially by second-order thermal effects, e.g., thermal diffusion due to geothermal heat at the bedrock-ice interface. This is speculative, of course, and needs further investigation—e.g., corroboration in another deep core from this region.

The unknown climatological effects over the tens of thousands of years of the core’s formation, together with the extremely involved process of collecting and processing that core, make the second class of issues somewhat daunting to tease apart. The divergence could be due to the signal-to-noise ratio, timeline uncertainty, and/or interpolation—all of which are more of an issue this deep in the core, where the signal strengths are lower, the age-depth model is less certain, and more of the points in the time series are interpolated. Since the interpolation and age-depth conversion processes are identical for the two isotopes, though, they can most likely be ruled out as possible causes for the divergence between the blue and red traces in Fig. 1. The signal-to-noise ratio is unlikely to be the source of the difference either, though the explanation is more subtle. Diffusion causes the amplitude of the $\delta D$ and $\delta^{18}\mathrm{O}$ signals to diminish with depth. Since the noise introduced by the measurement apparatus is consistent across the entire core,^{22} the signal-to-noise *ratio* decreases with depth—possibly differentially, since $\delta D$ is significantly larger than $\delta^{18}\mathrm{O}$—but the effect will still be ratiometric across the core. Furthermore, there is no correlated drop in the signal-to-noise ratio of $\delta^{18}\mathrm{O}$ near the divergence point that could explain this divergence in predictability at 50 kybp. Moreover, the measurement pipeline is not the only process injecting noise into the measured signal. The ice in this region has gone through more than 40 000 years of natural “data processing,” including internal variability, stochastic redistribution of snow, and multiple sources of diffusion and compression, among other things. These processes could conceivably skew the signal-to-noise ratio and explain the divergence.

Some of these data effects are quantifiable; others simply are not. The takeaway point here is that, while WPE calculations alone cannot elucidate the underlying mechanisms that produced this divergence, the ability of this technique to identify a region of the core where there is a change, or a disparity, in the information-theoretic content of the data can be quite useful to paleoclimate experts in targeting their analyses and formulating hypotheses about its provenance and scientific implications.

Another interesting scientific finding brought out by these calculations is the relationship between WPE and accumulation, which is explored in Fig. 2. Visual examination of part (a) of this figure suggests significant correspondence between the features: many of the bumps, troughs, and trends in the three curves occur at the same time in the record. In hindsight, it is perhaps not surprising that WPE tracks accumulation: isotope diffusion intermingles the information in neighboring layers of the core, which lowers the WPE. However, there are spatial scales involved in that process, since the diffusion rate depends on the density of the ice. Moreover, if the annual layer is thicker, less of the information in that layer will be lost. In other words, accumulation mitigates diffusion effects, thereby preserving the information that was laid down in the core.

Note that the WPE traces in Fig. 2(a) do not track accumulation perfectly, though: they plateau earlier in the Holocene, for instance, and contain some structure during the glacial period that is not present in the accumulation trace. The correlation plot in Fig. 2(b) explores these relationships in more detail. While the WDC results do show a general trend with accumulation, it is not entirely linear ($R^2 = 0.927$). We conjecture that these deviations from linearity are encodings of climate signals. To explore this, we obtained a Community Firn Model (CFM) run^{37} with the accumulation rate and temperature set to those measured at the WAIS Divide^{36} and the water-isotope input fixed throughout the record at a constant annual amplitude with no variation in the mean. In the CFM, individual packets of snow/firn/ice are tracked downward over time. At each time step, a new packet is added on top, and the oldest packet is removed from the bottom of the stack. Each packet is compressed at each time step based on its overburden load, temperature, and any other tracked parameters in the model physics. Temperature is also calculated at each step using thermal parameters appropriate for the current density-depth structure.

A synthetic input isotope signal spanning 30 kybp was created based on a cosine wave, with an amplitude ($a$) of 2‰, time step ($\Delta t$) of 1/12th year, and a mean value ($\mu$) of $-28$‰,

$$\Delta_{\cos}(t_i) = a \cos(2 \pi t_i) + \mu.$$

Red noise was added to $\Delta_{\cos}$ to produce the synthetic isotope signal $\Delta_{\mathrm{syn}}$ for the CFM run,

$$\Delta_{\mathrm{syn}}(t_i) = \Delta_{\cos}(t_i) + r_i, \qquad r_i = k \, r_{i-1} + w_i,$$

where $k = 0.7$ and $w$ is white noise with a standard deviation of 20% of the annual amplitude of the cosine wave. As observed at WAIS Divide, the model depth in the ice sheet at 30 kyr was set to 2816.435 m, and the model depth at the base of the ice sheet was set to 3405 m. We analyzed two different CFM scenarios, both using the $\Delta_{\mathrm{syn}}$ signal described above:

1. firn density of $400\,\mathrm{kg/m^3}$, estimated temperature at WAIS Divide,^{38} thermal diffusion ON, estimated accumulation at WAIS Divide,^{36} and isotope diffusion OFF; and
2. firn density of $400\,\mathrm{kg/m^3}$, estimated temperature at WAIS Divide,^{38} thermal diffusion ON, estimated accumulation at WAIS Divide,^{36} and isotope diffusion ON.
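The synthetic input signal for these runs can be sketched as follows. Variable names are ours, and the AR(1) form of the red-noise recursion is our reading of the description above; this is an illustration, not the authors' CFM input code.

```python
import numpy as np

# Sketch of the synthetic CFM input (variable names ours): an annual cosine
# with a 2-permil amplitude about a -28-permil mean, sampled 12 times per
# year for 30 kyr, plus AR(1) "red" noise with k = 0.7 (our reading of the
# red-noise construction described in the text).
a, mu, k = 2.0, -28.0, 0.7
dt = 1.0 / 12.0                             # years
t = np.arange(0.0, 30000.0, dt)             # 30 kyr of monthly samples

delta_cos = a * np.cos(2.0 * np.pi * t) + mu

rng = np.random.default_rng(0)
w = rng.normal(scale=0.2 * a, size=t.size)  # sd = 20% of the annual amplitude
r = np.zeros_like(w)
r[0] = w[0]
for i in range(1, w.size):                  # AR(1) red-noise recursion
    r[i] = k * r[i - 1] + w[i]

delta_syn = delta_cos + r
```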

The output for these experiments can be seen in the top two panels of Fig. 3. The second of the two CFM runs (shown in gray in Fig. 3) is used in Fig. 2. The bottom panel of Fig. 3 reiterates the strong correlation in the experimental data between accumulation and WPE when isotopic diffusion is present. When isotopic diffusion is not present, the model shows a complete lack of correlation.

Note the difference in the shapes of the two bottom plots in Fig. 2: WPE and accumulation are very tightly correlated ($R^2 = 0.968$) in a model run that includes no climate variability and less so in the real data. This, too, is an interesting flag for scientific reasoning. The obvious deviation from linearity in Fig. 2(b) around 15–16 kybp occurs during the transition from the glacial to interglacial periods, where many climatic variables are known to have changed.^{20} In the CFM, we did not consider the effect of climate changes on the water-isotope signal, so it is not surprising that this deviation is not present in Fig. 2(c). Indeed, this further confirms the underlying linear relationship that is suggested by the climatology, thereby adding weight to the conjecture that the deviations from linearity may be encodings of climate signals.

It is worth noting that rudimentary statistics—e.g., rolling-window variance calculations—do not track accumulation and do not reveal any divergence between the isotopes near the base of the core.^{1} There may, of course, be other statistical measures that can extract the accumulation signal from this record. To our knowledge, though, there has not been any systematic exploration of the link between accumulation and the statistics (linear or nonlinear) of the water-isotope data—let alone the information-theoretic content of those data.

Replicating existing accumulation data is more than just a nice validation of a conjectured connection. Calculation of the gray accumulation trace in Fig. 2(a) required a nontrivial amount of effort on the part of an expert scientist. The WPE traces in this figure required microseconds of central processing unit (CPU) time. Moreover, if our conjecture about climate-signal encoding is borne out by future study, WPE may prove to be an effective way to extract new information from ice-core records.

Another interesting and scientifically meaningful property of the WPE results is what is *not* there: specifically, there is no systematic correspondence between WPE and Dansgaard-Oeschger (DO)^{39} events or their smaller, temporally lagged cousins, the Antarctic Isotope Maxima (AIM);^{40} see Fig. 4. There *is* a clear peak in WPE at the time of the Younger Dryas event, but that is probably due to the accumulation effects described above. Spectral analysis of the WPE traces, following the multitaper method described in Ref. 41, shows that while millennial frequencies persist throughout the records, they are not statistically significant; that is, we do not see concrete evidence of any persistent frequencies that might correspond to a repeating trigger mechanism of DO and AIM events. Rather, the WPE results indicate that while these events left significant temperature signatures in cores from some regions, they did not change the predictive content of the time series recorded in the ice at the WAIS Divide. This suggests that these abrupt events may not represent significant changes in the “information” dynamics of the overall climate system—and that they could, in principle, be predicted, given the right model^{2,3} (though obviously finding such a model is a very tall order).

One of the key questions about any climate change is whether it is an expected, natural part of the climate system—e.g., seasonal variations in solar insolation—or an unexpected, random event, like a large asteroid impact. The distinction is important; societies can prepare for the former by, for example, stockpiling food. Random, unexpected events are more problematic; they are considered some of the most threatening changes to modern human societies.^{39,42,43} The lack of a response in WPE during or leading up to AIM and DO events suggests that they may not truly be random. Much more work is needed if that claim is to be confirmed, of course, but the potential implications are important. Knowledge about their predictability could change how we approach these events, both from a scientific standpoint and a societal point of view.

Absence of evidence, of course, is not evidence of absence. This is a particularly thorny point when one is using nonlinear statistics on sparse data. As discussed in detail in Ref. 44, the important issue of significance—whether or not a given feature in a WPE trace (e.g., jump, spike, valley) indicates some sort of substantive change in the underlying information mechanics of the system—has not yet been addressed in the literature. With only one time series available, traditional significance tests from statistics are inapplicable, and without a reliable generative model for the climate, one cannot use surrogate methods for significance testing.^{45} A very recent paper by Darmon *et al.*^{44} suggests a way around this, using reservoir computing to approximate the system’s dynamics as a surrogate for a generative model, which can then be used to generate bootstrapped time series for significance testing. Like many machine-learning methods, however, this method requires large training sets—far larger than the WDC record.

Until these shortcomings have been fully addressed and solid significance testing techniques have been developed for WPE calculations on paleorecord-size data sets, interpreting small-scale fluctuations—such as those near the times of DO events in Fig. 4—should either be avoided altogether or done with careful consideration and persistence testing over a wide range of values for the free parameters of the algorithm.

Apropos of free parameters: another interesting way to leverage the power of permutation entropy is to vary the time delay $\tau$ in the WPE formula. This parameter controls the "stride" of the calculation, playing a role similar to that of the cutoff frequency of a low-pass filter. Figure 5 shows a series of $\delta D$ and $\delta^{18}O$ WPE calculations for a range of $\tau$ values. Since the $\tau$ at which a feature disappears is related to the time scale of the associated effect, one can preferentially focus on, and distinguish between, long-term effects (e.g., climate) and faster ones (e.g., weather or noise) simply by tuning $\tau$. Persistence of features across a range of $\tau$ values, such as the set of bumps between $\approx 9$ and 14 kybp in Fig. 5, indicates an effect in the underlying signal that spans multiple time scales. Increasing $\tau$ generally raises the WPE curves; this simply reflects decreasing predictability over the longer time span sampled by each permutation. Some intervals in the data depart from that pattern, e.g., near 17.5 kybp, where the curves cluster more tightly. This may be useful in highlighting regions of interest for paleoclimate experts: around 17.5 kybp, for example, there is a sharp spike in the diffusion rate at the WDC,^{46} which may be the cause of the tight clustering of the curves in that region of Fig. 5.
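The WPE calculation itself is compact enough to sketch. The Python version below (a minimal sketch; the authors' actual implementation is not specified here) uses the standard variance-weighting scheme: each ordinal pattern is weighted by the variance of the window that produced it, and the resulting entropy is normalized by $\log_2(d!)$ so that WPE lies in $[0, 1]$. The parameter names `d` (embedding dimension) and `tau` (the time delay discussed above) are illustrative, as are the default values.

```python
import math
import numpy as np

def wpe(x, d=4, tau=1):
    """Weighted permutation entropy, normalized to [0, 1].

    d   -- embedding dimension (length of each ordinal pattern)
    tau -- time delay: the stride between samples within one pattern
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (d - 1) * tau       # number of windows
    if n < 1:
        raise ValueError("series too short for this d and tau")
    acc, total = {}, 0.0
    for t in range(n):
        window = x[t : t + d * tau : tau]
        pattern = tuple(np.argsort(window, kind="stable"))
        w = window.var()             # weight: variance of the window
        acc[pattern] = acc.get(pattern, 0.0) + w
        total += w
    if total == 0.0:                 # constant series: define WPE = 0
        return 0.0
    h = -sum((w / total) * math.log2(w / total)
             for w in acc.values() if w > 0)
    return h / math.log2(math.factorial(d))
```

A monotone ramp yields a single ordinal pattern and hence WPE of zero, while white noise spreads weight nearly uniformly over all $d!$ patterns and drives WPE toward one; sweeping `tau`, as in Fig. 5, simply re-runs this calculation with a longer stride.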

WPE calculations with small $\tau$ values can also effectively flag data and data-processing issues that can be hard to see without detailed, point-by-point examination of the raw data.^{1} A clear example of this is the large bump from 4.5 to 6.5 kybp in the black traces ($\tau = 1$) in Fig. 5. As reported in Ref. 1, an older instrument was used to analyze the ice in this region; a closer examination of the data revealed that, as the WPE results suggested, this instrument introduced significant noise into the data. The spikes around 58 kybp in Figs. 1 and 5 offer another compelling example. In this region, 1.107 m (110.1 yr) of ice was missing from the record. Interpolating across this gap produced highly predictable values, which caused WPE to plummet for the width of the calculation window. Please see Ref. 1 for a deeper study of the utility of WPE as an anomaly-detection method, complete with comparisons to standard methods and experimental validation via targeted remeasurement and reanalysis of a section of the WAIS Divide core; that effort was the longest high-resolution resampling of an ice core to date.^{47}
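The missing-ice effect is easy to reproduce in miniature. The hypothetical sketch below (not the authors' pipeline) linearly interpolates across a gap in synthetic noise; since the filled-in values are strictly monotone, every window inside the gap produces the same ordinal pattern, and WPE computed on that stretch collapses to zero, mimicking the plummet near 58 kybp. The compact WPE function is repeated here so the snippet stands alone; all parameter choices are illustrative.

```python
import math
import numpy as np

def wpe(x, d=3, tau=1):
    # compact weighted permutation entropy, normalized to [0, 1]
    n = len(x) - (d - 1) * tau
    acc, total = {}, 0.0
    for t in range(n):
        w = x[t : t + d * tau : tau]
        pattern = tuple(np.argsort(w, kind="stable"))
        v = w.var()                  # weight each pattern by window variance
        acc[pattern] = acc.get(pattern, 0.0) + v
        total += v
    if total == 0.0:
        return 0.0
    h = -sum((v / total) * math.log2(v / total)
             for v in acc.values() if v > 0)
    return h / math.log2(math.factorial(d))

rng = np.random.default_rng(42)
x = rng.standard_normal(3000)
# simulate a 200-sample data gap filled by linear interpolation
x[1400:1600] = np.linspace(x[1399], x[1600], 200)

noise_wpe = wpe(x[:200])        # raw noise: WPE near 1
gap_wpe = wpe(x[1400:1600])     # interpolated stretch: WPE collapses to 0
```

In a sliding-window calculation like the one behind Fig. 5, every window overlapping the interpolated stretch is dragged downward, which is why the dip in the real record spans the full width of the calculation window.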

## IV. CONCLUSIONS

The underlying premise of this paper is that information theory can be useful in understanding the climate signals that are captured in ice-core records. We demonstrated that estimates of the Shannon entropy rate of the water-isotope data from the WAIS Divide Core, calculated using weighted permutation entropy (WPE), can bring out valuable new information from this record. We found that WPE reveals a fundamental difference between the $\delta D$ and $\delta^{18}O$ traces deep in the core. It tracks many well-known climate phenomena but does *not*, surprisingly, contain strong signatures of the DO and AIM events that occurred during the last glacial period, a finding that may have implications for the predictability of those events. WPE generally correlates with accumulation, which suggests a far less onerous way of calculating or corroborating accumulation curves for future records. WPE also contains features that do *not* correspond to features in the accumulation record. We suspect that these deviations may be encodings of climate signals. Unfortunately, unraveling the individual effects of multiple components of a signal on the joint Shannon entropy rate of that signal is simply beyond the power of WPE, or of any related entropic measure; even so, WPE's ability to flag these regions for analysis by paleoclimate experts is a major advantage.

While information-theoretic measures are powerful, they require careful handling and high-resolution, well-dated data. Preprocessing steps that affect the data and the timeline can skew their results, as discussed here and in several recent papers.^{3,31,32} Moreover, the associated algorithms have a number of free parameters that must be chosen properly. (This is true of any data-analysis method, of course, though that fact is not widely appreciated in many scientific fields.) The WDC is the first ice-core record that is suitable for these types of analyses. As more high-resolution records become available, the mathematics is developed further, and the intertwined effects of sample spacing, interpolation, event frequencies, and the free parameters of the method are better understood, information theory will likely become a common forensic tool in climate science.

## ACKNOWLEDGMENTS

The authors would like to thank Jakob Runge, Nihat Ay, Holger Kantz, and Stephan Bialonski for valuable discussions, as well as Bruce Vaughn and Valerie Morris for their efforts in designing the laser-based, continuous-flow measurement system for water isotopes. This material is based upon work sponsored by the U.S. National Science Foundation (NSF) (Grant Nos. 1245947 and 1807478). J.G. was supported by an Omidyar Fellowship at the Santa Fe Institute. The WDC ice core was supported by the NSF (Grant Nos. 0537593, 0537661, 0537930, 0539232, 1043092, 1043167, 1043518, and 1142166). Associated field and logistical activities were managed by the WAIS Divide Science Coordination Office at the Desert Research Institute, USA, and the University of New Hampshire, USA (NSF Grant Nos. 0230396, 0440817, 0944266, and 0944348). The NSF Division of Polar Programs funded the Ice Drilling Program Office, the Ice Drilling Design and Operations group, the NSF Ice Core Facility (formerly known as the National Ice Core Lab), the Antarctic Support Contractor, and the 109th New York Air National Guard. Water-isotope measurements were performed at the Stable Isotope Lab at the Institute of Arctic and Alpine Research, University of Colorado. These data are archived on the website of the U.S. Antarctic Program Data Center.^{19} Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

## REFERENCES

*Advances in Intelligent Data Analysis XV*, edited by H. Boström, A. Knobbe, C. Soares, and P. Papapetrou (Springer International Publishing, 2016), pp. 343–355.

*et al.*,

*EGU General Assembly Conference Abstracts*(EGU, 2014), Vol. 16, p. 17028.