The probability distributions of large-scale atmospheric and oceanic variables are generally skewed and heavy-tailed. We argue that their distinctive departures from Gaussianity arise fundamentally from the fact that in a quadratically nonlinear system with a quadratic invariant, the coupling coefficients between system components are not constant but depend linearly on the system state in a distinctive way. In particular, the skewness arises from a tendency of the system trajectory to linger near states of weak coupling. We show that the salient features of the observed non-Gaussianity can be captured in the simplest such nonlinear 2-component system. If the system is stochastically forced and linearly damped, with one component damped much more strongly than the other, then the strongly damped fast component becomes effectively decoupled from the weakly damped slow component, and its impact on the slow component can be approximated as a stochastic noise forcing plus an augmented nonlinear damping. In the limit of large time-scale separation, the nonlinear augmentation of the damping becomes small, and the noise forcing can be approximated as an additive noise plus a correlated additive and multiplicative noise (CAM noise) forcing. Much of the diversity of observed large-scale atmospheric and oceanic probability distributions can be interpreted in this minimal framework.
Quadratically coupled systems with both fast and slow components often exhibit skewness in the probability distribution of the slow variable. We show both theoretically and numerically in a two-dimensional system with a quadratic invariant that this occurs when the timescale separation between the systems causes the fast variable to affect the slower variable as correlated additive and multiplicative (CAM) stochastic forcing.
It is increasingly evident that the probability distributions of large-scale atmospheric and oceanic variables are not only non-Gaussian but also non-Gaussian in a distinctive way. Figure 1 illustrates this for absolute vorticity, an important atmospheric variable at the dynamically active jet stream level, and for sea surface temperature (SST), an important oceanic variable at the air-sea interface. Specifically, it shows the skewness S and kurtosis K of the daily anomalies of these variables from their long-term averages in winter, defined as
where , is the variance, and angle brackets refer to long-term averages. S is clearly not zero, and K is generally positive as evident from the scatter plots of K versus S. A positive K is a measure of the heaviness of the tails of a probability distribution relative to those of a Gaussian distribution. Figure 1 also indicates a distinctive parabolic K-S relationship, , for these variables.
Sardeshmukh and Sura1 (SS09 hereafter) argued that these and other distinctively non-Gaussian features of the observed probability density functions (PDFs) could be understood by considering the following 1-D linear Markov process model of the anomalies:
where A < 0, b, E, and g are constants, and η1 and η2 are independent Stratonovic Gaussian white noises, with ⟨η1η2⟩ = 0. Note that ηi is related to a standard Brownian motion Wi as dWi = ηi dt. The bη2 term represents additive stochastic noise forcing. SS09 referred to (g + Ex′)η1 as CAM noise forcing. The skewness of x′ arises from the asymmetric magnitude of this term with respect to the sign of x′. The CAM noise forcing also drives a mean response, which needs to be negated by an additional forcing −Eg/2 to ensure that ⟨x′⟩ = 0. Note that if E = 0, Eq. (2) represents a dynamical model of x′ as a standard linear Ornstein-Uhlenbeck “red noise” process3 with a Gaussian stationary PDF.
Defining , which must be positive if the statistics of x′ are stationary, and , which must be less than 1/3 if the kurtosis K exists, the first four statistical moments of this CAM-noise driven process may be derived, using the Fokker-Planck equation associated with (2), as
An explicit expression for the stationary PDF of this stochastically generated skewed (“SGS”) process may also be derived using the Fokker-Planck equation as
where is a normalization constant
which ensures that p(x) integrates to unity. Here, ν = λ/E2, q = 2gν/b, and Γ(z) is the standard Gamma function of complex argument z. This explicit expression reveals the heavy tails of the distribution to be approximately power-law tails. For |Ex′ + g| ≫ b, p(x′) decays with x′ as
where the +(−) sign refers to positive (negative) tails. SS09 provided examples of such approximate power-law tails in the observed PDFs of upper-tropospheric vorticity and mid-tropospheric geopotential height anomalies. Note that the inference of such tails from (2) does not require even the variance to exist. This prompted Penland and Sardeshmukh (2012)4 to propose an alternative interpretation of power-law distributions encountered in many natural contexts as possibly arising from SGS processes, instead of Lévy processes as is often assumed.
For future reference, we note that the time-lag autocorrelation function of x′ in (2) can be derived using the Fokker-Planck equation for the conditional probability of given as . The power spectrum of x′ can then be derived using the Wiener-Khinchin theorem as
whose form is identical to that of a standard O-U “red noise” process.
This single-component anomaly model was discussed extensively in SS09 and in Ref. 4. Its success in capturing the essence of the non-Gaussianity of observed probability distributions is evident. Why this should be so in the obviously nonlinear multi-component climate system is, however, still not completely clear.
We provide a rationale here for (2) as an approximate dynamical equation for the slow components of energy-conserving quadratically nonlinear dynamical systems, in which the fast components are driven by additive stochastic forcing and are strongly damped. We argue that even without forcing and dissipation, the statistics of such a system are generally skewed because the coupling coefficients between the system components are not constant but depend linearly on the system state in a distinctive way. In particular, the skewness arises from the system trajectory lingering near states of weak coupling. We show that the salient features of the observed non-Gaussianity can be captured in the simplest such nonlinear 2-component system. If the system is stochastically forced and linearly damped, with one component damped much more strongly than the other, then the strongly damped fast component becomes effectively decoupled from the weakly damped slow component, and its impact on the slow component can be approximated as a stochastic noise forcing plus an augmented nonlinear damping. In the limit of large time-scale separation, the nonlinear augmentation of the damping becomes small, and the noise forcing can be approximated as an additive noise plus a CAM noise forcing, leading to Eq. (2) for the slow component.
II. A SIMPLE NONLINEAR OSCILLATOR
Consider the 2-component system
where and are damping constants, B, C, and D are coupling constants, and represents stochastic Gaussian white noise forcing of y with amplitude . The “energy” of this system evolves as
and is conserved in the absence of forcing and dissipation. Note that if there is neither forcing nor dissipation, any other quadratically nonlinear 2-component system with a quadratic invariant can be expressed in the form (7) through a linear transformation of system variables.
Figure 2 shows the phase portrait of this system in the absence of forcing and dissipation. For a linear system, C and D are both zero, and starting from any initial condition the system stays in a circular orbit around the origin at a constant angular velocity B. This symmetry is broken for a nonlinear system, in which C and/or D are not zero, by the presence of a line of zero coupling, = = 0, located at a distance from the origin. The line is a line of continuous stable and unstable fixed points separated by a neutral fixed point. In this nonlinear scenario, initial states that are closer than d to the origin follow circular orbits as before, but with variable angular velocities that are slower than B near the line and faster than B away from it. Initial states that are farther than d also follow circular trajectories with variable angular velocities, but these eventually intersect with the line and then stay there.
If the system is forced and damped, the line of zero coupling ceases to be a line of fixed points, but its orientation and distance from the origin remain the primary determinants of the sign and magnitude of the skewness of x (or y). If, for example, D is zero but C is not, the line is vertical (parallel to the y axis) and introduces an obvious asymmetry in the dynamics of x. If, on the other hand, C is zero but D is not, the line is horizontal (parallel to the x axis) and there is no obvious asymmetry in the dynamics of x. However, the damping of x introduces an asymmetry even in this case by accelerating the trajectory near the line for positive x and decelerating it for negative x. In general, the sign of BC determines the sign of the skewness of x in the former case, and the sign of BD determines it in the latter. In the most general case, of course, neither C nor D is zero, as illustrated in Figure 2.
We are especially interested here in understanding the dynamics of this system if one component is damped much more strongly than the other. It turns out that the dynamics of the weakly damped component can then be approximated as (2), as we show both theoretically and numerically below.
III. THE APPROXIMATE DYNAMICS OF THE STOCHASTICALLY FORCED AND DAMPED NONLINEAR OSCILLATOR
We designate x as the weakly damped slow variable in (7), and y as the strongly damped fast variable, with . We show here that if , the equation for the anomalies of x from its mean can be approximated as (2).
If , we may write , where represents a part of y that evolves in quasi-equilibrium with x. We may then write equations for the slow and fast parts of y as
where . If we now also assume that , then on the slow timescales of x we can approximate ys first as red noise with mean and variance
and then this red noise as proportional to a white noise
which allows (10a) to be approximated as
where indicates a deviation of from its mean . On timescales of x, we may approximate as red noise, with a damping rate of , and then as proportional to a white noise
This expression leads, to first order in , to
If we write and , then retaining terms to in (15) gives
Justification for these specified dependences of S and D on λy is provided below. We note in passing that if S in (15) is instead specified as (as in, e.g., Ref. 5), then retaining terms to in (15) leads not only to the augmented damping terms being dropped from (16) but also the terms as well, leaving the CAM noise forcing as the only link between and for large but finite timescale separation. Note that it too vanishes in the limit of infinite timescale separation , leading to a “no noise forcing limit”10,11 instead of the “fast noise forcing limit” considered here. The variance of x vanishes in such a no noise limit, but approaches a constant in the fast noise limit. Numerical simulations of (7) performed with both the no noise limit (not shown) and the fast noise limit (shown below) verify these limits.
We now write in (16) as a mean plus an anomaly. The equation for the mean is
which is identical to (2), with
Note again that this approximation requires both and to be much smaller than unity.
IV. NUMERICAL CALCULATIONS
We confirm the dynamics of the nonlinear oscillator in the limit of derived above through numerical integrations of (7), using specified values of , and , , , , and . Four distinct cases, with different combinations of C and D, were investigated
Integrations in each case were performed for = 1, 2, 10, 100, and 300, using a Runge-Kutta-4 method6,7 with a time step . Each integration was sampled every 100 time steps, with at least 5 × 106 samples recorded.
Figure 3 shows the stationary PDFs of the standardized anomalies obtained in each case, together with the theoretical predictions for = 300. Note from Eqs. (19) and (20) that the limiting PDFs for case B and case BC are independent of . Case B represents the simplest case of a damped linear oscillator driven by additive stochastic noise forcing, and the PDFs are Gaussian as expected. The PDFs are skewed in case BC and case BCD, and are well matched by the theoretical predictions for large given by (4) and (20). Note that the limiting PDF of x in case BD is a non-centered Gaussian, as verified in Figures 3 and 5. Note also that the theoretical PDF in case BC (for which b is exactly zero in (20) and (2)) is somewhat different from (4), and is given by
with the normalization constant
In this case, x′ is limited on one side by −g/E, where there is a reflecting boundary condition.
Figure 4 highlights, for the most general case BCD with , the heaviness of the tails predicted by (4) relative to Gaussian tails. The numerical result is in excellent agreement. For this case, the theory predicts the negative tail to be lighter than a Gaussian tail for , but heavier again for larger negative values and decaying with the same power-law exponent as the positive tail. Our numerical integrations for this case were long enough to accurately estimate the probability densities to nearly four orders of magnitude smaller than the peak of the PDF, but not long enough to confirm the prediction in Figure 4(b) of the vanishingly small probabilities of extremely large negative values.
Figure 5 shows the mean, standard deviation, skewness S, and kurtosis K as functions of in the four cases. All four statistical moments appear to converge to the values predicted using (3) and (20). The convergence is rapid in the linear case B, consistent with traditional expectations,8–10 but much slower in the nonlinear cases BC, BD, and BCD, requiring to be at least as large as 100 for the asymptotic dynamics (19) to represent a valid approximation.
The spectra of x are shown in Figure 6. They approach a red spectrum in all four cases for large , with an decay at high frequencies as predicted by (6). Interestingly, the decay is steeper for moderate values of , even in the linear case B. In fact, this property is easy to show analytically for case B.
Figure 7 shows the corresponding spectra of the fast variable y in the four cases, in an identical format to Figure 6. In all cases, they approach red noise spectra for large as expected, and importantly, with the same white low-frequency tail. It is this that justifies approximating the red noise ys as a white noise stochastic forcing of x in (12), with specified as to ensure an asymptotic value of the white noise spectral density as . The variance of ys then does increase linearly with as increases, but that of x does not, as shown in Figure 5, in essence because x responds mainly to the forcing associated with the “coarse-grained” low-frequency part of y.
The larger variance of ys obtained for larger values of is consistent with the “white noise limit,”10,11 which requires that the product of this variance and remain constant as the limit is taken. This requirement is what determined our specified dependence of on λy, which yields the correct approximate dynamics (19) for x in (7) for large but finite by taking the limit . In contrast, specifying as in the “no noise limit” formulation mentioned earlier, which requires the variance of ys to remain constant as is increased, yields a vanishing x in the limit . One way around this,5 if is finite, is not to take the formal limit in going from (15) to (16), but to retain the stochastic forcing as to . This yields an approximate stochastically forced dynamics of x equivalent to ours in the B and BC cases (without any augmented damping), but an incorrect approximate dynamics in the BD and the most general BCD cases.
Note that the approximation made in (12) and (14) is not only that for large both and are white at low temporal frequencies but also that their PDFs are approximately Gaussian at those frequencies. Figure 8 shows the PDFs of the standardized values of y′ and (y2)′, which are dominated by those of that and , for . The PDFs of y′ (Figure 8(a)) are indistinguishable from a standard Gaussian in all four cases. When (y2) is averaged over half the decay time of the slow variable, i.e., 1/(2λx), its PDFs are also nearly Gaussian (Figure 8(b)), particularly relative to the chi-squared distribution with one degree of freedom (not shown) of the unaveraged values of (y2)′.
A further approximation made in (12) and (14) is that and are not just uncorrelated but are independent at low temporal frequencies, i.e., η1 and η2 are independent Gaussian white noises in (19) and (2). This approximation is also easily validated. Although the unaveraged values of y2 are obviously dependent on those of y, the time-averaged “coarse-grained” values are not. To confirm this, we estimated the mutual information I (y′, z′), with z = y2, defined as
where , , and refer to the marginal PDFs of y′ and z′ and their joint PDF, respectively. We estimated these PDFs using the joint histogram of y′ and z′, dividing each of their ranges into M equal sized bins of standardized values. Such a simple estimation of I(y′,z′) is not optimal and depends on the choice of M. We performed the calculations for three values M = 10, 20, and 30. Figure 9 shows the average of the three estimations of I(y′, z′) as a function of the time averaging interval in units of 1/λx, with confidence intervals indicated as the difference between the maximum and minimum estimated values. For comparison, we also show the theoretical maximum attainable value of I(y′, y2′) if y2′ is completely determined by y′; that is, we show the entropy of a standard chi-squared distribution with one degree of freedom. It is clear from Figure 9 that coarse-graining diminishes the mutual information between and , allowing us to approximate η1 and η2 are independent quantities in (19) and (2).
The non-Gaussianity of the probability distributions of large-scale atmospheric and oceanic variables arises fundamentally from the fact that in the predominantly quadratically nonlinear climate system, the coupling coefficients between system components are not constant but depend linearly on the system state. The skewness arises from a tendency of the system to linger near states of weak coupling, generally located asymmetrically with respect to the mean state in state space. We have shown that the distinctively skewed and heavy-tailed character of the observed probability distributions can be captured in the simplest such 2-component nonlinear system, in which one component is stochastically forced and damped much more strongly than the other. In this limit, it becomes effectively decoupled from the weakly damped slow component, and its impact on the slow component can be approximated as an additive noise plus a CAM noise stochastic forcing. The CAM noise forcing is responsible for both the skewness and heavy tails of the distributions.
Although we have rigorously justified the CAM noise approximation here only in the context of the 2-component system (7), we suggest that much of the diversity of observed large-scale atmospheric and oceanic probability distributions can be interpreted in this minimal framework by identifying the weakly damped slow component with large-scale variables and the strongly damped fast component with small-scale variables. The CAM noise forcing may thus be interpreted as representing the impact of all slow-fast interactions on the slow variables, and the additional additive noise forcing as representing the impact of all fast-fast interactions. We have theoretically shown, and numerically confirmed, that in the limit of large time-scale separation the CAM noise forcing associated with a non-zero C in (7) is the primary determinant of the skewness and heavy tails of the PDF of the slow variable x. The additive noise forcing, associated with a non-zero D in (7), also affects the mean and variance of x, but neither its skewness nor its heavy tails in this limit. A similar remark applies to additional external additive noise forcing of x, consistent with traditional expectations,8 which we did not include in (7) for simplicity, given our focus on non-Gaussianity generated by internal system dynamics.
We should stress that our goal here was to seek a rigorous justification, in a simple context, of the limiting form (2) for the dynamics of individual system components of quadratically nonlinear systems for the purposes of modeling their stationary statistics. While there is evidence that (2) is in general a good model for deducing the stationary statistics of large-scale atmospheric and oceanic variables, one obviously does not expect it to be also a good model for forecasting those variables, not least because the one-dimensional linear dynamics of the asymptotic system cannot represent oscillatory features in the temporal lag-autocorrelation functions. Multi-component linear stochastically forced models, such as those used in Refs. 12–14 for representing atmospheric circulation variability and in Refs. 15 and 16 for representing tropical SST variability, can represent such oscillatory features, but collapsing their dynamics into the single-component form (2) for our purposes here apparently requires additional a priori knowledge about the underlying nonlinear system.
Also, although we did so for the 2-component system in (20), we have not addressed the important issue of what determines the values of the parameters in (2) in general. The climate system is obviously not a 2-component system, and adiabatic and inviscid atmospheric and oceanic circulations also conserve potential enstrophy and not just energy, whereas a 2-component model can have only one quadratic invariant. The dual constraints of energy and potential enstrophy conservation impose strong constraints on the decay of energy and enstrophy wavenumber spectra. This implies, in the context of our 2-component model, that the parameter D should be some decaying function of , which we specified as as a minimum decay requirement to prevent the unphysical “ultraviolet catastrophe” of generating infinite large-scale variance by stochastic backscatter from infinitely small scales.
The role of the D terms in (7) is subtle, but has important implications for representing the effects of fast variables on slow variables through “stochastic parameterizations” in simple17–20 as well as comprehensive21–24,31 nonlinear models. On the one hand, the only terms that can be represented as a pure additive noise forcing in such parameterizations of quadratic nonlinearities are the fast-fast interaction terms involving D, as in (20). On the other hand, a non-zero D can undermine the stochastic parameterization agenda itself by requiring not only but also to be much smaller than 1 in (10b). Even if the former condition of adequate time scale separation is satisfied, one can imagine the latter condition being violated for moderately large values of D and x.
We end by noting that although (2) is highly relevant for quantitatively understanding the non-Gaussianity of the probability distributions of large-scale atmospheric and oceanic variables, it cannot explain the non-Gaussianity associated with nonlinear slow-slow interactions, as amply demonstrated by the substantial deviation of our of 2-d model results for moderate from those for large . Even in such cases, however, one can gain a basic understanding of the sign and magnitude of the skewness by considering the orientation and location of the line (or an appropriate hyper-plane) of zero coupling as in Figure 2. It is also noteworthy in this context that multi-component empirical linear stochastically forced models that ignore the slow-slow interactions have comparable skill to that of comprehensive nonlinear numerical models in representing and predicting large-scale atmospheric and oceanic variations.1,25–29 This suggests that the slow-slow interactions are of relatively minor importance in the dynamics of many large-scale weather and climate variations. If they were important, one would expect them to have a strong impact not only on the non-Gaussianity but also on the prediction skill of the variations.
This work was partly supported by the U.S. Department of Energy (DOE) Office for Science (BER) under Grant No. DE-SC0006965. We also gratefully acknowledge the comments of two anonymous reviewers, whose remarks significantly improved the article.