The probability distributions of large-scale atmospheric and oceanic variables are generally skewed and heavy-tailed. We argue that their distinctive departures from Gaussianity arise fundamentally from the fact that in a quadratically nonlinear system with a quadratic invariant, the coupling coefficients between system components are not constant but depend linearly on the system state in a distinctive way. In particular, the skewness arises from a tendency of the system trajectory to linger near states of weak coupling. We show that the salient features of the observed non-Gaussianity can be captured in the simplest such nonlinear 2-component system. If the system is stochastically forced and linearly damped, with one component damped much more strongly than the other, then the strongly damped fast component becomes effectively decoupled from the weakly damped slow component, and its impact on the slow component can be approximated as a stochastic noise forcing plus an augmented nonlinear damping. In the limit of large time-scale separation, the nonlinear augmentation of the damping becomes small, and the noise forcing can be approximated as an additive noise plus a correlated additive and multiplicative noise (CAM noise) forcing. Much of the diversity of observed large-scale atmospheric and oceanic probability distributions can be interpreted in this minimal framework.

Quadratically coupled systems with both fast and slow components often exhibit skewness in the probability distribution of the slow variable. We show both theoretically and numerically in a two-dimensional system with a quadratic invariant that this occurs when the timescale separation between the systems causes the fast variable to affect the slower variable as correlated additive and multiplicative (CAM) stochastic forcing.

It is increasingly evident that the probability distributions of large-scale atmospheric and oceanic variables are not only non-Gaussian but also non-Gaussian in a distinctive way. Figure 1 illustrates this for absolute vorticity, an important atmospheric variable at the dynamically active jet stream level, and for sea surface temperature (SST), an important oceanic variable at the air-sea interface. Specifically, it shows the skewness S and kurtosis K of the daily anomalies x=xx of these variables from their long-term averages x in winter, defined as

S=x3/σ3,K=x4/σ43,
(1)

where x=0, σ2=x2 is the variance, and angle brackets refer to long-term averages. S is clearly not zero, and K is generally positive as evident from the scatter plots of K versus S. A positive K is a measure of the heaviness of the tails of a probability distribution relative to those of a Gaussian distribution. Figure 1 also indicates a distinctive parabolic K-S relationship, K>(3/2)S2, for these variables.

FIG. 1.

Left panels: Skewness S of daily-averaged 300 hPa vorticity anomalies, and of daily-averaged SST anomalies, in northern winter (DJF). Right panels: Scatterplot of Excess Kurtosis K versus S from the left panel at all northern hemispheric gridpoints (upper panel) and all oceanic gridpoints (lower panel). Adapted from Refs. 1 and 2.

FIG. 1.

Left panels: Skewness S of daily-averaged 300 hPa vorticity anomalies, and of daily-averaged SST anomalies, in northern winter (DJF). Right panels: Scatterplot of Excess Kurtosis K versus S from the left panel at all northern hemispheric gridpoints (upper panel) and all oceanic gridpoints (lower panel). Adapted from Refs. 1 and 2.

Close modal

Sardeshmukh and Sura1 (SS09 hereafter) argued that these and other distinctively non-Gaussian features of the observed probability density functions (PDFs) could be understood by considering the following 1-D linear Markov process model of the anomalies:

dxdt=Ax+(g+Ex)η1+bη212Eg,
(2)

where A<0, b, E, and g are constants, and η1 and η2 are independent Stratonovic Gaussian white noises, with ⟨η1η2⟩ = 0. Note that ηi is related to a standard Brownian motion Wi as dWi = ηi dt. The 2 term represents additive stochastic noise forcing. SS09 referred to (g + Ex′)η1 as CAM noise forcing. The skewness of x′ arises from the asymmetric magnitude of this term with respect to the sign of x′. The CAM noise forcing also drives a mean response, which needs to be negated by an additional forcing −Eg/2 to ensure that ⟨x′⟩ = 0. Note that if E = 0, Eq. (2) represents a dynamical model of x′ as a standard linear Ornstein-Uhlenbeck “red noise” process3 with a Gaussian stationary PDF.

Defining λ=[A+(1/2)E2], which must be positive if the statistics of x′ are stationary, and α=0.5E2/λ, which must be less than 1/3 if the kurtosis K exists, the first four statistical moments of this CAM-noise driven process may be derived, using the Fokker-Planck equation associated with (2), as

3
Meanμ= x =0,
(3a)
Varianceσ2= x2 =g2+b22λ(1α),
(3b)
SkewnessS= x3 σ3=2Egσλ(12α),and
(3c)
KurtosisK= x4 σ43=32[ 12α13α ]S2+6α13α>32S2.
(3d)
Expressions (3c) and (3d) show the explicit dependence of S and K on the CAM-noise amplitude parameters E and g. The K-S inequality, K>(3/2)S2, results from the fact that the first term within square brackets on the right hand side of (3d) is greater than 1 and the second term is greater than 0. Note that this is a much stronger constraint on K than the universal lower bound K>S2 − 2 if S exists.30 

An explicit expression for the stationary PDF of this stochastically generated skewed (“SGS”) process may also be derived using the Fokker-Planck equation as

p(x)=1N[ (Ex+g)2+b2 ](1+ν)exp[ 2gνbarctan(Ex+gb) ],
(4)

where N is a normalization constant

N=2πE(2b)(2ν+1)Γ(2ν+1)Γ(ν+1iq/2)Γ(ν+1+iq/2),

which ensures that p(x) integrates to unity. Here, ν=λ/E2, q=2gν/b, and Γ(z) is the standard Gamma function of complex argument z. This explicit expression reveals the heavy tails of the distribution to be approximately power-law tails. For |Ex′ + g| ≫ b, p(x′) decays with x′ as

p±(x)1N[Ex+g]2(1+ν)exp(±πgνb),
(5)

where the +(−) sign refers to positive (negative) tails. SS09 provided examples of such approximate power-law tails in the observed PDFs of upper-tropospheric vorticity and mid-tropospheric geopotential height anomalies. Note that the inference of such tails from (2) does not require even the variance to exist. This prompted Penland and Sardeshmukh (2012)4 to propose an alternative interpretation of power-law distributions encountered in many natural contexts as possibly arising from SGS processes, instead of Lévy processes as is often assumed.

For future reference, we note that the time-lag autocorrelation function ρ(τ) of x′ in (2) can be derived using the Fokker-Planck equation for the conditional probability of x(t+τ) given x(t) as ρ(τ)= x(t+τ)x(t) /σ2=eλτ. The power spectrum of x′ can then be derived using the Wiener-Khinchin theorem as

Px(f)=2λσ2λ2+4π2f2,
(6)

whose form is identical to that of a standard O-U “red noise” process.

This single-component anomaly model was discussed extensively in SS09 and in Ref. 4. Its success in capturing the essence of the non-Gaussianity of observed probability distributions is evident. Why this should be so in the obviously nonlinear multi-component climate system is, however, still not completely clear.

We provide a rationale here for (2) as an approximate dynamical equation for the slow components of energy-conserving quadratically nonlinear dynamical systems, in which the fast components are driven by additive stochastic forcing and are strongly damped. We argue that even without forcing and dissipation, the statistics of such a system are generally skewed because the coupling coefficients between the system components are not constant but depend linearly on the system state in a distinctive way. In particular, the skewness arises from the system trajectory lingering near states of weak coupling. We show that the salient features of the observed non-Gaussianity can be captured in the simplest such nonlinear 2-component system. If the system is stochastically forced and linearly damped, with one component damped much more strongly than the other, then the strongly damped fast component becomes effectively decoupled from the weakly damped slow component, and its impact on the slow component can be approximated as a stochastic noise forcing plus an augmented nonlinear damping. In the limit of large time-scale separation, the nonlinear augmentation of the damping becomes small, and the noise forcing can be approximated as an additive noise plus a CAM noise forcing, leading to Eq. (2) for the slow component.

Consider the 2-component system

dxdt=λxx+(B+Cx+Dy)y,dydt=λyy(B+Cx+Dy)x+Sη,
(7)

where λx and λy are damping constants, B, C, and D are coupling constants, and Sη represents stochastic Gaussian white noise forcing of y with amplitude S. The “energy” (x2+y2)/2 of this system evolves as

ddt(x2+y22)=λxx2λyy2+Sηy,
(8)

and is conserved in the absence of forcing and dissipation. Note that if there is neither forcing nor dissipation, any other quadratically nonlinear 2-component system with a quadratic invariant can be expressed in the form (7) through a linear transformation of system variables.

Figure 2 shows the phase portrait of this system in the absence of forcing and dissipation. For a linear system, C and D are both zero, and starting from any initial condition the system stays in a circular orbit around the origin at a constant angular velocity B. This symmetry is broken for a nonlinear system, in which C and/or D are not zero, by the presence of a line of zero coupling, Ω(x,y) = B+Cx+Dy = 0, located at a distance d=B/C2+D2 from the origin. The line is a line of continuous stable and unstable fixed points separated by a neutral fixed point. In this nonlinear scenario, initial states that are closer than d to the origin follow circular orbits as before, but with variable angular velocities that are slower than B near the line and faster than B away from it. Initial states that are farther than d also follow circular trajectories with variable angular velocities, but these eventually intersect with the line and then stay there.

FIG. 2.

Left panel: Phase portrait of the energy-conserving 2-component nonlinear oscillator. The straight line is a line of fixed points with zero coupling, Ω(x,y) = B+Cx+Dy = 0. The stable fixed points (solid) are separated from the unstable fixed points (dashed) by a neutral fixed point (small gray dot) located at a distance d = B/C2+D2 from the origin. Initial states at smaller distances from the origin follow circular trajectories about the origin that slow down in the vicinity of this neutral point. Initial conditions farther from the origin end up on the line of zero coupling, and then stay there. The line's existence introduces a fundamental asymmetry in the statistics of the system with respect to the origin. An invertible linear transformation of the state variables, illustrated in the right panel, does not affect this fundamental behavior.

FIG. 2.

Left panel: Phase portrait of the energy-conserving 2-component nonlinear oscillator. The straight line is a line of fixed points with zero coupling, Ω(x,y) = B+Cx+Dy = 0. The stable fixed points (solid) are separated from the unstable fixed points (dashed) by a neutral fixed point (small gray dot) located at a distance d = B/C2+D2 from the origin. Initial states at smaller distances from the origin follow circular trajectories about the origin that slow down in the vicinity of this neutral point. Initial conditions farther from the origin end up on the line of zero coupling, and then stay there. The line's existence introduces a fundamental asymmetry in the statistics of the system with respect to the origin. An invertible linear transformation of the state variables, illustrated in the right panel, does not affect this fundamental behavior.

Close modal

If the system is forced and damped, the line of zero coupling ceases to be a line of fixed points, but its orientation and distance from the origin remain the primary determinants of the sign and magnitude of the skewness of x (or y). If, for example, D is zero but C is not, the line is vertical (parallel to the y axis) and introduces an obvious asymmetry in the dynamics of x. If, on the other hand, C is zero but D is not, the line is horizontal (parallel to the x axis) and there is no obvious asymmetry in the dynamics of x. However, the damping of x introduces an asymmetry even in this case by accelerating the trajectory near the line for positive x and decelerating it for negative x. In general, the sign of BC determines the sign of the skewness of x in the former case, and the sign of BD determines it in the latter. In the most general case, of course, neither C nor D is zero, as illustrated in Figure 2.

We are especially interested here in understanding the dynamics of this system if one component is damped much more strongly than the other. It turns out that the dynamics of the weakly damped component can then be approximated as (2), as we show both theoretically and numerically below.

We designate x as the weakly damped slow variable in (7), and y as the strongly damped fast variable, with λy>λx. We show here that if λyλx, the equation for the anomalies x=xx of x from its mean x can be approximated as (2).

If λyλx, we may write y=ŷ+ys, where ŷ represents a part of y that evolves in quasi-equilibrium with x. We may then write equations for the slow and fast parts of y as

dŷdt=λyŷ(B+Cx+Dŷ)x=0,and
(9a)
dysdt=λyysDysx+Sη.
(9b)

Solving for ŷ in (9a) and substituting in (7) then gives

dxdt=λxx1λy[ B+Cx1+γx ]2x+(B+Cx)[ 1γx1+γx ]ys+Dys2,
(10a)
dysdt=λy(1+γx)ys+Sη,
(10b)

where γ=D/λy. If we now also assume that γx1, then on the slow timescales of x we can approximate ys first as red noise with mean and variance

ys =0and ys2 =S22λy(1+γx)S22λy(1γx),
(11)

and then this red noise as proportional to a white noise η1

ysSλy(1γx)η1,
(12)

which allows (10a) to be approximated as

dxdt=[ λx+(B+Cx)2λy ]x+(B+Cx)(12γx)ys+D(ys2)+D ys2 =[ λx+(B+Cx)2λy+D2S22λy2 ]x+(B+Cx)(12γx)ys+D(ys2)+DS22λy,
(13)

where (ys2) indicates a deviation of ys2 from its mean ys2 . On timescales 1/λx of x, we may approximate (ys2) as red noise, with a damping rate of 2λy, and then as proportional to a white noise η2

(ys2)S2λy2λy2(132γx)η2.
(14)

This expression leads, to first order in γx, to

dxdt=[ λx+(B+Cx)2λy+D2S22λy2 ]x+(B+Cx)(13γx)Sλyη1+DS2λy2λy2(132γx)η2+DS22λy.
(15)

If we write S=S0λy and D=D0/λy, then retaining terms to O(1/λy1/2) in (15) gives

dxdt=λxx+(B+Cx)S0η1+D0S022λyη2+D0S022.
(16)

Justification for these specified dependences of S and D on λy is provided below. We note in passing that if S in (15) is instead specified as S=S1λy (as in, e.g., Ref. 5), then retaining terms to O(1/λy1/2) in (15) leads not only to the augmented damping terms being dropped from (16) but also the D0 terms as well, leaving the CAM noise forcing (B+Cx)S1λy1/2η1 as the only link between x and y for large but finite timescale separation. Note that it too vanishes in the limit of infinite timescale separation λy, leading to a “no noise forcing limit”10,11 instead of the “fast noise forcing limit” considered here. The variance of x vanishes in such a no noise limit, but approaches a constant in the fast noise limit. Numerical simulations of (7) performed with both the no noise limit (not shown) and the fast noise limit (shown below) verify these limits.

We now write x= x +x in (16) as a mean plus an anomaly. The equation for the mean is

d x dt=λx x +12(B+C x )CS02+D0S022=0,
(17)

which gives

x =(BC+D0)S02/2λx(1/2)C2S02,
(18)

and subtracting (17) from (16) finally yields an equation for the anomalies

dxdt=λxx+(B+C x +Cx)S0η1+D0S022λyη212(B+C x )CS02,
(19)

which is identical to (2), with

A=λx,E=CS0,g=(B+C x )S0,andb=D0S022λy.
(20)

Note again that this approximation requires both λx/λy and γx=Dx/λy to be much smaller than unity.

We confirm the dynamics of the nonlinear oscillator in the limit of λyλx derived above through numerical integrations of (7), using specified values of λy, and λx=0.1, B=0.1, C=0.05, D=D0/λy=0.05/λy, and S=S0λy=2λy. Four distinct cases, with different combinations of C and D, were investigated

CaseB:B0,C=0,D=0,
CaseBC:B0,C0,D=0,
CaseBD:B0,C=0,D0,
CaseBCD:B0,C0,D0.

Integrations in each case were performed for λy/λx = 1, 2, 10, 100, and 300, using a Runge-Kutta-4 method6,7 with a time step Δt=1/(500λy). Each integration was sampled every 100 time steps, with at least 5 × 106 samples recorded.

Figure 3 shows the stationary PDFs of the standardized anomalies x/σx=(x x )/σx obtained in each case, together with the theoretical predictions for λy/λx = 300. Note from Eqs. (19) and (20) that the limiting PDFs for case B and case BC are independent of λy/λx. Case B represents the simplest case of a damped linear oscillator driven by additive stochastic noise forcing, and the PDFs are Gaussian as expected. The PDFs are skewed in case BC and case BCD, and are well matched by the theoretical predictions for large λy/λx given by (4) and (20). Note that the limiting PDF of x in case BD is a non-centered Gaussian, as verified in Figures 3 and 5. Note also that the theoretical PDF in case BC (for which b is exactly zero in (20) and (2)) is somewhat different from (4), and is given by

p(x)=1N[ Ex+g ]2(1+ν)exp[ 2νgEx+g ],
(21)

with the normalization constant

N=1E(2νg)(2ν+1)Γ(2ν+1).

In this case, x′ is limited on one side by −g/E, where there is a reflecting boundary condition.

FIG. 3.

PDF of standardized x′ = (x − ⟨x⟩) (i.e., in units of σx). (a) Case B, (b) case BC, (c) case BD, and (d) case BCD. In each panel, the results for λyx = 1, 2, 10, 100, and 300 are indicated by the solid brown, dotted purple, solid blue, dotted green, and solid red colored lines, respectively. The heavy black line is the theoretical prediction for λyx = 300.

FIG. 3.

PDF of standardized x′ = (x − ⟨x⟩) (i.e., in units of σx). (a) Case B, (b) case BC, (c) case BD, and (d) case BCD. In each panel, the results for λyx = 1, 2, 10, 100, and 300 are indicated by the solid brown, dotted purple, solid blue, dotted green, and solid red colored lines, respectively. The heavy black line is the theoretical prediction for λyx = 300.

Close modal
FIG. 5.

Moments as a function of λyx. (a) Mean, (b) standard deviation, (c) skewness S, and (d) excess kurtosis K, shown for case B (red), case BC (blue), case BD (purple), and case BCD (green). Theoretical values for λyx = 300 are shown on right-hand axis.

FIG. 5.

Moments as a function of λyx. (a) Mean, (b) standard deviation, (c) skewness S, and (d) excess kurtosis K, shown for case B (red), case BC (blue), case BD (purple), and case BCD (green). Theoretical values for λyx = 300 are shown on right-hand axis.

Close modal

Figure 4 highlights, for the most general case BCD with λy/λx=300, the heaviness of the tails predicted by (4) relative to Gaussian tails. The numerical result is in excellent agreement. For this case, the theory predicts the negative tail to be lighter than a Gaussian tail for 40<x/σx<3, but heavier again for larger negative values and decaying with the same power-law exponent as the positive tail. Our numerical integrations for this case were long enough to accurately estimate the probability densities to nearly four orders of magnitude smaller than the peak of the PDF, but not long enough to confirm the prediction in Figure 4(b) of the vanishingly small probabilities of extremely large negative values.

FIG. 4.

(a) Comparison of numerical (red) and theoretical (grey) PDFs of standardized x′ with the normal distribution (blue) for case BCD with λy/λx=300. (b) The distinctive contrast of the heavy tails with CAM noise and the tail of the normal distribution.

FIG. 4.

(a) Comparison of numerical (red) and theoretical (grey) PDFs of standardized x′ with the normal distribution (blue) for case BCD with λy/λx=300. (b) The distinctive contrast of the heavy tails with CAM noise and the tail of the normal distribution.

Close modal

Figure 5 shows the mean, standard deviation, skewness S, and kurtosis K as functions of λy/λx in the four cases. All four statistical moments appear to converge to the values predicted using (3) and (20). The convergence is rapid in the linear case B, consistent with traditional expectations,8–10 but much slower in the nonlinear cases BC, BD, and BCD, requiring λy/λx to be at least as large as 100 for the asymptotic dynamics (19) to represent a valid approximation.

The spectra of x are shown in Figure 6. They approach a red spectrum in all four cases for large λy/λx, with an f2 decay at high frequencies as predicted by (6). Interestingly, the decay is steeper for moderate values of λy/λx, even in the linear case B. In fact, this property is easy to show analytically for case B.

FIG. 6.

Spectra of component x obtained in (a) case B, (b) case BC, (c) case BD, and (d) case BCD. In each panel, the results for λyx = 1, 2, 10, 100, and 300 are indicated by solid brown, dotted purple, solid blue, dotted green, and solid red curves, respectively. The straight black line is f−2.

FIG. 6.

Spectra of component x obtained in (a) case B, (b) case BC, (c) case BD, and (d) case BCD. In each panel, the results for λyx = 1, 2, 10, 100, and 300 are indicated by solid brown, dotted purple, solid blue, dotted green, and solid red curves, respectively. The straight black line is f−2.

Close modal

Figure 7 shows the corresponding spectra of the fast variable y in the four cases, in an identical format to Figure 6. In all cases, they approach red noise spectra for large λy/λx as expected, and importantly, with the same white low-frequency tail. It is this that justifies approximating the red noise ys as a white noise stochastic forcing of x in (12), with S specified as S=S0λy to ensure an asymptotic value of the white noise spectral density as λy/λx. The variance of ys then does increase linearly with λy as λy/λx increases, but that of x does not, as shown in Figure 5, in essence because x responds mainly to the forcing associated with the “coarse-grained” low-frequency part of y.

FIG. 7.

As in Figure 6, but for the spectra of component y. The horizontal solid black line is the theoretical low-frequency intercept.

FIG. 7.

As in Figure 6, but for the spectra of component y. The horizontal solid black line is the theoretical low-frequency intercept.

Close modal

The larger variance of ys obtained for larger values of λy is consistent with the “white noise limit,”10,11 which requires that the product of this variance and λx/λy remain constant as the limit is taken. This requirement is what determined our specified dependence of S=S0λy on λy, which yields the correct approximate dynamics (19) for x in (7) for large but finite λy by taking the limit λy. In contrast, specifying S as S=S1λy in the “no noise limit” formulation mentioned earlier, which requires the variance of ys to remain constant as λy is increased, yields a vanishing x in the limit λy. One way around this,5 if λy is finite, is not to take the formal limit in going from (15) to (16), but to retain the stochastic forcing as (B+Cx)S1λy1/2η1 to O(1/λy1/2). This yields an approximate stochastically forced dynamics of x equivalent to ours in the B and BC cases (without any augmented damping), but an incorrect approximate dynamics in the BD and the most general BCD cases.

Note that the approximation made in (12) and (14) is not only that for large λy/λx both ys and (ys2) are white at low temporal frequencies but also that their PDFs are approximately Gaussian at those frequencies. Figure 8 shows the PDFs of the standardized values of y′ and (y2), which are dominated by those of that ys and (ys2), for λy/λx=300. The PDFs of y′ (Figure 8(a)) are indistinguishable from a standard Gaussian in all four cases. When (y2) is averaged over half the decay time of the slow variable, i.e., 1/(2λx), its PDFs are also nearly Gaussian (Figure 8(b)), particularly relative to the chi-squared distribution with one degree of freedom (not shown) of the unaveraged values of (y2).

FIG. 8.

(a) PDFs of y′ = (y − ⟨y⟩) in units of σy for λyx = 300. The results for case B (dotted red), case BC (solid blue), case BD (dotted purple), and case BCD (solid green) are all indistinguishable from a standard Gaussian. (b) PDFs of y2 averaged over time intervals 0.5/λx for λyx = 300 and λx = 0.1. The values of coarse-grained y2 have been standardized to zero-mean and unit standard deviation. Results are shown for case BC (solid blue), case BD (dotted purple), and case BCD (solid green).

FIG. 8.

(a) PDFs of y′ = (y − ⟨y⟩) in units of σy for λyx = 300. The results for case B (dotted red), case BC (solid blue), case BD (dotted purple), and case BCD (solid green) are all indistinguishable from a standard Gaussian. (b) PDFs of y2 averaged over time intervals 0.5/λx for λyx = 300 and λx = 0.1. The values of coarse-grained y2 have been standardized to zero-mean and unit standard deviation. Results are shown for case BC (solid blue), case BD (dotted purple), and case BCD (solid green).

Close modal

A further approximation made in (12) and (14) is that ys and (ys2) are not just uncorrelated but are independent at low temporal frequencies, i.e., η1 and η2 are independent Gaussian white noises in (19) and (2). This approximation is also easily validated. Although the unaveraged values of y2 are obviously dependent on those of y, the time-averaged “coarse-grained” values are not. To confirm this, we estimated the mutual information I (y′, z′), with z = y2, defined as

I(y,z)=pyz(y,z)lnpyz(y,z)py(y)pz(z)dydz,
(22)

where py, pz, and pyz refer to the marginal PDFs of y′ and z′ and their joint PDF, respectively. We estimated these PDFs using the joint histogram of y′ and z′, dividing each of their ranges into M equal sized bins of standardized values. Such a simple estimation of I(y′,z′) is not optimal and depends on the choice of M. We performed the calculations for three values M = 10, 20, and 30. Figure 9 shows the average of the three estimations of I(y′, z′) as a function of the time averaging interval in units of 1x, with confidence intervals indicated as the difference between the maximum and minimum estimated values. For comparison, we also show the theoretical maximum attainable value of I(y′, y2) if y2 is completely determined by y′; that is, we show the entropy of a standard chi-squared distribution with one degree of freedom. It is clear from Figure 9 that coarse-graining diminishes the mutual information between ys and (ys2), allowing us to approximate η1 and η2 are independent quantities in (19) and (2).

FIG. 9.

Mutual information I(y′, y2) for coarse-grained y′ and y2 (black dots) in case BCD with λyx = 300. The coarse graining is accomplished by averaging the values of y and y2 over the time intervals indicated on the abscissa. The horizontal red line denotes the entropy of a chi-squared distribution with one degree of freedom, which would obtain if y2 were completely determined by, and therefore completely dependent on, y. See text for details.

FIG. 9.

Mutual information I(y′, y2) for coarse-grained y′ and y2 (black dots) in case BCD with λyx = 300. The coarse graining is accomplished by averaging the values of y and y2 over the time intervals indicated on the abscissa. The horizontal red line denotes the entropy of a chi-squared distribution with one degree of freedom, which would obtain if y2 were completely determined by, and therefore completely dependent on, y. See text for details.

Close modal

The non-Gaussianity of the probability distributions of large-scale atmospheric and oceanic variables arises fundamentally from the fact that in the predominantly quadratically nonlinear climate system, the coupling coefficients between system components are not constant but depend linearly on the system state. The skewness arises from a tendency of the system to linger near states of weak coupling, generally located asymmetrically with respect to the mean state in state space. We have shown that the distinctively skewed and heavy-tailed character of the observed probability distributions can be captured in the simplest such 2-component nonlinear system, in which one component is stochastically forced and damped much more strongly than the other. In this limit, it becomes effectively decoupled from the weakly damped slow component, and its impact on the slow component can be approximated as an additive noise plus a CAM noise stochastic forcing. The CAM noise forcing is responsible for both the skewness and heavy tails of the distributions.

Although we have rigorously justified the CAM noise approximation here only in the context of the 2-component system (7), we suggest that much of the diversity of observed large-scale atmospheric and oceanic probability distributions can be interpreted in this minimal framework by identifying the weakly damped slow component with large-scale variables and the strongly damped fast component with small-scale variables. The CAM noise forcing may thus be interpreted as representing the impact of all slow-fast interactions on the slow variables, and the additional additive noise forcing as representing the impact of all fast-fast interactions. We have theoretically shown, and numerically confirmed, that in the limit of large time-scale separation the CAM noise forcing associated with a non-zero C in (7) is the primary determinant of the skewness and heavy tails of the PDF of the slow variable x. The additive noise forcing, associated with a non-zero D in (7), also affects the mean and variance of x, but neither its skewness nor its heavy tails in this limit. A similar remark applies to additional external additive noise forcing of x, consistent with traditional expectations,8 which we did not include in (7) for simplicity, given our focus on non-Gaussianity generated by internal system dynamics.

We should stress that our goal here was to seek a rigorous justification, in a simple context, of the limiting form (2) for the dynamics of individual system components of quadratically nonlinear systems for the purposes of modeling their stationary statistics. While there is evidence that (2) is in general a good model for deducing the stationary statistics of large-scale atmospheric and oceanic variables, one obviously does not expect it to be also a good model for forecasting those variables, not least because the one-dimensional linear dynamics of the asymptotic system cannot represent oscillatory features in the temporal lag-autocorrelation functions. Multi-component linear stochastically forced models, such as those used in Refs. 12–14 for representing atmospheric circulation variability and in Refs. 15 and 16 for representing tropical SST variability, can represent such oscillatory features, but collapsing their dynamics into the single-component form (2) for our purposes here apparently requires additional a priori knowledge about the underlying nonlinear system.

Also, although we did so for the 2-component system in (20), we have not addressed the important issue of what determines the values of the parameters in (2) in general. The climate system is obviously not a 2-component system, and adiabatic and inviscid atmospheric and oceanic circulations also conserve potential enstrophy and not just energy, whereas a 2-component model can have only one quadratic invariant. The dual constraints of energy and potential enstrophy conservation impose strong constraints on the decay of energy and enstrophy wavenumber spectra. This implies, in the context of our 2-component model, that the parameter D should be some decaying function of λy, which we specified as D=D0/λy as a minimum decay requirement to prevent the unphysical “ultraviolet catastrophe” of generating infinite large-scale variance by stochastic backscatter from infinitely small scales.

The role of the D terms in (7) is subtle, but has important implications for representing the effects of fast variables on slow variables through “stochastic parameterizations” in simple17–20 as well as comprehensive21–24,31 nonlinear models. On the one hand, the only terms that can be represented as a pure additive noise forcing in such parameterizations of quadratic nonlinearities are the fast-fast interaction terms involving D, as in (20). On the other hand, a non-zero D can undermine the stochastic parameterization agenda itself by requiring not only λx/λy but also γx=Dx/λy to be much smaller than 1 in (10b). Even if the former condition of adequate time scale separation is satisfied, one can imagine the latter condition being violated for moderately large values of D and x.

We end by noting that although (2) is highly relevant for quantitatively understanding the non-Gaussianity of the probability distributions of large-scale atmospheric and oceanic variables, it cannot explain the non-Gaussianity associated with nonlinear slow-slow interactions, as amply demonstrated by the substantial deviation of our of 2-d model results for moderate λy/λx from those for large λy/λx. Even in such cases, however, one can gain a basic understanding of the sign and magnitude of the skewness by considering the orientation and location of the line (or an appropriate hyper-plane) of zero coupling as in Figure 2. It is also noteworthy in this context that multi-component empirical linear stochastically forced models that ignore the slow-slow interactions have comparable skill to that of comprehensive nonlinear numerical models in representing and predicting large-scale atmospheric and oceanic variations.1,25–29 This suggests that the slow-slow interactions are of relatively minor importance in the dynamics of many large-scale weather and climate variations. If they were important, one would expect them to have a strong impact not only on the non-Gaussianity but also on the prediction skill of the variations.

This work was partly supported by the U.S. Department of Energy (DOE) Office for Science (BER) under Grant No. DE-SC0006965. We also gratefully acknowledge the comments of two anonymous reviewers, whose remarks significantly improved the article.

1.
P. D.
Sardeshmukh
and
P.
Sura
,
J. Clim.
22
,
1193
(
2009
).
2.
P.
Sura
and
P. D.
Sardeshmukh
,
J. Phys. Oceanogr.
38
,
639
(
2008
).
3.
G. E.
Uhlenbeck
and
L. S.
Ornstein
,
Phys. Rev.
36
,
823
(
1930
).
4.
C.
Penland
and
P. D.
Sardeshmukh
,
Chaos
22
,
023119
(
2012
).
5.
A. J.
Majda
,
C.
Franzke
, and
D.
Crommelin
,
Proc. Natl. Acad. Sci.
106
(
10
),
3649
3653
(
2009
).
6.
W.
Rümelin
,
SIAM J. Numer. Anal.
19
,
604
(
1982
).
7.
J. A.
Hansen
and
C.
Penland
,
Mon. Weather Rev.
134
,
3006
(
2006
).
8.
K.
Hasselmann
,
Tellus
28
,
473
(
1976
).
9.
P.
Sura
and
P. D.
Sardeshmukh
,
Atmos. Res.
94
,
140
(
2009
).
10.
C. W.
Gardiner
,
Handbook of Stochastic Methods
(
Springer-Verlag
,
Berlin
,
1984
).
11.
W.
Horsthemke
and
R.
Léfèver
,
Noise Induced Transitions
(
Springer-Verlag
,
Berlin
,
1984
).
12.
B. F.
Farrell
and
P. J.
Ioannou
,
J. Atmos. Sci.
52
,
1642
(
1995
).
13.
J. S.
Whitaker
and
P. D.
Sardeshmukh
,
J. Atmos. Sci.
55
,
237
(
1998
).
14.
M.
Newman
and
P. D.
Sardeshmukh
,
J. Clim.
21
,
4326
(
2008
).
15.
C.
Penland
and
P. D.
Sardeshmukh
,
J. Clim.
8
,
1999
(
1995
).
16.
M.
Newman
,
P. D.
Sardeshmukh
, and
C.
Penland
,
J. Clim.
22
,
2958
(
2009
).
17.
C. E.
Leith
,
Physica D
98
,
481
(
1996
).
18.
B. F.
Farrell
and
P. J.
Ioannou
,
Phys. Fluids A
5
,
2600
(
1993
).
19.
D. S.
Wilks
,
Q. J. R. Meteorol. Soc.
131
,
389
(
2005
).
20.
H. M.
Arnold
,
I. M.
Moroz
, and
T. N.
Palmer
,
Philos. Trans. R. Soc., A
371
,
20110479
(
2013
).
21.
J.
Berner
,
G. J.
Shutts
,
M.
Leutbecher
, and
T. N.
Palmer
.,
J. Atmos. Sci.
66
,
603
(
2009
).
22.
J.
Culina
,
S.
Kravtsov
, and
A. H.
Monahan
,
J. Atmos. Sci.
68
,
284
(
2011
).
23.
M. F.
Jansen
and
I. M.
Held
,
Ocean Modell.
80
,
36
(
2014
).
24.
A.
Weisheimer
,
S.
Corti
,
T. N.
Palmer
, and
F.
Vitart
,
Philos. Trans. R. Soc., A
372
,
20130290
(
2014
).
25.
K.
Pegion
and
P. D.
Sardeshmukh
,
Mon. Weather Rev.
139
,
3648
(
2011
).
26.
N.
Cavanaugh
,
T.
Allen
,
A.
Subramanian
,
B.
Mapes
,
H.
Seo
, and
A. J.
Miller
,
Clim. Dyn.
44
,
897
906
(
2015
).
27.
S.
Saha
 et al,
J. Clim.
19
,
3483
3517
(
2006
).
28.
L.
Zanna
,
J. Clim.
25
,
5047
5056
(
2012
).
29.
M.
Newman
,
J. Clim.
26
,
5260
5269
(
2013
).
30.
J. E.
Wilkins
,
Ann. Math. Stat.
15
,
333
335
(
1944
).
31.
M.
Ghil
and
A. W.
Robertson
, “Solving problems with GCMs: General circulation models and their role in the climate modeling hierarchy,” in
General Circulation Model Development: Past, Present and Future
, edited by
D.
Randall
(
Academic Press
,
2000
), pp.
285
325
.