The probability distributions of large-scale atmospheric and oceanic variables are generally skewed and heavy-tailed. We argue that their distinctive departures from Gaussianity arise fundamentally from the fact that in a quadratically nonlinear system with a quadratic invariant, the coupling coefficients between system components are not constant but depend linearly on the system state in a distinctive way. In particular, the skewness arises from a tendency of the system trajectory to linger near states of weak coupling. We show that the salient features of the observed non-Gaussianity can be captured in the simplest such nonlinear 2-component system. If the system is stochastically forced and linearly damped, with one component damped much more strongly than the other, then the strongly damped fast component becomes effectively decoupled from the weakly damped slow component, and its impact on the slow component can be approximated as a stochastic noise forcing plus an augmented nonlinear damping. In the limit of large time-scale separation, the nonlinear augmentation of the damping becomes small, and the noise forcing can be approximated as an additive noise plus a correlated additive and multiplicative noise (CAM noise) forcing. Much of the diversity of observed large-scale atmospheric and oceanic probability distributions can be interpreted in this minimal framework.

Quadratically coupled systems with both fast and slow components often exhibit skewness in the probability distribution of the slow variable. We show both theoretically and numerically in a two-dimensional system with a quadratic invariant that this occurs when the timescale separation between the systems causes the fast variable to affect the slower variable as correlated additive and multiplicative (CAM) stochastic forcing.

## I. INTRODUCTION

It is increasingly evident that the probability distributions of large-scale atmospheric and oceanic variables are not only non-Gaussian but also non-Gaussian in a distinctive way. Figure 1 illustrates this for absolute vorticity, an important atmospheric variable at the dynamically active jet stream level, and for sea surface temperature (SST), an important oceanic variable at the air-sea interface. Specifically, it shows the skewness *S* and kurtosis *K* of the daily anomalies $x\u2032=x\u2212\u27e8x\u27e9$ of these variables from their long-term averages $\u27e8x\u27e9$ in winter, defined as

where $\u27e8x\u2032\u27e9=0$, $\sigma 2=\u27e8x\u20322\u27e9$ is the variance, and angle brackets refer to long-term averages. *S* is clearly not zero, and *K* is generally positive as evident from the scatter plots of *K* versus *S*. A positive *K* is a measure of the heaviness of the tails of a probability distribution relative to those of a Gaussian distribution. Figure 1 also indicates a distinctive parabolic *K-S* relationship, $K>(3/2)S2$, for these variables.

Sardeshmukh and Sura^{1} (SS09 hereafter) argued that these and other distinctively non-Gaussian features of the observed probability density functions (PDFs) could be understood by considering the following 1-D linear Markov process model of the anomalies:

where *A** **<** **0*, *b*, *E*, and *g* are constants, and *η _{1}* and

*η*are independent Stratonovic Gaussian white noises, with ⟨

_{2}*η*⟩ = 0. Note that

_{1}η_{2}*η*is related to a standard Brownian motion

_{i}*W*as

_{i}*dW*=

_{i}*η*. The

_{i}dt*bη*term represents additive stochastic noise forcing. SS09 referred to (

_{2}*g*+

*Ex′*)

*η*as CAM noise forcing. The skewness of

_{1}*x′*arises from the asymmetric magnitude of this term with respect to the sign of

*x′*. The CAM noise forcing also drives a mean response, which needs to be negated by an additional forcing −

*Eg*/2 to ensure that ⟨

*x′*⟩ = 0. Note that if

*E*= 0, Eq. (2) represents a dynamical model of

*x′*as a standard linear Ornstein-Uhlenbeck “red noise” process

^{3}with a Gaussian stationary PDF.

Defining $\lambda =\u2212[A+(1/2)E2]$, which must be positive if the statistics of *x′* are stationary, and $\alpha =0.5E2/\lambda $, which must be less than 1/3 if the kurtosis *K* exists, the first four statistical moments of this CAM-noise driven process may be derived, using the Fokker-Planck equation associated with (2), as

*S*and

*K*on the CAM-noise amplitude parameters

*E*and

*g*. The

*K-S*inequality,

*K*

*>*

*(3/2)*

*S*, results from the fact that the first term within square brackets on the right hand side of (3d) is greater than 1 and the second term is greater than 0. Note that this is a much stronger constraint on

^{2}*K*than the universal lower bound

*K*

*>*

*S*− 2 if

^{2}*S*exists.

^{30}

An explicit expression for the stationary PDF of this stochastically generated skewed (“SGS”) process may also be derived using the Fokker-Planck equation as

where $N$ is a normalization constant

which ensures that *p(x*) integrates to unity. Here, *ν** **=** **λ/E ^{2}*,

*q*

*=*

*2gν/b*, and Γ

*(z)*is the standard Gamma function of complex argument

*z.*This explicit expression reveals the heavy tails of the distribution to be approximately power-law tails. For |

*Ex*′ +

*g*| ≫

*b*,

*p(x′)*decays with

*x′*as

where the +(−) sign refers to positive (negative) tails. SS09 provided examples of such approximate power-law tails in the observed PDFs of upper-tropospheric vorticity and mid-tropospheric geopotential height anomalies. Note that the inference of such tails from (2) does not require even the variance to exist. This prompted Penland and Sardeshmukh (2012)^{4} to propose an alternative interpretation of power-law distributions encountered in many natural contexts as possibly arising from SGS processes, instead of Lévy processes as is often assumed.

For future reference, we note that the time-lag autocorrelation function $\rho (\tau )$ of *x′* in (2) can be derived using the Fokker-Planck equation for the conditional probability of $x\u2032(t+\tau )$ given $x\u2032(t)$ as $\rho (\tau )$ $=\u27e8 x\u2032(t+\tau )x\u2032(t) \u27e9/\sigma 2$ $=e\u2212\lambda \tau $. The power spectrum of *x′* can then be derived using the Wiener-Khinchin theorem as

whose form is identical to that of a standard *O-U* “red noise” process.

This single-component anomaly model was discussed extensively in SS09 and in Ref. 4. Its success in capturing the essence of the non-Gaussianity of observed probability distributions is evident. Why this should be so in the obviously nonlinear multi-component climate system is, however, still not completely clear.

We provide a rationale here for (2) as an approximate dynamical equation for the slow components of energy-conserving quadratically nonlinear dynamical systems, in which the fast components are driven by additive stochastic forcing and are strongly damped. We argue that even without forcing and dissipation, the statistics of such a system are generally skewed because the coupling coefficients between the system components are not constant but depend linearly on the system state in a distinctive way. In particular, the skewness arises from the system trajectory lingering near states of weak coupling. We show that the salient features of the observed non-Gaussianity can be captured in the simplest such nonlinear 2-component system. If the system is stochastically forced and linearly damped, with one component damped much more strongly than the other, then the strongly damped fast component becomes effectively decoupled from the weakly damped slow component, and its impact on the slow component can be approximated as a stochastic noise forcing plus an augmented nonlinear damping. In the limit of large time-scale separation, the nonlinear augmentation of the damping becomes small, and the noise forcing can be approximated as an additive noise plus a CAM noise forcing, leading to Eq. (2) for the slow component.

## II. A SIMPLE NONLINEAR OSCILLATOR

Consider the 2-component system

where $\lambda x$ and $\lambda y$ are damping constants, *B*, *C*, and *D* are coupling constants, and $S\eta $ represents stochastic Gaussian white noise forcing of *y* with amplitude $S$. The “energy” $(x2+y2)/2$ of this system evolves as

and is conserved in the absence of forcing and dissipation. Note that if there is neither forcing nor dissipation, any other quadratically nonlinear 2-component system with a quadratic invariant can be expressed in the form (7) through a linear transformation of system variables.

Figure 2 shows the phase portrait of this system in the absence of forcing and dissipation. For a linear system, *C* and *D* are both zero, and starting from any initial condition the system stays in a circular orbit around the origin at a constant angular velocity *B*. This symmetry is broken for a nonlinear system, in which *C* and/or *D* are not zero, by the presence of a line of zero coupling, $\Omega (x,y)$ = $B+Cx+Dy$ = 0, located at a distance $d=B/C2+D2$ from the origin. The line is a line of continuous stable and unstable fixed points separated by a neutral fixed point. In this nonlinear scenario, initial states that are closer than *d* to the origin follow circular orbits as before, but with variable angular velocities that are slower than *B* near the line and faster than *B* away from it. Initial states that are farther than *d* also follow circular trajectories with variable angular velocities, but these eventually intersect with the line and then stay there.

If the system is forced and damped, the line of zero coupling ceases to be a line of fixed points, but its orientation and distance from the origin remain the primary determinants of the sign and magnitude of the skewness of *x* (or *y*). If, for example, *D* is zero but *C* is not, the line is vertical (parallel to the *y* axis) and introduces an obvious asymmetry in the dynamics of *x*. If, on the other hand, *C* is zero but *D* is not, the line is horizontal (parallel to the *x* axis) and there is no obvious asymmetry in the dynamics of *x*. However, the damping of *x* introduces an asymmetry even in this case by accelerating the trajectory near the line for positive *x* and decelerating it for negative *x*. In general, the sign of *BC* determines the sign of the skewness of *x* in the former case, and the sign of *BD* determines it in the latter. In the most general case, of course, neither *C* nor *D* is zero, as illustrated in Figure 2.

We are especially interested here in understanding the dynamics of this system if one component is damped much more strongly than the other. It turns out that the dynamics of the weakly damped component can then be approximated as (2), as we show both theoretically and numerically below.

## III. THE APPROXIMATE DYNAMICS OF THE STOCHASTICALLY FORCED AND DAMPED NONLINEAR OSCILLATOR

We designate *x* as the weakly damped slow variable in (7), and *y* as the strongly damped fast variable, with $\lambda y>\lambda x$. We show here that if $\lambda y\u226b\lambda x$, the equation for the anomalies $x\u2032=x\u2212\u27e8x\u27e9$ of *x* from its mean $\u27e8x\u27e9$ can be approximated as (2).

If $\lambda y\u226b\lambda x$, we may write $y=y\u0302+ys$, where $y\u0302$ represents a part of *y* that evolves in quasi-equilibrium with *x*. We may then write equations for the slow and fast parts of *y* as

where $\gamma =D/\lambda y$. If we now also assume that $\gamma x\u226a1$, then on the slow timescales of *x* we can approximate *y _{s}* first as red noise with mean and variance

and then this red noise as proportional to a white noise $\eta 1$

which allows (10a) to be approximated as

where $(ys2)\u2032$ indicates a deviation of $ys2$ from its mean $\u27e8 ys2 \u27e9$. On timescales $1/\lambda x$ of *x*, we may approximate $(ys2)\u2032$ as red noise, with a damping rate of $2\lambda y$, and then as proportional to a white noise $\eta 2$

This expression leads, to first order in $\gamma x$, to

If we write $S=S0\lambda y$ and $D=D0/\lambda y$, then retaining terms to $O(1/\lambda y1/2)$ in (15) gives

Justification for these specified dependences of S and *D* on *λ _{y}* is provided below. We note in passing that if S in (15) is instead specified as $S=S1\lambda y$ (as in, e.g., Ref. 5), then retaining terms to $O(1/\lambda y1/2)$ in (15) leads not only to the augmented damping terms being dropped from (16) but also the $D0$ terms as well, leaving the CAM noise forcing $(B+Cx)S1\lambda y\u22121/2\eta 1$ as the only link between $x$ and $y$ for large but finite timescale separation. Note that it too vanishes in the limit of infinite timescale separation $\lambda y\u2192\u221e$, leading to a “no noise forcing limit”

^{10,11}instead of the “fast noise forcing limit” considered here. The variance of

*x*vanishes in such a no noise limit, but approaches a constant in the fast noise limit. Numerical simulations of (7) performed with both the no noise limit (not shown) and the fast noise limit (shown below) verify these limits.

We now write $x=\u27e8 x \u27e9+x\u2032$ in (16) as a mean plus an anomaly. The equation for the mean is

which gives

which is identical to (2), with

Note again that this approximation requires both $\lambda x/\lambda y$ and $\gamma x=Dx/\lambda y$ to be much smaller than unity.

## IV. NUMERICAL CALCULATIONS

We confirm the dynamics of the nonlinear oscillator in the limit of $\lambda y\u226b\lambda x$ derived above through numerical integrations of (7), using specified values of $\lambda y$, and $\lambda x=0.1$, $B=0.1$, $C=0.05$, $D=D0/\lambda y=0.05/\lambda y$, and $S=S0\lambda y=2\lambda y$. Four distinct cases, with different combinations of *C* and *D*, were investigated

Integrations in each case were performed for $\lambda y/\lambda x$ = 1, 2, 10, 100, and 300, using a Runge-Kutta-4 method^{6,7} with a time step $\Delta t=1/(500\lambda y)$. Each integration was sampled every 100 time steps, with at least 5 × 10^{6} samples recorded.

Figure 3 shows the stationary PDFs of the standardized anomalies $x\u2032/\sigma x=(x\u2212\u27e8 x \u27e9)/\sigma x$ obtained in each case, together with the theoretical predictions for $\lambda y/\lambda x$ = 300. Note from Eqs. (19) and (20) that the limiting PDFs for case *B* and case *BC* are independent of $\lambda y/\lambda x$. Case *B* represents the simplest case of a damped linear oscillator driven by additive stochastic noise forcing, and the PDFs are Gaussian as expected. The PDFs are skewed in case *BC* and case *BCD*, and are well matched by the theoretical predictions for large $\lambda y/\lambda x$ given by (4) and (20). Note that the limiting PDF of *x* in case *BD* is a non-centered Gaussian, as verified in Figures 3 and 5. Note also that the theoretical PDF in case *BC* (for which *b* is exactly zero in (20) and (2)) is somewhat different from (4), and is given by

with the normalization constant

In this case, *x′* is limited on one side by −*g/E*, where there is a reflecting boundary condition.

Figure 4 highlights, for the most general case *BCD* with $\lambda y/\lambda x=300$, the heaviness of the tails predicted by (4) relative to Gaussian tails. The numerical result is in excellent agreement. For this case, the theory predicts the negative tail to be lighter than a Gaussian tail for $\u221240<x\u2032/\sigma x<\u22123$, but heavier again for larger negative values and decaying with the same power-law exponent as the positive tail. Our numerical integrations for this case were long enough to accurately estimate the probability densities to nearly four orders of magnitude smaller than the peak of the PDF, but not long enough to confirm the prediction in Figure 4(b) of the vanishingly small probabilities of extremely large negative values.

Figure 5 shows the mean, standard deviation, skewness *S*, and kurtosis *K* as functions of $\lambda y/\lambda x$ in the four cases. All four statistical moments appear to converge to the values predicted using (3) and (20). The convergence is rapid in the linear case *B*, consistent with traditional expectations,^{8–10} but much slower in the nonlinear cases *BC*, *BD*, and *BCD,* requiring $\lambda y/\lambda x$ to be at least as large as 100 for the asymptotic dynamics (19) to represent a valid approximation.

The spectra of *x* are shown in Figure 6. They approach a red spectrum in all four cases for large $\lambda y/\lambda x$, with an $f\u22122$ decay at high frequencies as predicted by (6). Interestingly, the decay is steeper for moderate values of $\lambda y/\lambda x$, even in the linear case *B*. In fact, this property is easy to show analytically for case *B.*

Figure 7 shows the corresponding spectra of the fast variable *y* in the four cases, in an identical format to Figure 6. In all cases, they approach red noise spectra for large $\lambda y/\lambda x$ as expected, and importantly, with the same white low-frequency tail. It is this that justifies approximating the red noise *y _{s}* as a white noise stochastic forcing of

*x*in (12), with $S$ specified as $S=S0\lambda y$ to ensure an asymptotic value of the white noise spectral density as $\lambda y/\lambda x\u2192\u221e$. The variance of

*y*then does increase linearly with $\lambda y$ as $\lambda y/\lambda x$ increases, but that of

_{s}*x*does not, as shown in Figure 5, in essence because

*x*responds mainly to the forcing associated with the “coarse-grained” low-frequency part of

*y*.

The larger variance of *y _{s}* obtained for larger values of $\lambda y$ is consistent with the “white noise limit,”

^{10,11}which requires that the product of this variance and $\lambda x/\lambda y$ remain constant as the limit is taken. This requirement is what determined our specified dependence of $S=S0\lambda y$ on

*λ*, which yields the correct approximate dynamics (19) for

_{y}*x*in (7) for large but finite $\lambda y$ by taking the limit $\lambda y\u2192\u221e$. In contrast, specifying $S$ as $S=S1\lambda y$ in the “no noise limit” formulation mentioned earlier, which requires the variance of

*y*to remain constant as $\lambda y$ is increased, yields a vanishing

_{s}*x*in the limit $\lambda y\u2192\u221e$. One way around this,

^{5}if $\lambda y$ is finite, is not to take the formal limit in going from (15) to (16), but to retain the stochastic forcing as $(B+Cx)S1\lambda y\u22121/2\eta 1$ to $O(1/\lambda y1/2)$. This yields an approximate stochastically forced dynamics of

*x*equivalent to ours in the

*B*and

*BC*cases (without any augmented damping), but an incorrect approximate dynamics in the

*BD*and the most general

*BCD*cases.

Note that the approximation made in (12) and (14) is not only that for large $\lambda y/\lambda x$ both $ys\u2032$ and $(ys2)\u2032$ are white at low temporal frequencies but also that their PDFs are approximately Gaussian at those frequencies. Figure 8 shows the PDFs of the standardized values of *y′* and (*y ^{2}*)

*′*, which are dominated by those of that $ys\u2032$ and $(ys2)\u2032$, for $\lambda y/\lambda x=300$. The PDFs of

*y′*(Figure 8(a)) are indistinguishable from a standard Gaussian in all four cases. When (

*y*) is averaged over half the decay time of the slow variable, i.e.,

^{2}*1/(2λ*), its PDFs are also nearly Gaussian (Figure 8(b)), particularly relative to the chi-squared distribution with one degree of freedom (not shown) of the unaveraged values of (

_{x}*y*)

^{2}*′*.

A further approximation made in (12) and (14) is that $ys\u2032$ and $(ys2)\u2032$ are not just uncorrelated but are *independent* at low temporal frequencies, i.e., *η _{1}* and

*η*are independent Gaussian white noises in (19) and (2). This approximation is also easily validated. Although the unaveraged values of

_{2}*y*are obviously dependent on those of

^{2}*y*, the time-averaged “coarse-grained” values are not. To confirm this, we estimated the mutual information

*I*(

*y′*,

*z′*), with

*z*=

*y*, defined as

^{2}where $py$, $pz$, and $pyz$ refer to the marginal PDFs of *y′* and *z′* and their joint PDF, respectively. We estimated these PDFs using the joint histogram of *y′* and *z′*, dividing each of their ranges into *M* equal sized bins of standardized values. Such a simple estimation of *I*(*y′*,*z′*) is not optimal and depends on the choice of *M.* We performed the calculations for three values *M* = 10, 20, and 30. Figure 9 shows the average of the three estimations of *I*(*y′*, *z′*) as a function of the time averaging interval in units of 1*/λ _{x}*, with confidence intervals indicated as the difference between the maximum and minimum estimated values. For comparison, we also show the theoretical maximum attainable value of

*I*(

*y′*,

*y*) if

^{2}′*y*is completely determined by

^{2}′*y′*; that is, we show the entropy of a standard chi-squared distribution with one degree of freedom. It is clear from Figure 9 that coarse-graining diminishes the mutual information between $ys\u2032$ and $(ys2)\u2032$, allowing us to approximate

*η*and

_{1}*η*are independent quantities in (19) and (2).

_{2}## V. DISCUSSION

The non-Gaussianity of the probability distributions of large-scale atmospheric and oceanic variables arises fundamentally from the fact that in the predominantly quadratically nonlinear climate system, the coupling coefficients between system components are not constant but depend linearly on the system state. The skewness arises from a tendency of the system to linger near states of weak coupling, generally located asymmetrically with respect to the mean state in state space. We have shown that the distinctively skewed and heavy-tailed character of the observed probability distributions can be captured in the simplest such 2-component nonlinear system, in which one component is stochastically forced and damped much more strongly than the other. In this limit, it becomes effectively decoupled from the weakly damped slow component, and its impact on the slow component can be approximated as an additive noise plus a CAM noise stochastic forcing. The CAM noise forcing is responsible for both the skewness and heavy tails of the distributions.

Although we have rigorously justified the CAM noise approximation here only in the context of the 2-component system (7), we suggest that much of the diversity of observed large-scale atmospheric and oceanic probability distributions can be interpreted in this minimal framework by identifying the weakly damped slow component with large-scale variables and the strongly damped fast component with small-scale variables. The CAM noise forcing may thus be interpreted as representing the impact of all slow-fast interactions on the slow variables, and the additional additive noise forcing as representing the impact of all fast-fast interactions. We have theoretically shown, and numerically confirmed, that in the limit of large time-scale separation the CAM noise forcing associated with a non-zero *C* in (7) is the primary determinant of the skewness and heavy tails of the PDF of the slow variable *x*. The additive noise forcing, associated with a non-zero *D* in (7), also affects the mean and variance of *x*, but neither its skewness nor its heavy tails in this limit. A similar remark applies to additional *external* additive noise forcing of *x*, consistent with traditional expectations,^{8} which we did not include in (7) for simplicity, given our focus on non-Gaussianity generated by internal system dynamics.

We should stress that our goal here was to seek a rigorous justification, in a simple context, of the limiting form (2) for the dynamics of individual system components of quadratically nonlinear systems *for the purposes of modeling their stationary statistics*. While there is evidence that (2) is in general a good model for deducing the stationary statistics of large-scale atmospheric and oceanic variables, one obviously does not expect it to be also a good model for forecasting those variables, not least because the one-dimensional linear dynamics of the asymptotic system cannot represent oscillatory features in the temporal lag-autocorrelation functions. Multi-component linear stochastically forced models, such as those used in Refs. 12–14 for representing atmospheric circulation variability and in Refs. 15 and 16 for representing tropical SST variability, can represent such oscillatory features, but collapsing their dynamics into the single-component form (2) for our purposes here apparently requires additional *a priori* knowledge about the underlying nonlinear system.

Also, although we did so for the 2-component system in (20), we have not addressed the important issue of what determines the values of the parameters in (2) in general. The climate system is obviously not a 2-component system, and adiabatic and inviscid atmospheric and oceanic circulations also conserve potential enstrophy and not just energy, whereas a 2-component model can have only one quadratic invariant. The dual constraints of energy and potential enstrophy conservation impose strong constraints on the decay of energy and enstrophy wavenumber spectra. This implies, in the context of our 2-component model, that the parameter *D* should be some decaying function of $\lambda y$, which we specified as $D=D0/\lambda y$ as a minimum decay requirement to prevent the unphysical “ultraviolet catastrophe” of generating infinite large-scale variance by stochastic backscatter from infinitely small scales.

The role of the *D* terms in (7) is subtle, but has important implications for representing the effects of fast variables on slow variables through “stochastic parameterizations” in simple^{17–20} as well as comprehensive^{21–24,31} nonlinear models. On the one hand, the only terms that can be represented as a pure additive noise forcing in such parameterizations of quadratic nonlinearities are the fast-fast interaction terms involving *D*, as in (20). On the other hand, a non-zero *D* can undermine the stochastic parameterization agenda itself by requiring not only $\lambda x/\lambda y$ but also $\gamma x=Dx/\lambda y$ to be much smaller than 1 in (10b). Even if the former condition of adequate time scale separation is satisfied, one can imagine the latter condition being violated for moderately large values of *D* and *x*.

We end by noting that although (2) is highly relevant for quantitatively understanding the non-Gaussianity of the probability distributions of large-scale atmospheric and oceanic variables, it cannot explain the non-Gaussianity associated with nonlinear slow-slow interactions, as amply demonstrated by the substantial deviation of our of 2-d model results for moderate $\lambda y/\lambda x$ from those for large $\lambda y/\lambda x$. Even in such cases, however, one can gain a basic understanding of the sign and magnitude of the skewness by considering the orientation and location of the line (or an appropriate hyper-plane) of zero coupling as in Figure 2. It is also noteworthy in this context that multi-component empirical linear stochastically forced models that ignore the slow-slow interactions have comparable skill to that of comprehensive nonlinear numerical models in representing and predicting large-scale atmospheric and oceanic variations.^{1,25–29} This suggests that the slow-slow interactions are of relatively minor importance in the dynamics of many large-scale weather and climate variations. If they were important, one would expect them to have a strong impact not only on the non-Gaussianity but also on the prediction skill of the variations.

## ACKNOWLEDGMENTS

This work was partly supported by the U.S. Department of Energy (DOE) Office for Science (BER) under Grant No. DE-SC0006965. We also gratefully acknowledge the comments of two anonymous reviewers, whose remarks significantly improved the article.