Two elementary models of ocean circulation, the well-known double-gyre stream function model and a single-layer quasi-geostrophic (QG) basin model, are used to generate flow data that sample a range of possible dynamical behavior for particular flow parameters. A reservoir computing (RC) machine learning algorithm then learns these models from the stream function time series. In the case of the QG model, a system of partial differential equations with three physically relevant dimensionless parameters is solved, including Munk- and Stommel-type solutions. The effectiveness of a RC approach to learning these ocean circulation models is evident from its ability to capture the characteristics of these ocean circulation models with limited data including predictive forecasts. Further assessment of the accuracy and usefulness of the RC approach is conducted by evaluating the role of both physical and numerical parameters and by comparison with particle trajectories and with well-established quantitative assessments, including finite-time Lyapunov exponents and proper orthogonal decomposition. The results show the capability of the methods outlined in this article to be applied to key research problems on ocean transport, such as predictive modeling or control.

## I. INTRODUCTION

The inherently high-dimensional and nonlinear nature of fluid dynamics presents formidable barriers to the analysis, prediction, and simulation of real-world and model flows.^{1} Fluid-coupled control^{2–5} of autonomous vehicles that can perform a variety of sensing tasks in the ocean and atmosphere^{6,7} is just two examples of common tasks that generally lead to difficult optimization problems. Large scale multi-physics systems such as atmospheric or oceanic circulation tend to present additional complexities, placing efficient or detailed prediction in such systems far out of reach of current methods. Reduced models like quasi-geostrophy play an important role by identifying dynamical regimes and in facilitating studies that are not possible in more cumbersome models. More recently, quasi-geostrophic (QG) models have served as components within more complex models and as a framework to reconstruct and assimilate flows from satellite data.^{8–10}

Data-based study of ocean flows is further challenged by the enormity and inaccessibility of the system, resulting in datasets that are both too much (raw volume) and too little (sparse or incomplete). Primitive equation-based ocean models are also complex and unwieldy and are, thus, shunned for certain tasks in favor of simpler reduced models. Quasi-geostrophy is an example of a reduced model which, despite being highly simplified, is still a partial differential equation (PDE)-based model that exhibits rich dynamics including low-frequency variability, multiple equilibria, and parametric bifurcation to aperiodic and chaotic solutions,^{11–15} advancing the understanding of key dynamics underlying the behavior of complex primitive equation ocean models and ocean observations.

A dynamical systems approach to ocean flows has led to new tools and useful diagnostics, which has improved our understanding of transport. However, the determination of Lagrangian coherent structures (LCS)^{2,16} through the finite-time Lyapunov exponent (FTLE) field and other geometric and probabilistic methods^{17} is also computationally demanding and seems to offer limited capacity to make detailed flow predictions. Related approaches have exploited modal decomposition,^{18} both as a modeling tool^{19} and more broadly as a tool to analyze data for purposes of control, modeling, or prediction.^{20,21} The double-gyre (DG) streamfunction (toy) model is the foundation of much of that work, becoming a workhorse and standard example within the dynamics community.^{22} Furthermore, Aref's work^{23} has highlighted the sensitive nature of particle trajectories and their chaotic nature even in very simple time-dependent flows.

Machine learning (ML) provides alternate, independent insights into dynamical systems and into fluid dynamics,^{24–29} thanks to its ability to uncover patterns in large datasets. Machine learning derives models from data through optimization but often requires a large amount of data to achieve good results. Fortunately, many fluid dynamical problems, including large scale geophysical flows, offer a wealth of experimental or simulation data.

Reservoir computing (RC)^{30} is an implementation of a recurrent neural network (RNN) that avoids several drawbacks of RNNs, such as their high computational cost and tendency to overfit. This is achieved in part by learning only on output weights,^{31–33} allowing the hidden layer to act as a reservoir and capture features of the history of inputs. The echo state network (ESN) variety of RC that we use is described in more detail in Sec. II.

RC is noteworthy for its forecasting capability for complex systems.^{34} Recently, an RC approach was trained on data from the intrinsically aperiodic Lorenz dynamical system,^{35} and both reproduced characteristics of its attractor and made predictions of future states. The same authors^{35,36} extended this approach to the Kuramoto–Sivashinsky (KS) equation, a one-dimensional, nonlinear PDE model of reaction–diffusion and fluid phenomena, and were similarly able to predict future spatiotemporal states. It follows naturally that RC can be applied to a range of geophysical fluid dynamics (GFD) problems. This study adopts as its focus a pair of reduced ocean models: the double-gyre and the barotropic QG models, which are closely related but with significant distinctions.

On the one hand, QG models and their solutions have provided researchers insights into the key dynamics of the global ocean circulation, the so-called Sverdrup transport, in terms of the vorticity balance between dissipation and wind stress driving (Ekman pumping) in a closed basin.^{37} Using only three non-dimensional control parameters, the barotropic QG model can describe both the nature and the complex variability of the prominent Western boundary currents seen in the world's oceans; the history of this research has been reviewed in the text by Pedlosky^{38} and in several well-known works.^{39–41} QG, thus, allows direct examination of the role of nonlinearity, in terms of the so-called inertial or Fofonoff mode, within the Munk model, features that are generally obscured in complex physical equation simulations and the actual ocean. As a result, QG continues to be a powerful tool in emergent applications of ocean and climate research, ranging from theories for the thermocline,^{42,43} to interannual variability,^{44,45} to the separation of the western boundary current (i.e., gulf stream),^{41,46,47} while more recently, QG models have been applied to the Atlantic meridional overturning circulation (AMOC)^{48} and as a framework for data assimilation^{49} and for turbulence model closure.^{50}

Given their excellent dynamical capabilities, QG models have served as a foundation for the diagnosis and development of sensing strategies and control^{2} and other applications based on or coupled to ocean transport, such as pollutant dispersal. In such applications, even the highly reduced barotropic QG model, a PDE, is sometimes too computationally expensive. As a result, many researchers have adopted the double-gyre stream function^{51} model as a useful proxy for a full QG model, which, nevertheless, mimics its key *kinematic* transport features. Like QG, DG is typically driven by time-dependent forcing that can represent wind stress, but unlike QG, DG does not exhibit intrinsic variability; indeed, DG is a direct prescription of a flow stream function.

Even in DG-based studies of transport, researchers have faced challenges in predicting trajectories of tracer particles and in understanding key transport features. This is not surprising, as the pioneering work of Aref^{23} has shown that tracer trajectories are often chaotic even in the simplest two-dimensional time-dependent flows, such as a Stokes flow. Tools such as modal analysis, while powerful and useful, were not ideally suited to the above challenges. With the work of Shadden *et al.*,^{22} the application of Lagrangian coherent structures (LCS) and the finite-time Lyapunov exponent (FTLE) the LCS is based on were shown to give greater insight into transport features and trajectories of DG models. Indeed, in this study, we adopt and leverage the combination of modal analysis, FTLE, and particle trajectories to assess the efficacy of machine learning (ML) models to capture the complex nature of transport in both DG and QG flows, much as the attractor can be used to compare chaotic dynamical systems.

The approach presented in this study applies reservoir computing to data produced by: (i) a double-gyre stream function model with time-variable forcing and (ii) a single-layer (barotropic) QG basin model. Each model is formulated as a discrete, time-dependent dynamical system, which is used to produce stream function time-series data that provide the input to a reservoir computing model. For each system, we use RC to find a low-dimensional model of the system, and we compare the predictions of the RC model to the original system using several relevant testing frameworks, including global stream function and ideal particle trajectories.

The quality of the predictions is also examined as a function of RC model parameters and by comparing the FTLE and proper orthogonal decomposition (POD) found using the RC model with those found using the original model. Taken together, modal decomposition, FTLE, and trajectories characterize a particular model flow, and we use these to assess how well our RC models capture the generic dynamical and transport features of the QG and DG. We find that RC models trained with relatively little sparse data very effectively reproduce the above quantities and allow accurate forecasting over significant timescales.

It is worth noting that while this study relies entirely on synthetic data from various QG models, its results bear directly on the capability of RC models to work with any form of flow data in the quasi-geostrophic regime, whether from observations or more complex models.

## II. RESERVOIR COMPUTING

As mentioned in Sec. I, echo state networks belong to a particular family of recurrent neural networks called reservoir computing, where the main idea is to drive a fixed, high-dimensional, random, sparse network, designated as the reservoir, with an input signal to produce a high-dimensional “echo” response, which is then used as a non-orthogonal signal basis to reconstruct the desired output, generally as a linear combination. In this work, we follow the standard implementation details.^{30}

The main idea of reservoir computing is to map a set of sequential data $u(t)$ of dimension $ninput$ into a high-dimensional reservoir state vector $r(t)$ of dimension $nres$ primarily by multiplying $u(t)$ with a $nres\xd7ninput$ matrix denoted as $Win$ [see Eq. (1a)]. A linear predictor is then applied on the two vectors to solve for an output $y(t)$ of dimension $noutput$. Here, $t=1,\u2026,\tau train$ is the set of discrete time points with *τ _{train}* denoting the total number of data points in the training dataset. In this case, the training procedure is formulated based on a dynamical system; the desired output $y(t)=u(t+1)$ and $ninput=noutput$. The training criterion depends solely on the hidden units, or state vector $r(t)$, and the output targets $y(t)$. As a result, this can be easily designed to be a convex optimization problem for linear predictors. The trade-off, however, is the problem of developing an effective representation of histories in the state vector $r(t)$.

The state dynamics of the reservoir during the training phase is governed by the following system of equations:

where $r(t)$ is the reservoir state vector, *f* is generally a sigmoid function (e.g., logistic, or tanh function), **W** is the $nres\xd7nres$ reservoir weight matrix, $Win$ is the $nres\xd7ninput$ input weight matrix, $u(t)$ is the input signal, $y\u0302(t)$ is the estimated value of the desired output $y(t)$, which is nothing more than the observed data, $Wout$ is a $noutput\xd7(ninput+nres)$-dimensional matrix of output weights to be applied as a final transformation on the extended system state, which consists of a concatenation of $u(t+1)$ and $r(t+1)$, and $\nu (t)$ is an $nres$-dimensional noise vector. Note that the reservoir is driven by both the input sequence $u(t+1)$ and the previous state $r(t)$.

The goal of the training phase is to optimize the output layer weight matrix $Wout$, such that the distance between $y\u0302(t)$ and the expected outputs $y(t)$ is a minimum in the least squares sense, where $y\u0302(t)$ is the estimated value of $y(t)$. We implement a linear regression with a regularization constraint *β* acting on the output weights; this scheme is known as ridge regression. For the set of training data, the optimal $Woptout$ is given by

where $||\xb7||2$ denotes the *L*_{2} norm. This expression can be approximated using gradient descent or directly calculated as

where **R** is the correlation matrix of the extended system states, **P** is the cross correlation matrix of the extended system states and the desired outputs, and **I** is the identity matrix.

The main difference between this training procedure and traditional recurrent network approaches, as mentioned earlier, is that the weights on most of the network are randomly initialized and remain fixed throughout the procedure. The weight matrices **W** and $Win$ are both specified based on the recommendations of established research.^{30} In particular, $Win$ is specified as a random matrix without any additional constraint, whereas **W** is specified as a random matrix with the additional constraint that its spectral radius (the maximum of the absolute value of the eigenvalues) is such that the system is brought to the edge of stability, where studied ESNs typically achieve maximum computational capability. To satisfy the constraint, it is usually sufficient to sample a range of spectral radii around unity, with preference given to higher values as the dynamics of the training data become more chaotic.^{52} Note that the only weights that are changed are the matrix $Wout$, which is ultimately determined through the training process to an optimal $Woptout$ through Eqs. (2) or (3). This is one of the primary advantages of this type of a neural network structure and a training approach.

The prediction phase is different from the training phase in that the system is now autonomous and does not use $u(t)$ as an input. Rather, the reservoir dynamics is governed by the single equation,

Equation (4) is the same as Eq. (1a), except that $u(t+1)$ is replaced by the output feedback loop as $Woptout[u(t);r(t)]$. The resultant reservoir states $r(t)$ for all $t>\tau train$ can always be projected to the input data space as

Figure 1 illustrates the training and prediction phases of the ESN framework. The resultant autonomous reservoir system has been found to successfully replicate different aspects of many chaotic dynamical systems, such as the Lorenz system.^{35} Previous studies on these methods and their application to chaotic dynamical systems motivate the rest of the work in this study, where chaotic fluid dynamical systems are applied as inputs to the ESN framework.

## III. STREAM FUNCTION DATA AND MODELING

For an open set $D\u2282\mathbb{R}2$, that is the domain of interest, we define the stream function $\psi (x,t)$ to be a time-dependent scalar field on *D*, where $x\u2208D$. While analytic solutions of the stream function and velocity fields may be defined in some cases, many real world problems do not have a theoretical form. Instead, stream functions are usually presented as discrete snapshots over a finite-time period and, along with velocity fields, are interpolated from the discrete grid of points $X0\u2282D$ for all points $x\u2209X0$ to be used for particle trajectory integration. In this article, while the analytic stream function for the double-gyre flow is known, the methods are applied to the data as if the analytic solution is not known. In contrast, the analytic solutions of the more complicated QG flow are not known, and, thus, we have no choice but to apply our methodology to the data. Therefore, in this study, the stream function $\psi (x,t)$ may refer to either the analytic form or the interpolated data. Although the two approaches are generally interchangeable, the interpolated version is of greater practical value.

We can formulate the stream function as a dynamical system and apply it as input to our reservoir computing framework; we frame the stream function as the feedback,

where $F:D\u21a6D$ is a flow map of the stream function and *t _{i}* is the time index. The stream function can be used as a time series of

*τ*snapshots, where each snapshot is in the domain

*D*and $i=1,\u2026,\tau $. By approximating the flow map $F$ using machine learning, we can model the stream function and its temporal evolution and use the estimated model to calculate velocity fields and particle trajectories.

Stream function data are extracted for *τ* time steps at each grid point in $X0$. In our experiment, we are using a rectangular grid (though any grid scheme can be used); $X0$ is a *m *×* n* rectangular matrix, so each snapshot in time is an *m *×* n* matrix, and the entire data matrix containing all snapshots across *τ* time steps is of size $m\xd7n\xd7\tau $. Recall that the input to the ESN, i.e., $u(t)$ in Sec. II, is a data vector of arbitrary size. In order to use the data matrix in the ESN framework, we convert the $m\xd7n\xd7\tau $ data matrix to the (*mn*)-dimensional vector $u(t)$ for each of the *τ* time steps.

In this work, we consider both the actual stream function $\psi (x,t)$ (but sampled as though it was derived from an experiment) and the estimated stream function $\psi \u0302(x,t)$ modeled from training data as output from the ESN framework. A visualization of the data manipulation is shown in Fig. 2.

### A. Training, testing, and prediction

The goal is to learn the flow map $F$. Using the ESN framework, stream function time series data are input into the reservoir, which expands the dimensionality of the data via a series of matrix multiplications and nonlinear operations [recall Eq. (1a) and the description of reservoir computing in Sec. II]. The transformed data are referred to as the reservoir states.

A portion of the stream function data is designated as the training data. Given our complete data matrix is of dimension $(mn)\xd7\tau $, we can remove a portion of size $(mn)\xd7\tau train$ where $\tau train<\tau $. To actually use these data for training purposes, the stream function input data are paired to itself using a $1:1$ mapping through $ti\u2192ti+1$, where $ti<\tau train$. We will then have a mapping of $\psi (x,ti)\u21a6\psi (x,ti+1)$. An unchanging dimension-reducing matrix is used to project the reservoir states to the input dimension, forming the output data. A model is fitted based on a linear regression of the output data on actual data. A visualization of the training is shown in Fig. 3.

Once the model is fitted on the designated training data, we allow the model to feedback its outputs, $\psi \u0302(x,ti+1)$ as inputs. We can compare these learned stream function fields against the actual stream function fields at every time index to evaluate the model.

## IV. IMPLEMENTATION AND RESULTS

Large-scale ocean circulation is driven primarily by winds and the Earth's rotation, forming characteristic gyres spanning irregularly shaped basins. Dissipation together with the effective variation of planetary rotation with latitude lead to an intensification of these currents along a western boundary.^{37}

To generate flow data, we adopt two well-known ocean models: a double-gyre flow and a quasi-geostrophic flow, both of which are used extensively as a foundation for the study of wind-driven ocean circulation. Stream function data, discretized on a spatial grid and in time, form the framework for modeling and forecasting. The same learning procedures are applied to DG and QG flows, and the results, predictions of stream function evolution and passive tracer transport, are evaluated by comparison to the actual solutions from the models. To further validate the predictions of our RC results, we compare both FTLE and modal decomposition (POD) of the actual and predicted systems.

### A. Double-gyre model

We first consider the well-known double-gyre stream function model^{2} to demonstrate our methodology. The DG flow is not the solution of any approximate ocean model; rather, it is a prescriptive formula for a flow's stream function with features that mimic ocean gyres, first used by Smith and Spiegel^{53} to examine tracer transport and later widely adopted, and adapted, as a useful model of ocean basin flow. The DG stream function adopted here is prescribed as

where

and is defined over the rectangular domain $[0,1]\xd7[0,2]$. Although the DG model is a “toy” model, it has the advantage of providing direct flow data at any spatial or temporal resolution without the need to solve a PDE.^{22}

An example of the DG stream function time evolution is shown in the bottom of Fig. 4. The quadratic function *f*(*x*, *t*) causes periodic oscillations across the domain and is meant to capture the wind-stress driving effect. The term “double-gyre” comes from the visually apparent pair of vortices that oscillate across the domain. The terms *ε* and *ω* in Eq. (8) are parameters, which, respectively, determine the magnitude and frequency of the oscillations. Setting *ω* = 0 would result in a double-gyre system with no oscillations and a pair of vortices centered at (0.5, 0.5) and (0.5, 1.5). In this study, we set $\epsilon =0.3$ and $\omega =\pi /5$. Importantly, the DG system has a complicated basin boundary structure in which the basins of attraction are intermingled, a signature of the existence of a fractal basin boundary. Because of this intermingling of the basin boundaries, one can expect a sensitive dependence on initial conditions in the particle trajectories.^{2}

We can also consider the velocity field associated with this stream function. For any stream function *ψ*, the velocity field **v** is given by its standard definition,

and for the double-gyre flow, it takes the following form:

Although the DG flow does have a closed form solution for its stream function and velocity field, this is often not the case. In fact, as we shall soon see, more intricate and more realistic flow models of ocean dynamics generally do not have a closed form solution for the velocity field. As a result, while we do have the closed form solution to the velocity field for the double-gyre flow model, which provides a useful standard for evaluation, we will proceed as though we do not.

### B. Quasi-geostrophic model

Quasi-geostrophy in its basic form represents a solution of the Navier–Stokes-based primitive equations of ocean circulation found using a series in powers of a small parameter, *ε*, identified as the Rossby number. Steady, geostrophic, flow is recovered as the leading order solution, representing dominant rotation (small *ε*). By expressing the next order solution in terms of leading order quantities, the QG model is found. Details of the derivation of QG and some of its variants can be found in Ref. 37.

The QG model is, thus, significantly more realistic but also more computationally demanding compared to the DG model. The QG model used in this study is a barotropic, single-layer QG model with a lateral (viscous, Munk) and bottom (drag, Stommel) friction that exhibits solutions that capture important geophysical fluid dynamics (GFD) features, including western intensification.^{38} The QG model can be expressed as

where *J* is the Jacobian operator, *μ* is the Stommel friction, *λ* is the Munk friction, and *w _{E}* is the driving due to wind. Non-dimensional parameters, named Stommel $(\delta S)$, Munk $(\delta M)$, and Inertial $(\delta I)$, are formulated with respect to the relative lengthscales of three important boundary layers.

^{38}The corresponding non-dimensional parameters, $\mu ,\lambda ,$ and

*ε*, capture the relative importance of bottom friction, lateral diffusion, and the nonlinearity; these parameters and the various QG models are defined in Appendix A. In the following, we note that only the pure Stommel case, defined by $\delta M=\delta I=0$, with dissipation in the form of frictional drag, has a linear closed-form solution. Nonlinear cases, having $\delta I\u22600$, and viscous dissipation, $\delta S\u22600$, are more commonly applied to specific problems

^{38}in ocean dynamics and, thus, of more interest. Such models are also expected to be more challenging to capture using machine learning. We focus primarily on modeling the Highly Nonlinear variation of the QG model since it encompasses all aspects of the QG model formulation. Of the four variations (Stommel, Munk, inertial, highly nonlinear, defined in Appendix A), the Highly Nonlinear version is the most complex in its behavior and might be expected to be the most difficult variation to model.

^{45,54,55}A sample of the QG stream function time evolution is shown in the bottom of Fig. 5. We note that all QG model solutions used in this study were verified, by monitoring kinetic energy, to be equilibrated in the sense that transients had effectively vanished.

### C. Evaluation of the learned models

We begin evaluation of the learned models by analyzing the differences between the stream function fields. We then evaluate the differences in particle behavior within the actual and estimated stream function fields. Finite-time Lyapunov exponent (FTLE) fields are then the natural extension to the Lagrangian (particle trajectory) approach to understanding the difference in the flows, as we can essentially understand FTLE field analysis to be both an ensemble analysis of particle trajectories as well as a general structural analysis of the flow. Finally, modal structure decomposition comparisons between the actual and estimated stream function fields allow for some understanding of the dimensionality, or complexity, of the estimated field with respect to the actual field. In Secs. IV C 1–IV C 4, each of these techniques will be applied.

#### 1. Stream function comparison

For both DG and QG models, we will analyze the difference between the estimated and actual stream function fields. In this section, we will only discuss the simpler DG model and the Highly Nonlinear variation of the QG model; however, in Appendix B, we include results for the other variations of the QG model. For our purposes, the Highly Nonlinear variation of the QG model is most relevant as it is the most difficult to model and solve numerically.

Qualitatively, Figs. 4 and 5 show that the estimated and actual stream functions exhibit almost identical behavior. The absolute error metrics between the actual and estimated stream function fields vary with the different hyperparameter tunings (training size, reservoir size, and spectral radius) used during the training of the ESN model. Figures 6 and 7 show the gradual increase in the average absolute error between the estimated and actual stream functions for various training sizes. While the QG model examined here represents the most dynamically complex case, a more complete picture of the QG stream function can be found in Appendix B, where additional model parameters are considered, including examples in the Stommel and Munk regimes. Other hyperparameter dependencies are explored in Appendix C.

Training size is specifically highlighted here because total training computation time has a quadratic complexity with respect to training size. Notably, the QG model only requires 1 “period” of training data, approximately 70 time steps, for reasonable results. A model with fewer than 70 time steps fails to replicate the actual stream function; this makes sense as the model never “sees” all the ways the stream function evolves within an entire period.

#### 2. Particle trajectories

While a stream function is itself useful and is closely related to sea surface height (SSH) data, a major area of interest and application is the study of transport; in particular, the dynamics of particles placed within the field. Particle trajectories in time-dependent flows are complex and not generally predictable, even in simple flow fields, as has been appreciated since the early studies of chaotic advection.^{23,53} In this study, we adopt trajectories and the related quantity of FTLE fields as a way to validate the fidelity of our RC model. Let the open set $D\u2282\mathbb{R}2$ be the domain of interest, and the stream function $\psi (x(t),t)$ be a time-dependent scalar field on *D*. Particle trajectories can be calculated for each point $x(t0)\u2208D$ initialized at *t*_{0}, as we consider a flow to consist only of incompressible fluid particles. Each particle's trajectory is governed by a flow map $\varphi t0t0+\Delta t:D\u21a6D$, where

for some time interval $\Delta t$. For the flows studied in this work, the flow map $\varphi t0t0+\Delta t$ is directly dependent on the smooth velocity field $v(x,t)$ on *D*, where **x** is a two-dimensional position vector (*x*, *y*) and where Eq. (12) is integrated forward in time using a fourth-order Runge-Kutta numerical method.^{56}

Measuring the accuracy of the model can be approached in several ways. In practical fields of work and engineering, stream functions are useful for particle trajectory tracking. For this, it is important to remember that the underlying particle dynamics is chaotic and inherently unpredictable by nature, so any attempt to evaluate estimated models should be regarded with respect to the level of chaos in the system.

To account for the inherent chaotic nature of the particle dynamics, estimated particle trajectory deviations are compared to initially perturbed actual particle trajectories over time. Let an arbitrary particle trajectory has an initial condition $x0\u2208D$, so that $x(t,x0)\u2208\mathbb{R}2\xd7\tau $. As the evolution of $x(t,x0)$ depends on the governing stream function, there are two unique particle trajectories defined for each unique initial condition: one defined for an actual stream function and one defined for an estimated stream function. For each initial condition, the two defined particle trajectories are compared over time.

As previously mentioned, it is also important to acknowledge the inherent chaotic nature of the particle dynamics; the estimated particle trajectory's deviation is compared to an initially perturbed actual particle trajectory's deviation over time, which can be loosely thought of as an upper bound on the predictability of the system's particle trajectories. In Figs. 8 (DG) and 10 (QG), the deviation from the true particle trajectory governed by the actual stream function is compared to

a particle trajectory governed by the actual stream function with slightly perturbed initial conditions, and

a particle trajectory governed by the estimated stream function with the same initial conditions.

While these trajectories are not necessarily representative of all initial conditions, they do present some insight on how the particle trajectories based on the learned model track the actual particle trajectories with respect to the inherent chaotic nature of the dynamics.

Figures 8 and 10 show the trajectories found using perturbed initial conditions (left) and found using the actual and learned models (right) for the DG and QG models, respectively. The perturbed trajectory and the estimated particle trajectory both begin to deviate between 500 and 1000 time steps. In other words, trajectories governed by the estimated stream function are approximately equivalent to trajectories governed by the actual stream function with an initial deviation of $10\u22123$ for DG (Fig. 9) and $10\u22122$ for QG (Fig. 11).

It is worth noting in Fig. 10 that the dramatically different final position of the two trajectories is not caused by the actual and estimated streamfunctions being dramatically different, as, in fact, the two streamfunctions are quite close to one another as seen in Fig. 5. Rather, this particular choice of initial condition is such that the particles start essentially on top of an LCS, which serves as a division of the phase space dynamics. Even the small discrepancy between the actual and estimated streamfunction has led to the initial condition lying on different sides of the LCS (with respect to the actual and estimated model), and, thus, over time, the trajectories end up in different basins of attraction. If the initial condition was positioned on the same side of the LCS (for the actual and estimated models), their final positions would be close to one another. As with any deterministic flow, whether or not two trajectories diverge and end up in different basins is a matter of where they start in relation to the LCS.

To obtain a general sense of how particle trajectories deviate with respect to any initial condition in the domain, an average deviation is calculated over time using a large sample of initial conditions. For simplicity, deviation is quantified with the Euclidean norm between particle positions at corresponding time steps. As shown in Figs. 9 (DG) and 11 (QG), the particle trajectories associated with the estimated stream function deviate from the particle trajectories associated with the actual stream function at about the same time as trajectories associated with the actual stream function but with perturbed initial conditions. Note that different initial perturbation amplitudes, namely, $10\u22123$ and $10\u22122$ for the DG and QG models, respectively, were used in order to match the trajectory deviations for the two types of flow.

#### 3. Finite-time Lyapunov exponent field

The computation of finite-time Lyapunov exponents is often used to find coherent structures in fluid flows.^{22,57–59} The FTLE provides a measure of how sensitively the system's future behavior depends on its current state.

We consider a velocity field $v:D\xd7I\u2192D$, with $D\u2282\mathbb{R}2$ that is defined over a time interval $I=[ti,tf]$, and the system of equations

where $x,x0\u2208\mathbb{R}2$ and $t\u2208I$. This dynamical system has quantities known as Lyapunov exponents that measure the growth rates of the linearized dynamics about the trajectory of the system. To find the finite-time Lyapunov exponents (FTLE), the Lyapunov exponents are computed on a restricted finite time interval.

To compute FTLE values, we define the domain of interest *D* as an evenly spaced grid of two-dimensional points with initial position $x0$ defined at the grid points. Then, all points are numerically integrated using Eqs. (13a) and (13b). The flow map $\varphi $ determines the advection of the initial points as follows:^{22,57–59}

Then, the FTLE can be defined as

where $\lambda max(\Delta )$ is the maximum eigenvalue of the right Cauchy–Green deformation tensor Δ, which is given as

with * denoting the adjoint.

For a given $x\u2208\mathbb{R}2$ at initial time *t _{i}*, Eq. (15) gives the maximum finite-time Lyapunov exponent for some finite integration time

*T*(forward or backward) and provides a measure of the sensitivity of a trajectory to small perturbations. The FTLE field given by $\sigma (x,ti,T)$ can be shown to exhibit ridges of local maxima in phase space. The ridges of the field indicate the location of attracting (backward time FTLE field) and repelling (forward time FTLE field) structures. In two-dimensional space, the ridge is a curve which locally maximizes the FTLE field, so that transverse to the ridge one finds the FTLE to be a local maximum.

^{2,22}

Given the two-dimensional grid of points spanning the domain of the flow field, one can compute $[xij(t),yij(t)]$ and $[xij(t+T),yij(t+T)]$, where $(xij(t),yij(t))$ denotes the $(i,j)th$ points in the computational grid, and $[xij(t+T),yij(t+T)]$ denotes the corresponding set of points after they have been advected by the flow for integration time *T*. The spatial gradient of the flow map is approximated as

using central finite differences.

The computations of the FTLE field for the DG and QG flows were performed using a fourth-order Runge-Kutta algorithm. However, the final integration time *T* differed between the two problems since particles in the DG flow never leave the domain *D*, while particles in the QG flow will exit the domain *D* after some period of time. To resolve this issue, the integration time must be limited by some upper bound, which was chosen based on observation of the FTLE and particle fields at various integration times.

Figures 12 and 13 show the forward-time FTLE fields computed using the actual and estimated stream functions fields for the DG and QG models, respectively. The FTLE field based on the estimated, interpolated velocity field qualitatively agrees well with the actual FTLE field. The FTLE fields of the estimated and actual velocity fields for the variations of the QG flows agree qualitatively as well. As the flows become more nonlinear (e.g., Fig. 13), it becomes more difficult to pinpoint the coherent structures. Nevertheless, the same phenomenon is seen in the estimated FTLE field as compared with the actual FTLE field. FTLE fields for Stommel, Munk, and Inertial QG models are shown in Appendix B and can be seen to have more structure than DG FTLE fields, yet have fewer features than the Highly Nonlinear QG case.

#### 4. Modal structure decomposition

We use the proper orthogonal decomposition (POD) method^{60} to generate prominent modal structures (basis functions) and quantify the amount of information extracted from both the estimated and actual stream function data.^{61,62} Again, we are observing the scalar stream function field $\psi (X0,t)$, where $X0\u2282D$ are the grid points uniformly sampled in the domain of interest $D\u2282\mathbb{R}2$. The POD method serves to decompose $\psi (X0,t)$ as a number of optimal (in the *L*^{2} sense) orthogonal basis functions $\varphi i(X0)$ where $i\u2264n$, so that the original stream function field can be reconstructed as the linear combination,

where $\psi \xaf(X0)$ is the temporal mean of the stream function field, *α _{i}* is the temporal coefficient, and the left hand side of Eq. (17) provides the fluctuations from the mean field.

Given the *m *×* n*-dimensional $X0$, then one can construct the data matrix $X\u2208\mathbb{R}mn\xd7\tau \u0302$, where $\tau \u0302$ is the total number of time steps in the prediction phase, by concatenating the flattened snapshot vectors $\psi (X0,t)$. The optimal orthogonal basis functions $\varphi i$ are determined by solving the eigenproblem,

where **R** is the covariance matrix $XXT\u2208\mathbb{R}n\xd7n$ (where *T* denotes transpose). The largest eigenvalues *λ _{i}* correspond to the eigenvectors $\varphi i$ with the most contribution to reconstructing the original stream function field. The results of this decomposition can be used as a visual evaluation of the estimated stream function fields. By decomposing both the actual and estimated stream function fields to their respective optimal basis functions, one can analyze the differences between the different modal structures.

As shown in Fig. 14, the modal structures for the DG model are ordered by variance contribution (i.e., eigenvalue magnitude). The agreement between the actual and estimated modal structures is excellent. Moreover, the first two modes contribute almost all of the variance in both cases. Similar observations can be made for the highly nonlinear variant of the QG model, as shown in Fig. 15. The modal structure of the Stommel, Munk, and inertial cases exhibits a similar close agreement and need not be shown. While all the QG cases include some degree of nonlinearity, two or three modes are dominant, as we expect from this single-layer QG model.^{37} These POD results reflect the fact that the models used here, despite nonlinearity, involve a basic two attractor basin with spatially and temporally periodic driving. The efficacy of POD to capture the dynamics for such systems is expected and now well-known. The trajectories and the FTLE portraits that can be thought of as capturing families of trajectories are more challenging to model because they sample the fractal basin boundaries created by the time-dependent forcing and are, thus, themselves, chaotic. This dichotomy between the simplicity of the flow and the complexity of its trajectories is the central idea first described by Aref^{23} for a time-dependent, two-dimensional Stokes flow.

## V. CONCLUSION

This work explains the methodology, development, and evaluation of a machine learning approach to modeling and prediction of geophysical flows. Reservoir computing's relevance to dynamical systems is introduced, and RC is applied to two well-known ocean circulation models: the simple prescriptive double-gyre stream function model (DG) and the more complicated quasi-geostrophic PDE model (QG).

The effectiveness of this reservoir computing approach to modeling and characterizing ocean circulation models through the stream function is evaluated using a range of quantitative and qualitative measures, including mean stream function error, the predictability of particle trajectories, and comparisons of FTLE and modal signatures. Reservoir computing models formed from QG model data exhibited good predictive power even in these chaotic dynamical systems described by a nonlinear PDE, and over a range of values of its control parameters. Addressing the problems caused by limited datasets, our reservoir computing approach showed that estimated flows derived from a small amount of data are shown to agree well with the actual flows.

Of particular relevance to the work presented here is a number of recent studies that involve machine learning in a quasi-geostrophic context. Bolton and Zanna^{8} demonstrated that ML can deduce unresolved physical processes from sparse ocean data, which correlates with our findings that RC models capture the dynamics of QG models. While Bolton and Zanna used both QG (albeit 3D layered models) and ocean general circulation models to produce data, their ML approach was limited to very computationally expensive convolutional neural networks. The effectiveness of an RC framework over other ML options was validated recently by Chattopadhyay *et al.*^{63} for the Lorenz96 system, both in terms of its short-term predictability and long-term statistics. Also, for highly realistic ocean models (e.g., CESM2), a RC framework has been shown to perform better forecasting than other approaches^{64} including those involving modal analysis. Models of complex physical systems that have predictive capability without demanding perfect data or vast computational resources, such as the RC models explored here, have countless applications. Ocean flows and QG models of them can serve as a framework for climate sub-models, contaminant tracking,^{65} or search and recovery.^{66}

## ACKNOWLEDGMENTS

This work was funded by the National Science Foundation (Award Nos. DMS 1418956, CMMI 1462823, CNS 1625636, CMMI 2121919, and CMMI 2121923). K.Y. and P.Y. are grateful to L. Smith for his valuable feedback, and P.Y. acknowledges K. Helfrich for providing the core QG model code.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Kevin Yao:** Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Eric Forgoston:** Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal). **Philip Yecko:** Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX A: QUASI-GEOSTROPHIC MODEL

The quasi-geostrophic (QG) ocean circulation model used here is derived in many standard texts. Our formulation is based on that of Pedlosky,^{37,38} which can be consulted for more details.

The single layer (or barotropic) QG model can be formulated using stream function *ψ*, or vorticity, *ξ*, where $\xi =\u22072\psi $, viz

where *f*_{0} is the Coriolis parameter at the central latitude of the gyre, *δ _{E}* is the bottom layer thickness,

*β*is the northward spatial derivative,

*D*is the major layer,

*A*is the turbulent viscosity coefficient, and

_{H}*J*is the Jacobian,

describing the advection of relative vorticity by the motion field. Vertical frictional forces, though weak, contribute to the flow by stretching the planetary vorticity. We let the upper Ekman pumping velocity be

representing a time-dependent pumping velocity, where *w*_{0} is the amplitude of the Ekman pumping, *w _{P}* is the amplitude of the periodic driving perturbations, and

*L*is the characteristic latitudinal length scale. Applying dimensional analysis, we assume the flow is contained in a basin of this characteristic length scale

*L*, has characteristic horizontal velocity scale

*U*, and has characteristic Ekman pumping scale

*w*, so that

_{E}We can non-dimensionalize the stream function *ψ*, the coordinate distances *x* and *y*, and we can scale *t* with $1/\beta L$, to arrive at the form

as given in the main text, Eq. (11), where the non-dimensional parameters

are formulated with respect to the relative length scales of three important boundary layers: $(\delta S),\u2009(\delta M)$, and $(\delta I)$, which are, respectively, named Stommel, Munk, and inertial. Note that in the Stommel formulation, *κ* is the coefficient associated with bottom drag. The corresponding non-dimensional parameters, $\mu ,\lambda ,$ and *ε*, capture the relative importance of bottom friction, lateral diffusion, and nonlinearity.

The flow is contained in a basin of characteristic scale *L*; hence, each of the non-dimensional parameters is calculated based on ratios of the boundary length scales to the characteristic length *L*. In general, we associate *L* with the scale of the motion in the flow, though we expect small-scale motions specifically in the western boundary current.

The Reynolds number, Re, may be defined as the ratio of the inertial advection of relative vorticity to the diffusion of vorticity, i.e.

where the second equality holds when the boundary layer is viscous and has the scale of the Munk layer.^{38} When Re > 1, nonlinearity influences the flow around the boundary layer, and when Re < 1, linear viscous physics dominates the boundary layer flow.

There are no closed form solutions for QG models with the exception of a pure Stommel case where $\lambda =\epsilon =0$. The Stommel, Munk, and inertial monikers are common, and Pedlosky's review^{38} can be consulted for more details. What we refer to as Highly Nonlinear is a special case of Inertial with a relatively large amplitude driving; gyres of this model are, thus, highly spatially asymmetric and aperiodic. The main text contains results of our machine learning framework for this model because it exhibits the most nonlinear and irregular behavior. Results for the other variants can be found in Appendix B. The four models and their length scale values used for computation are outlined in Table I.

QG model variations . | |||||
---|---|---|---|---|---|

. | δ
. _{S} | δ
. _{M} | δ
. _{I} | w
. _{p} | Re . |

Stommel | 0.03 | ⃛ | 0.02 | 0.3 | >1 |

Munk | 0.04 | 0.03 | 0.02 | 0.2 | <1 |

Inertial | 0.04 | 0.03 | 0.05 | 0.2 | >1 |

Highly nonlinear | 0.01 | 0.03 | 0.05 | 0.5 | >1 |

QG model variations . | |||||
---|---|---|---|---|---|

. | δ
. _{S} | δ
. _{M} | δ
. _{I} | w
. _{p} | Re . |

Stommel | 0.03 | ⃛ | 0.02 | 0.3 | >1 |

Munk | 0.04 | 0.03 | 0.02 | 0.2 | <1 |

Inertial | 0.04 | 0.03 | 0.05 | 0.2 | >1 |

Highly nonlinear | 0.01 | 0.03 | 0.05 | 0.5 | >1 |

### APPENDIX B: RESULTS FOR QUASI-GEOSTROPHIC VARIATIONS

This appendix highlights the various results for the Stommel, Munk, and inertial variations of the QG model. Figures 16, 17, and 18 compare the estimated and actual stream functions for the Stommel, Munk, and Inertial variants, respectively. Figures 19, 20, and 21 compare the estimated and actual FTLE fields for the Stommel, Munk, and inertial variants, respectively.

### APPENDIX C: ECHO STATE NETWORK HYPERPARAMETER STUDIES

This appendix presents the impact of the values of the principal hyperparameters, namely, reservoir size and spectral radius, on the error of the estimated stream function relative to the actual stream function over time.

The DG model performs reasonably well for reservoir sizes of 4000, 5000, or 6000 nodes for short training times. For longer training times, a reservoir size of 5000 nodes gives less error, but we were not able to identify this reservoir size dependence. Nevertheless, the stream function error remains small, 1%–2%, at all reservoir sizes (Fig. 22). The QG model is found to perform well around 4000 nodes, as shown by the comparisons of models with 4000, 5000, and 6000 node reservoirs shown in Fig. 23. Different hyperparameter tunings are model-dependent, as expected. It is important to note that the adjacency matrix representing the node connections in the reservoir is sparse, so although there is a high number of nodes, there are not necessarily as many connections. On the other hand, the complexity of the reservoir still increases with the additional node count, which could contribute positively to modeling capability.

With regard to spectral radius, the DG model performs well with a spectral radius of approximately 2.3 or less (Fig. 24), while the QG model performs well with a value in the range $0.4\u22120.6$ (Fig. 25). Spectral radius is a vague proxy to long-term memory required by the model to learn, so this result suggests that the QG model evolves based on shorter-term history.