The recent explosion in the availability of echosounder data from diverse ocean platforms has created unprecedented opportunities to observe the marine ecosystems at broad scales. However, the critical lack of methods capable of automatically discovering and summarizing prominent spatio-temporal echogram structures has limited the effective and wider use of these rich datasets. To address this challenge, a data-driven methodology is developed based on matrix decomposition that builds compact representation of long-term echosounder time series using intrinsic features in the data. In a two-stage approach, noisy outliers are first removed from the data by principal component pursuit, then a temporally smooth nonnegative matrix factorization is employed to automatically discover a small number of distinct daily echogram patterns, whose time-varying linear combination (activation) reconstructs the dominant echogram structures. This low-rank representation provides biological information that is more tractable and interpretable than the original data, and is suitable for visualization and systematic analysis with other ocean variables. Unlike existing methods that rely on fixed, handcrafted rules, this unsupervised machine learning approach is well-suited for extracting information from data collected from unfamiliar or rapidly changing ecosystems. This work forms the basis for constructing robust time series analytics for large-scale, acoustics-based biological observation in the ocean.

## I. INTRODUCTION

Sound is extensively used to study life in the ocean (Medwin and Clay, 1998). Compared to net-based sampling, which can only be conducted at discrete times and locations, echosounders boast the capability to “connect the dots” by delivering continuous active acoustic observation across time and space. This advantage has made them an indispensable tool in modern ecological and fisheries studies, in particular for collecting information about mid-trophic level organisms that are otherwise difficult to observe effectively at large scale (Benoit-Bird and Lawson, 2016; Handegard *et al.*, 2013).

Technological advancements have resulted in a deluge of echosounder data in the past decade, thanks to the rapid and successful integration of autonomous echosounders with a wide variety of ocean observing platforms, including moorings and autonomous surface and underwater vehicles (De Robertis *et al.*, 2018; Dunlop *et al.*, 2018; Greene *et al.*, 2014; Moline *et al.*, 2015; Suberg *et al.*, 2014). The significantly broader spatial and temporal coverage of the datasets provides an unprecedented opportunity to study the response of the marine ecosystems to climate variability at scales never before possible. An example is the network of six moored echosounders maintained by the Ocean Observatories Initiative (OOI) along the coast of the Northeast Pacific Ocean, which have been delivering data continuously since 2015 [Fig. 1(A)]. Another example is the experimental fisheries survey carried out by a fleet of autonomous surface vehicles sampling the west coast of the U.S. and Canada simultaneously in the summer of 2018–2019 (Chu *et al.*, 2019). However, the massive volume and complexity of these new data have overwhelmed the conventional echosounder data analysis pipelines that rely heavily on manual processing, thwarting rapid progress in marine ecological research.

A critical need in the analysis of large echosounder datasets is the automatic extraction of biological information, such as organism identity, biomass, and high-level spatio-temporal distribution patterns of organisms, without excessive human intervention. Currently, the majority of echo data processing procedures require visual inspection and labeling of echograms [image formed by echo returns, e.g., Fig. 2(A)] by human experts (Fleischer *et al.*, 2012), prompting not only scalability issues with increasing data volume but also subjective bias and reproducibility challenges.

Many echo classification methods were proposed to automate the procedure of discerning the taxonomic identity of acoustically observed biological aggregations. A key feature commonly exploited is the relative strength of echo across sonar frequency, which varies significantly as a function of the size, shape, orientation, and material properties of marine animals (Benoit-Bird and Lawson, 2016; Medwin and Clay, 1998). These include classification rules derived using empirical echo measurements and physics-based scattering models, such as the popular frequency-differencing and related Bayesian inference methods (e.g., De Robertis *et al.*, 2010; Jech and Michaels, 2006; Korneliussen *et al.*, 2016), as well as supervised and semi-supervised machine learning methods, such as random forests and neural networks (e.g., Brautaset *et al.*, 2020; Fallon *et al.*, 2016; Haralabous and Georgakarakos, 1996; Woillez *et al.*, 2012). However, these classification methods can be ill-fitted for large-scale echosounder data collected on autonomous platforms, due to challenges associated with collecting concurrent biological ground truth and calibration information at a scale comparable to that of the acoustic data. For example, extensive net- or optics-based biological taxa identification samples are critical in constructing empirical reference or training datasets for scatterer classification, but the cost and logistical complexity involved in collecting them in parallel with autonomous echosounder sampling is often prohibitive, especially in geographically remote areas or deep water. The validity and applicability of any derived classification rules further hinge on the assumed invariance of the combination of measurement systems and the biological community, both of which can vary significantly across time and space. In addition, accurate echosounder calibration—the mapping between the received voltage signal and the actual echo strength in physical units (Demer *et al.*, 2015; Haris *et al.*, 2018)—imposes a stringent requirement for adequate application of these methods, as calibration needs to be conducted separately for each frequency under a wide range of environmental conditions (e.g., at depth) under which acoustic data are collected.

Another suite of methods were developed with a goal of capturing specific biological activity patterns observed in the echogram. These methods seek to capture local echogram features through summary statistics derived from a vertical echogram slice or across a few immediately adjacent slices. The features include integrated echo energy (as a proxy for abundance), the vertical spread of animal distribution, timing and speed of animals engaged in vertical migration, and the number and depth of strong scattering layers, etc. (e.g., Cade and Benoit-Bird, 2014; Klevjer *et al.*, 2016; Proud *et al.*, 2015; Urmy *et al.*, 2012; Woillez *et al.*, 2007). While useful in summarizing echogram variations, these statistics are handcrafted and not adaptive, risking loss of information or poor characterization when the pre-determined echogram features mismatch those of the actual data. The summary statistics also lack the capability to automatically extract and describe high-level echogram features since the computation involves only temporally and spatially localized measurements in large datasets. A recent study circumvented some of these difficulties by employing a time-augmented extension of the empirical orthogonal function method to examine the anomaly of zooplankton diel vertical migration (DVM) behavior with respect to monthly means (Parra *et al.*, 2019). The anomaly patterns were subsequently linked to factors such as ocean currents, lunar variability, and the composition of the local biological community.

Rather than focusing on specific organisms or localized echogram features, the overarching goal of our work is to develop a methodology that can automatically build a comprehensive representation of high-level spatio-temporal structures intrinsic to the echosounder data, such as temporally transient appearance of scattering layers or animal aggregations engaged in different vertical movement behaviors. These structures are the most prominent and immediate features human observers attend to in visual analysis of large echograms, and are key to forming and testing hypotheses pertinent to organism responses to environmental disturbance of natural or anthropogenic sources.

In this paper, we show that matrix decompositions are powerful techniques capable of achieving this goal in an unsupervised manner with minimal assumption and prior knowledge from the observed region. We show how different decomposition formulations are suitable for exploiting and extracting different structures in the data: (1) principal component pursuit (PCP) is a robust version of principal component analysis (PCA) capable of exploiting the latent regularities in long-term echograms to automatically remove spurious and noisy echo outliers while retaining the high-level spatial and temporal patterns present in the original time series; (2) temporally smooth nonnegative matrix factorization (tsNMF) is a decomposition with nonnegativity and temporal smoothness constraints that successfully extracts distinct daily echogram patterns with time-varying activation sequences, providing a compact representation of temporal processes embedded in the data. The utility of these methods is demonstrated using a three-frequency echosounder time series spanning 62 days. While no specific constraints are incorporated into the decomposition formulation in this paper to account for multi-frequency echo features, we discuss preliminary results from a recent work (Lee and Staneva, 2019) in which we investigated how these features may be exploited through nonnegative tensor decomposition, a multi-way extension of nonnegative matrix factorization (NMF), as well as other avenues for future development.

Our contribution builds on the classic concept of dimensionality reduction to find structures directly from the data and use them for representation and interpretation. In contrast to previous methods that depend strongly on past experience and observer assumptions, the unsupervised data-driven approach we employ is adaptive to intrinsic properties of the data. Such a capability is pivotal in extracting and synthesizing information from the overwhelming volumes of echosounder data collected from not only unfamiliar ecosystems but also those undergoing rapid changes in the changing climate.

## II. MATERIALS AND METHODS

### A. Acoustic and environmental data

We use 62 days of acoustic data (August 17–October 17, 2015) from an upward-looking echosounder (Simrad EK60) mounted on a moored underwater platform at a depth of approximately 200 m. This echosounder is part of the Oregon Offshore Cabled Shallow Profiler Mooring (OOI CE04OSPS; OOI, 2015b) of the OOI Coastal Endurance Array [Fig. 1(A)], on which data are collected continuously and become immediately available over the Internet. The full water depth at this site is 588 m.

Throughout the selected observation period, narrowband sonar pulses (gated sinusoidal waves with a duration of 1.024 ms) at 38, 120, and 200 kHz were transmitted simultaneously at a 1-s pinging interval for the first 20 min of each hour. The volume of the selected echosounder dataset (15.6 GB) is comparable to those from a few days of typical ship-based echosounder survey. The echogram contains distinctive patterns that are easily discerned by eye, facilitating interpretation of the results from matrix decomposition [Fig. 2(A)].

The raw data collected by the echosounder are preprocessed using the Python package echopype (Lee *et al.*, 2020) prior to the decomposition analyses. This is a free and open-source software package developed by the authors to allow convenient conversion of manufacturer-specific binary data files into interoperable netCDF or zarr formats. These formats are suitable for storing and computing multi-dimensional data, such as those collected by scientific echosounders. The preprocessing steps include: (1) parsing and saving the.raw binary data files to self-described machine-readable netCDF files, (2) performing calibration to obtain volume backscattering strength ($SV$, units: dB re 1 m^{−1}) for each echo sample (each ping) (Simmonds and MacLennan, 2007), and (3) obtaining mean volume backscattering strength (MVBS, or $SV\xaf$) over non-overlapping echogram regions with a depth bin size of 5 m and a temporal bin size of 200 s. Since this echosounder was not formally calibrated during this deployment, we use manufacturer-supplied parameters stored in the data files to calibrate the echo measurements.

In addition to data from the echosounder, ocean current measurements collected by an acoustic Doppler current profiler (ADCP) from the OOI Oregon Offshore Cabled Benthic Experiment Package (OOI CE04OSBP; OOI, 2015a) are also obtained for joint analysis of the decomposition results.

### B. Matrix decomposition

#### 1. Overview

Matrix decomposition (or factorization) methods seek to represent complex observations through a small set of latent factors in an unsupervised manner while preserving intrinsic structures in the data (Bouwmans *et al.*, 2016; Cichocki *et al.*, 2009). These methods leverage the hidden low-dimensional nature of high-dimensional observations and yield latent factors that are usually more tractable and interpretable compared to the original data. These techniques have found wide applications in many scientific and applied fields, including computer vision (Bouwmans and Zahzah, 2014; Cichocki *et al.*, 2009; Lee and Seung, 1999), neuroscience (Chen and Cichocki, 2005; Pnevmatikakis *et al.*, 2016), text mining (Ding *et al.*, 2008), recommender systems (Koren *et al.*, 2009; Zhang *et al.*, 2006), deep learning(Oseledets, 2011), etc. A notable and illustrative application example is the analysis of human faces using PCA, which produces a series of “eigenfaces” that capture increasingly sophisticated variation of facial features across subjects (Turk and Pentland, 1991).

Using matrix decomposition methods, here we aim to automatically extract high-level spatio-temporal structures embedded in the echogram of each day and use these patterns as descriptors for temporal changes in the time series. The choice of the daily interval is motivated by the profound influence of daylight cycle on life in the ocean and the significant contribution of diel vertical movements of marine animals in the global biogeochemical cycle (Hays, 2003; Lehodey *et al.*, 2010). Below we describe the data restructuring procedure and the decomposition formulations used in this study to remove noisy outliers from data, discover distinct echogram patterns, and summarize variations in long-term echosounder time series.

#### 2. Restructuring echo data for decomposition

The multi-frequency echosounder time series are restructured to allow two-way (matrix) decomposition in this paper. All echo observations from a single frequency of each day are first “flattened” and concatenated to form a long vector that contains all echogram pixels within the day. The long vectors from all days in the observation period are then combined horizontally along the day dimension to form a matrix [Fig. 3(A)] for each frequency. The matrices from all frequencies are then stacked vertically to form the final data matrix for decomposition [Fig. 3(B)]. Each single-frequency daily echogram is of dimension $Ndepth\xd7Nping$, where *N _{depth}* (=37) is the number of averaged depth bins and

*N*(=144) is the number of averaged ping bins of the MVBS data within a day. Therefore, the “flattened” long vectors assembled from daily observations are of dimension $NdepthNping\xd71$. The final data matrix is of dimension $NdepthNpingNfreq\xd7Nday$, where

_{ping}*N*(=3) is the number of frequencies and

_{freq}*N*(=62) is the number of days. In addition, all decompositions are performed on the log-transformed MVBS data, due to the well-known challenge of decomposing data with a large dynamic range, such as those commonly encountered in machine listening applications (Liutkus

_{day}*et al.*, 2015; see Sec. IV for more discussion).

#### 3. Outlier removal

A profiler collocated with the echosounder on the same mooring traversed the water column at irregular time intervals during the observation period and introduced strong interference [indicated by arrows in Fig. 2(A)] to echoes from natural sources. The profiler echo interference needs to be removed from the time series before decomposition analysis to avoid their dominance in the optimization updates. To this end, we employ PCP, or robust PCA (Candes *et al.*, 2009), which was designed to separate the data (**X**) into an exact sum of a low-rank (**L**) and a sparse (**S**) component, i.e., $X=L+S$. The development of PCP was driven by the limitations of PCA when dealing with data containing significantly corrupted observations, for which the estimated principal components are easily dominated by the corrupted samples and deviate significantly from the true low-rank structures. The decomposition is obtained by solving the following minimization problem:

where $||\xb7||*$ is the nuclear norm of a matrix, which is the sum of its singular values, and $||\xb7||1$ is the *l*_{1} matrix norm, which is equal to the maximum of the *l*_{1}-norms of its columns ($||S||1=maxj\u2211i|sij|$).

Compared with the matrix completion problem where the goal is to recover missing entries in the data matrix, PCP handles a similar but broader case in which the locations of the corrupted observations are not known. While one expects some trade-off between the minimization of the nuclear-norm term, which looks for lower dimensional structures, and the *l*_{1}-norm, which looks for a sparse component, under weak conditions it has been shown by Candes *et al.* (2009) that such decomposition can be obtained exactly, and, surprisingly, the value of *γ* can be set to $1/max(dim(X))$, making the algorithm parameter-free. While in theory, such an exact decomposition may not exist, in practice it has been demonstrated that PCP indeed provides robust decomposition. For example, even though images of faces under different illuminations do not lie in a perfectly low-dimensional space, PCP has been used to recover faces from shadows and occlusions Candes *et al.* (2009); similarly, while the background in videos is filled with noise and clutter, PCP has been very successful as a background subtraction technique and has found many applications in video surveillance (Bouwmans and Zahzah, 2014). It has also been proven that the problem can be solved almost exactly in the presence of small noise (Zhou *et al.*, 2010). The theoretical and practical properties of PCP provide a powerful framework to separate the (sparse) corrupted samples and enable analyses focused on the underlying low-rank structures.

Here we use PCP as a purely data-driven approach to isolate abrupt patches of strong echoes from the profiler and fine echogram granularity (the sparse component) from high-level spatio-temporal structures in the echogram (the low-rank component), which are the focus of subsequent analyses. The decomposition was performed on the restructured MVBS matrix described in Sec. II B 2. The PCP decomposition is performed using the Python package RobustPCA (Chi, 2018).

Beyond the scope of this paper, PCP may be useful in removing electric or acoustic interference commonly observed between unsynchronized active acoustic instruments on ships (e.g., ADCP and echosounders) and in isolating echoes from dense schools of pelagic fish that tend to be spatially and temporally localized on the echogram.

#### 4. Additive representation of temporal processes in a long-term echogram

NMF is a data decomposition alternative to PCA for which both the latent factors and their activation coefficients are constrained to be nonnegative (Lee and Seung, 1999; Paatero and Tapper, 1994). The nonnegativity constraints make the latent factors (components) purely *additive* building blocks that can be combined to reconstruct the data. Moreover, the components tend to contain primarily localized features, which underlie a more interpretable “part-based” reconstruction and representation. The advantage is readily demonstrated by the decomposition of human faces into parts associated with eyes, mouths, etc., shown in Lee and Seung (1999), as opposed to the eigenfaces from PCA that each contains variations of different parts of the entire face and can sometimes be challenging to interpret.

A common formulation of NMF aims to find low-rank nonnegative matrices **W** and **H**, such that their product minimizes the squared reconstruction error given by the Frobenius norm,

where **W** is of dimension *D* ×* K*, **H** is of dimension *K* ×* T*, $K\u226aD$ is the rank of the decomposition which needs to be predetermined, and $Wd,k$ and $Hk,t$ are the individual entries of the corresponding arrays. We refer to this formulation as the “traditional NMF” hereafter. Many variations of this problem exist (Cichocki *et al.*, 2009). For example, additional sparsity and smoothness constraints can be imposed; the Kullback-Leibler divergence, which provides a probabilistic interpretation of the problem (Yang *et al.*, 2011), can be minimized instead of the Frobenius norm of the reconstruction error.

In this paper, we use a regularized tsNMF formulation to discover patterns (**W**) embedded in the low-rank MVBS echogram and use the activation of these patterns (**H**) to describe temporal variations in the echosounder time series. tsNMF modifies the traditional NMF by incorporating in its formulation a temporal smoothness constraint, which controls abrupt changes in the pattern activation sequences and emphasizes the role of sequential information in the decomposition. This is achieved by adding a Tikhonov regularization term of the form $\u2211k\u2211t(Hk,t\u2212Hk,t+1)2$, which serves as a discrete equivalent of the squared norm of the derivative of the temporal process. We optimize the following cost function in tsNMF as proposed by Fabregat *et al.* (2019):

where $\Delta $ is the difference operator

*η* controls the smoothness of **H**, *λ* allows tuning for sparsity in **W**, $\beta W$ and $\beta H$ prevent matrices **W** and **H** from growing too large, and *d*, *k*, *t* are indices of the matrix elements. We note that when all parameters *η*, *λ*, $\beta W$, and $\beta H$ are set to zero, we obtain the traditional NMF, which simply minimizes the reconstruction cost subject to the nonnegativity constraints. To understand the importance of the temporal constraint, note that in the traditional NMF a permutation of **X** along the dimension *T* would not change the result.

In our context of sonar data analysis, **X** is the low-rank MVBS echogram **L** from the PCP decomposition (with dimensions $D=NdepthNpingNfreq$ and $T=Nday$), **W** contains the latent factors (patterns), **H** contains the patterns' activation coefficients, and the rank *K* (which equals the number of patterns) is selected based on the procedure described in Sec. II B 5. Specifically, column $wk$ in **W** encodes a particular daily echogram pattern, with the corresponding row $hk$ in **H** capturing the activation of this pattern in the observation period [Fig. 3(B)]. The summation of the outer products of the *K* pairs of $wk$ and $hk$ reconstructs the low-rank MVBS data.

Since values in the MVBS data matrix are negative in absolute physical units, we perform nonnegative decomposition by first subtracting the minimum of the entire matrix and adding the minimal value back for reconstruction. This approach is justified by our primary focus on extracting high-level patterns in the data, rather than physics-based inversion of scatterer numerical density (see Sec. IV for more discussion).

The optimization is solved by a proximal alternating linearized minimization (PALM) algorithm, which was proposed by Bolte *et al.* (2014) as an algorithm with favorable convergent properties for a wide class of constrained matrix decomposition problems and later derived by Fabregat *et al.* (2019) for the tsNMF formulation presented here. We implemented this method in the Python package time-series-nmf (Staneva and Lee, 2020).

#### 5. Selection of decomposition parameters

In this work we select the rank, regularization parameters, and initialization of **H** and **W** in tsNMF based on empirical analysis of the decomposition outcome. Finding the intrinsic low dimension of a data set has long been a challenging task for many unsupervised techniques, including clustering (Mirkin, 2011). We use the variation of the mean squared error (MSE) of the reconstruction (Frigyesi and Höglund, 2008; Hutchins *et al.*, 2008) and the cophenetic correlation coefficient (Brunet *et al.*, 2004) across increasing rank as guides for rank selection. The former provides an indication of the rank beyond which no significant non-random structures in the data are captured by the decomposition, and the latter provides a measure for decomposition stability across multiple randomly initialized runs. Due to the requirement to assign a given observation day to only one daily echogram pattern in the calculation of the cophenetic correlation coefficient, which conflicts with the fact that it is the joint activation of multiple patterns which reconstructs the data, here we base our rank selection primarily on MSE variation and use the variation of cophenetic correlation coefficient only as a reference. The smoothness parameter *η* is selected based on the L-curve concept (Mirzal, 2014; Oraintara *et al.*, 2000), which seeks to balance the reconstruction error and the smoothness regularization cost in the decomposition. Details of these metrics are described in the Appendix.

We found that the exact values of *λ*, $\beta H$, and $\beta W$ do not impact the decomposition results dramatically, as long as the values are chosen within a reasonable range such that the respective terms do not entirely dominate the cost and diminish the contributions of the reconstruction and the smoothness regularization terms. Specifically, increasing *λ* does make the resulting components **W** sparser (which in turn causes the activations **H** to be smoother), but does not affect the primary structures in **H** and **W**. Therefore, we set *λ* = 0 when presenting our results for better interpretability. Fabregat *et al.* (2019) suggested setting $\beta H$ and $\beta W$ to arbitrary positive numbers and used 0.1 in their demonstration. However, we opted for using $\beta H=\beta W=0$ as we found no practical differences for $0\u2264\beta H,\beta W\u22720.1$.

The tsNMF optimization iteration stops when the rate of cost decrease is smaller than 0.5% of the running average of the rate of cost decrease over a window of five iterations. The components **W** and activations **H** shown in all figures are obtained by running 320 randomly initialized decompositions and selecting the run with the lowest representative cost.

### C. Summarizing variations in a long-term echosounder time series

The activation of daily echogram patterns from matrix decomposition provides a compact representation that is useful for summarizing temporal processes in a long-term echosounder time series. To assess changes in the overall echogram structure, we compute the Euclidean distance between the pattern activation for pairs of days and perform hierarchical agglomerative clustering using Ward's minimum variance method, in which clusters are merged based on the criterion of minimizing the sum of squared differences within all clusters (Ward, 1963). The distance matrix and clustering outcome allow convenient identification of the transitions of overall echogram structure throughout the observation period.

## III. RESULTS

### A. Outlier-free low-rank echogram

PCP successfully decomposes the MVBS echogram into a low-rank component that contains the high-level spatio-temporal features in the data [Fig. 2(B)] and a sparse component consisting of abrupt echo patches and fine-grained details (Fig. S1).^{1} The low-rank component is significantly “cleaner” compared to the original MVBS echogram while still retains the high-level structures we aim to capture and describe. The dominant features in the sparse component match the echo traces of the profiler, which corrupt the original MVBS time series. This is achieved without prior knowledge of the profiler movement schedule, which was in fact irregular during the observation period due to maintenance needs. In addition, faint DVM patterns are observed in the background of the sparse component. This is likely due to the fact that there are daily variations in the high-level patterns (which are themselves sparse) as well as noise, i.e., the MVBS data are not a perfectly identifiable sum of **L** and **S**. Importantly, recall that there are no parameters to tune in the PCP algorithm (Sec. II B 3) and thus we have traded some flexibility in the decomposition for the stability of the implementation, which is needed for long-term studies where little is known about the properties of individual processes and the distributions of noise and outliers. If PCP is to be used as a tool for removing noise from calibrated echosounder data with an objective of biomass estimation, further investigation into its algorithmic implementation is needed to ensure that the sparse daily variabilities in the data can be accounted for in the downstream processing.

### B. Daily echogram patterns as building blocks of long time series

The low-rank MVBS echogram from PCP is decomposed by tsNMF (rank = 3, *η* = 500 000) into three distinct daily echogram patterns with corresponding activation sequences throughout the observation period (Fig. 4). Three is the lowest rank at which the decomposition produces clearly distinguishable daily echogram patterns and at the same time captures the majority of the structures in the data (for additional details see the Appendix and Fig. S2).^{1}

The three daily echogram patterns collectively represent a repertoire of daily echo energy distributions that jointly reconstruct the low-rank MVBS time series, demonstrating the “parts-based representation” property of nonnegative decomposition (see Sec. II B 4). Pattern #1 resembles the DVM behavior widely observed in the global ocean (Hays, 2003). The trend of increasing echo strength with increasing frequency in this pattern is consistent with the echo spectral feature of zooplankton-like scatterers at the observation frequencies. The activation sequence of pattern #1 indicates relatively high contribution throughout the observation period, with enhanced activity between early September to early October, which corresponds to visibly intensified DVM during the same time period in the MVBS echogram.

Pattern #2 consists of an intense sub-surface layer and a fainter DVM-like sub-pattern at a shallower depth than that in pattern #1. The two sub-patterns exhibit different frequency-dependent trends on echo strength: the echo strength variation of the sub-surface layer has a weak positive slope during the night time hours and a weak negative slope during the day time hours (toward the outer and middle vertical section of the pattern, respectively), whereas the DVM-like sub-pattern exhibits a clear positive slope with increasing frequency. The activation of Pattern #2 captures the strong sub-surface layer and its gradual disappearance in the first one-third of the MVBS echogram. However, the gradual sinking and dissipation of the sub-surface layer cannot be completely captured by the current decomposition model, which assumes fixed daily echogram patterns that do not change much over time (see Sec. IV for more discussion). We further observe that some echogram features are best represented jointly by two patterns. For example, the gradual depth variation of the DVM in late August [highlighted by the dashed box in Fig. 2(B)] is captured by the combination of pattern #1 and the DVM-like sub-pattern in pattern #2, and the positive slope of echo strength variation across frequency for the sub-surface layer requires both pattern #1 and #2 to reconstruct.

Pattern #3 represents an aggregation of scatterers that appear in the mid to upper water column during daytime, with a trend of decreasing echo strength over increasing frequency that is consistent with fish-like scatterers. While its activation is more difficult to discern than the other two patterns, its inactivation produces noticeable changes on the echogram: the mid to upper water column appears significantly “emptier,” especially at 38 kHz, when pattern #3 activation is low around August 24 and from late September to early October.

### C. Temporal changes of echogram structure and the environment

The distance matrix calculated based on the activation strengths of daily echogram patterns provides a summarization of the multiple transitions of overall echogram structure during the 62-day observation period [Fig. 5(A)]. Along the matrix diagonal, groups of consecutive days that share similar co-activation of echogram patterns are clearly identifiable as reddish “blocks.” The off-diagonal entries show similarity between temporally more distant observation days and reveal recurring co-activation of the daily echogram patterns. The echogram structural changes are also illustrated by the multiple transitions between four clusters of pattern co-activation identified by agglomerative clustering based on pairwise distance between days [Figs. 5(b) and Fig. S3].^{1} Pairing the distance matrix and the cluster transition diagram with ADCP measurements further reveals interesting coincidence at multiple time points between overall echogram structural changes and prominent direction reversal or magnitude variation of ocean currents at the upper 300 m of the water column [indicated by the arrows and dash lines in Fig. 5(C)]. For example, the pattern activation cluster transitions on September 3, 10, 22, and 24 coincide with direction reversal in one or both of the eastward and northward components of the currents. The cluster transitions on August 25 and October 7 additionally coincide with an increase of current velocity in the lower and upper water column, respectively.

## IV. DISCUSSION

In this paper, we present automatic extraction and compact representation of temporal processes in a long-term echosounder time series via matrix decomposition. We show that unsupervised decomposition techniques are capable of transforming complex echo observation into low-dimensional components that are more tractable and interpretable than the original data. The decomposition methods automatically remove noisy outliers and capture high-level spatio-temporal echogram structures through time-varying linear combination of a small number of ecologically relevant daily echogram patterns. The implication of our results is profound: this approach is entirely data-driven and does not assume knowledge of the type of scatterers or sources of interference present in the data. Such an approach is fundamentally different from previous echo analysis methods that rely heavily on handcrafted rules built from past experience, limited biological ground truth information, and observer assumptions. The methodology we present here is therefore more flexible and adaptable when applied to data collected from geographical regions or environmental conditions for which scarce prior information exists. This type of data is now prevalent due to the broad availability of echosounder-equipped autonomous ocean observing platforms and as a result of the rapidly changing climate.

Our results show that the parameter-free, automatic PCP decomposition significantly reduces the dimensionality of the MVBS echogram (Fig. 2) and enables the subsequent tsNMF decomposition analysis. By restructuring data according to the daily interval (Fig. 3), we exploit the intrinsic regularity in the data through PCP to remove the echo interference from the water column profiler. Without this crucial step, the profiler echo traces would have dominated the tsNMF optimization updates due to their significant echo strengths. In addition, while much fine echogram granularity is assigned to the sparse component by PCP, the low-rank MVBS echogram retains the high-level spatio-temporal structures intrinsic to the original data that are important for joint analysis of echosounder observation with other ocean environmental parameters.

The results also show that a temporally regularized nonnegative decomposition through tsNMF provides a tractable and interpretable dissection of the echosounder time series (Fig. 4). The decomposition discovers daily echogram patterns that each represents a distribution of echo energy across time (hour of day) and space (depth), with varying daily activation throughout the observation period. For biological scatterers, such grouping is based on *behavior* (vertical movements within a day) and does not necessarily correspond to specific taxonomic group of organisms that have been the focus of many echo classification methods (see Sec. I). This behavioral binding can be ecologically significant since spatial and temporal co-occurrence of organisms within daily movements suggests trophic functional grouping that mediates energy transfer throughout the water column (Lehodey *et al.*, 2010).

### A. Modeling frequency domain information

Interestingly, while frequency domain information is not explicitly modeled in tsNMF, the constraint on temporal smoothness of pattern activation encourages biological (and therefore spectral) coherence within each pattern, since movements of animal aggregations are continuous in time and space. Recall that spectral characteristics of echoes vary significantly depending on the size, shape, orientation, and material properties of the scatterer. In a preliminary study (Lee and Staneva, 2019) we examined the effects of explicitly modeling frequency domain information via tensor decomposition, which extends matrix decomposition to a multi-way component analysis. While the nonnegative tensor decomposition yielded three daily echogram patterns that resemble the general distribution of echo energy in pattern #1 and #2 discovered by tsNMF here, the spectral dependency within each pattern is different. Further investigation revealed that the reconstruction error is substantially higher for tensor decomposition, rendering its results less reliable and less interpretable. This is likely due to the combination of (1) the restrictive form of Kruskal tensor decomposition we used, which requires coherent variation across frequency for *all pixels* within a single pattern but at the same time does not allow interaction across patterns in the reconstruction (Cichocki *et al.*, 2009), and (2) the small number of frequency features (*N _{f}* = 3) in the dataset, which limits the total number of patterns (rank) allowed in the decomposition. Future investigations using data with richer spectral features, such as those collected by broadband echosounders, would be more appropriate for evaluating the utility of tensor decomposition in modeling frequency domain information in the decomposition.

### B. Relation to environmental data

The pattern activation sequences from tsNMF and the derived distance matrix summarize both similarities and changes in the echosounder time series, providing an explicit avenue for joint analysis of acoustic observation and ocean environmental data. By aligning ocean currents measured by ADCP with the distance matrix and activation clusters, a clear pattern emerges that associates changes in the echogram structure to changes to the direction or magnitude of ocean currents at multiple time points throughout the observation period [indicated by arrows in Fig. 5(C)]. These associations are ecologically plausible, since mid-trophic animals can be advected by strong currents, and their vertical movement behaviours may also be modulated by changes in the ambient environment. While detailed interpretation of these observations is beyond the scope of this paper, we note that strong currents (up to ∼0.2 m/s) could be biologically significant in this region (Keister *et al.*, 2009; Wu *et al.*, 2014). In addition, even though conventional echogram summary statistics may show association with a subset of the observed current changes (e.g., Fig. S4^{1}), the statistics do not provide an easily interpretable summarization of the actual echogram structure in the same way the tsNMF patterns and activations do. Unlike the fixed, handcrafted definitions of the summary statistics, our *data-driven* methodology further ensures that the automatically extracted descriptors (the patterns) change adaptively with intrinsic structures in the data. Using an example dataset consisting of 62 days of observation, we illustrate the synoptic summarizing power of unsupervised decomposition that will likely play a crucial role in extracting information from much longer time series that would be difficult to analyze manually.

### C. Comparison with PCA

Our results demonstrate the advantage of interpretability from imposing nonnegative constraints in decomposition analysis of echosounder data. In comparison with PCA, the purely additive linear combination of low-rank patterns makes it straightforward to attribute changes in the long-term echogram to temporally varying contributions from different daily echogram patterns. For example, while the appearance of the first two PCA components resembles the first two tsNMF patterns, their contributions to the observed MVBS echogram are much more difficult to interpret due to the intermingled positive and negative entries in both the patterns and the activation sequences (Fig. S5^{1}). PCA could, however, be useful for analyzing anomaly with respect to a mean pattern, as has been shown in Parra *et al.* (2019).

### D. Rank and parameter selection

The selection of hyperparameters in the decomposition, such as the rank and the smoothness parameter, is a topic that requires attention in future application of similar methods. In this work, we use intrinsic properties of the data to select these hyperparameters: we select the minimum rank that yields distinctive daily echogram patterns with moderate reconstruction error and choose a smoothness parameter that resides at the corner of the L-curve, which balances the relative contribution of reconstruction error and smoothness regularization cost in the decomposition optimization (Sec. II B 5). However, the choice of these parameters may also depend on the exact context of analysis. For example, the choice of rank, or the number of echogram patterns, may be driven by strong prior expectation on the number of organism types in the observed area, if such information is available. When echosounder data are analyzed in conjunction with other oceanographic measurements over a large time period, the choice of the rank and the smoothness parameter may also be influenced by how well the decomposition captures echogram structures that temporally co-vary with other ocean variables.

### E. Future development and implications

Our results further help pinpoint specific areas of improvement for matrix decomposition analysis of long-term echosounder time series. First, while tsNMF amends the traditional NMF model by imposing a constraint on temporal smoothness, incorporating spatial smoothness regularization (over depth and time within a day) would likely improve the biological coherence within each pattern. Second, many high-level echogram patterns are transient in time and therefore would be more appropriately captured by a temporally adaptive formulation that allows gradual intrinsic changes *within* the daily echogram patterns (in addition to the time-varying activation strengths *across* day). For example, while pattern #2 captures the strong sub-surface layer in the data, the gradual sinking and dissipation of this layer from August 29 to September 10 is only partially captured as a decrease of activation of this fixed pattern. Similar gradual changes are expected for the timing and depth of DVM across seasons for longer time series. In fact, allowing temporally adaptive factors is a challenging problem across diverse domains that many dynamic decomposition formulations aim to address (Aravkin *et al.*, 2016; Chen *et al.*, 2015; Mohammadiha *et al.*, 2015; Saha and Sindhwani, 2012). Third, data from ocean observing platforms are frequently subject to gaps in the time series due to instrumentation or communication problems. While short data gaps can likely be handled by excluding the missing entries in tsNMF and interpolating the resulting activation sequences, a full methodological development to model the temporal evolution of pattern activations will be needed to adequately address this issue. In addition, the reconstruction error in tsNMF is evaluated through the Frobenius norm, which corresponds to an assumed Gaussian noise model. A different model is likely needed to tackle the challenge of linear decomposition of data with a large dynamic range (Liutkus *et al.*, 2015). This has been an outstanding problem in music and audio analysis, for which data with large dynamic ranges are similarly common. In this study we perform PCP and tsNMF analyses on log-transformed MVBS data based on the common practice in echo processing (e.g., De Robertis *et al.*, 2010). However, a linear decomposition of echo data in the linear domain could open the door for direct physics-based inversion of the decomposition results for important biological quantities, such as biomass, provided that accurate echosounder calibration information is available.

In this paper, we show that unsupervised matrix decomposition methods are promising techniques that can automatically extract ecologically relevant information and provide synoptic insights from long echosounder time series given minimal prior information. This work forms the basis for future development of effective methodologies that integrate echosounder observation into large-scale, autonomous ocean observation strategies.

## ACKNOWLEDGMENTS

We thank Aleksandr Aravkin and Julie Keister at the University of Washington and Dezhang Chu at the NOAA Northwest Fisheries Science Center for their helpful suggestions on this work. W.-J.L. and V.S. were supported by NSF Award No. 1849930. V.S. was also supported by the Gordon and Betty Moore and Alfred P. Sloan foundations Data Science Environments (MSDSE). All data used in this work are openly available on the OOI Raw Data Archive: https://rawdata.oceanobservatories.org/files/. The ADCP data can additionally be accessed programmatically through the OOI machine to machine interface: https://oceanobservatories.org/ooi-m2m-interface/. All analyses performed in this study can be accessed in the GitHub repository: https://github.com/leewujung/ooi-echo-matrix-decomposition.

### APPENDIX

##### 1. Variation of reconstruction error with increasing rank

The variation of the MSE of NMF reconstruction was used to estimate the number of components in a number of previous studies (e.g., Frigyesi and Höglund, 2008; Hutchins *et al.*, 2008; Kim and Tidor, 2003). The method involves comparing the decrease of MSE across increasing rank between the data and a randomized permutation of the data [Fig. 6(A)], and selects the rank at which the MSE decrease starts to level or when the slope of MSE reduction is smaller than that of the permuted data. The rationale is that when the number of components (the specified rank) exceeds the true rank of the data, the extra components would not capture meaningful structures in the data, and hence the MSE reduction would either become less significant or approach the approximately linear slope of a randomly permuted data set, in which the original structure in the data has been disrupted. In this study, we permute the low-rank MVBS data matrix across the *N _{day}* observations (across columns in Fig. 3) so that the variance within each pixel (a specific hour-depth combination) of a daily echogram is preserved but its associations with other pixels are disrupted.

##### 2. Cophenetic correlation

Cophenetic correlation was originally introduced in the context of hierarchical clustering for quantitative comparison of dendrograms (Sokal and Rohlf, 1962) by measuring how truthfully a dendrogram preserves the pairwise distance between individual observations (samples), and has since become a popular method for determining the number of clusters in clustering problems. It was later used as a method to determine the number of components in NMF based on the stability of multiple randomly initiated NMF runs (Brunet *et al.*, 2004). Specifically, the decomposition can be interpreted as a clustering operation, with each component representing a cluster and each observation assigned to the cluster for which the corresponding activation is highest. While the exact order of components in NMF may not be consistent over multiple runs, the more stable the decomposition, the more likely a pair of similar observations would be assigned to the same cluster. The cophenetic correlation coefficient is a measure of the dispersion of the consensus of the cluster assignment across runs and can be calculated for a set of ranks to select the number of clusters (the rank) [Fig. 6(B)]. Brunet *et al.* (2004) suggested selecting the rank at which the cophenetic correlation coefficient begins to fall.

The cophenetic correlation coefficient is computed in the following way: for each NMF run, a connectivity matrix **C** is an indicator matrix, within which each entry indicates whether a pair of observations belong to the same cluster,

The consensus matrix $C\xafij$ for an ensemble of NMF runs is the fraction of the times a pair of observations belong to the same cluster and is equal to the average of the connectivity matrix over all runs. Following the suggestion by Frigyesi and Höglund (2008), we use a weighted version of the consensus matrix,

where *n* is the run number and $wn=((max(e)\u2212e(n))/(max(e)\u2212min(e))$, in which *e*(*n*) is the MSE of the *n*th run. This weighting emphasizes contributions from runs with smaller MSE. The closer $C\xafij$ is to 1, the higher the tendency the *i*th and the *j*th observations are clustered together. The consensus matrix can then be converted to a dissimilarity matrix $Dij=1\u2212C\xafij$. The cophenetic distance between two observations is equal to the height of the link between them from a hierarchical clustering based on the dissimilarity matrix. The cophenetic correlation coefficient is obtained by computing the Pearson correlation between the original dissimilarity of the observations and the corresponding cophentic distance. The cophenetic correlation coefficient is 1 when the cluster assignments are identical for all runs.

##### 3. The L-curve

The tsNMF formulation used in this work is a form of regularized decomposition for which the regularization parameter can be tuned to fit the needs of the application. Specifically, the value of the smoothness parameter *η* controls the balance between the reconstruction error and the smoothness regularization cost in the decomposition [the first and the second term in Eq. (3), respectively]. As *η* increases, the decomposition transitions from a regime dominated by the reconstruction error to a regime dominated by the smoothness regularization cost, forming an “L-curve” that has been widely recognized in the literature (e.g., Mirzal, 2014; Oraintara *et al.*, 2000). In this paper, we follow the recommendation from these previous studies and select *η* = 500 000, which corresponds to the “corner” of the L-curve resulting from rank = 3 tsNMF runs over a broad range of parameter values (Fig. 7). The effects of adjusting *η* on the decomposition results are shown in Fig. S6.^{1}

^{1}

See supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0002670 for supplementary figures and captions.