In sonar array processing, a challenging problem is the estimation of the data covariance matrix in the presence of moving targets in the water column, since the time interval of data local stationarity is limited. This work describes an eigenvector-based method for proper data segmentation into intervals that exhibit local stationarity, providing data-driven higher bounds for the number of snapshots available for computation of time-varying sample covariance matrices. Application of the test is illustrated with simulated data in a horizontal array for the detection of a quiet source in the presence of a loud interferer.

## I. Introduction

Approximation of the true population covariance matrix $R$ corresponding to a time-dependent process by means of the sample covariance $R\u0302$ can be a challenging problem in highly dynamic environments with large-aperture arrays.^{1} On one hand, a large number of data snapshots *M* is required for the sample covariance to be a consistent estimator. On the other hand, snapshots must be obtained from a limited time interval where the data is assumed to be locally stationary. For this reason, tests for stationarity that require very few samples are relevant. In this work, a statistical test for data stationarity based on the relation between consecutive sample eigenvectors is considered. The proposed test is sensitive to non-stationarity caused by time-dependent variations in source position.

Figure 1(a) shows the simulated scenario considered in this work, consisting of an *N*-element horizontal line array (HLA) in the presence of uncorrelated acoustic sources located at time-varying azimuths in the water column. This configuration results in a population covariance with spiked^{2} structure, so called due to a few eigenvalues associated to sources in the water column resulting in peaks in the eigenspectra. Data segmentation according to local stationarity is illustrated in Fig. 1(b), in which the continuous time axis is discretized as *t*_{0}, *t*_{1}, etc. In this example, from $ta$ to $ta+2$ and from $tb$ to $tb+3$ the azimuths of sources 1 and 2 are constant or slowly-varying (i.e., stationary). In contrast, between $ta+2$ and $tb$ the azimuth of both sources varies significantly. The goal of this work is to define an eigenvector-based statistical test that requires very few snapshots to identify segments of stationary data, based on changes of direction of the corresponding population eigenvectors, as illustrated in Fig. 1(c) and developed in Sec. III.

Random matrix theory (RMT) has been applied to study the sample eigenvalues and eigenvectors of spiked covariances^{3–5} for $N/M\u2192$ *constant* as $N,M\u2192\u221e$. Research on this field has provided (among other results) methods for correcting sample eigenvalue bias^{4} and estimating the effective number of identifiable sources,^{6} as well as tools for performance analysis of array processing techniques.^{7–9} More recently, research on the high dimension, low sample size (HDLSS) regime^{10} (in which $M\u2192$ *constant* as $N\u2192\u221e$) has described the relationship between sample and population eigenvectors of spiked covariances as a joint probability density function (pdf). The work presented here makes use of expressions from HDLSS to derive (Sec. III) a full statistical description of the expected behavior of consecutive *signal* sample eigenvectors (i.e., those eigenvectors related to dominant sources in the water column) under the hypothesis of stationarity. The obtained expression provides a quantitative metric to decide bounds of local stationarity, as illustrated in Sec. IV with a numerical example.

## II. Signal model

Let $ya=sa+wa$ be an N × 1 data snapshot vector^{11} received at time $ta$ at an *N*-element HLA. Each entry of the noise vector $wa$ is normally distributed with power $\sigma w2$, while $sa=\Sigma q=1Qvqa\xi q$ represents *Q* uncorrelated moving sources with power $\sigma q2$. Source amplitudes are complex-valued stochastic variables $\xi q\u223c\sigma q(N(0,1)+jN(0,1))/2$, where *N*(0,1) indicates a random realization from a normal distribution and $j=\u22121$. For each source, the time-dependent replica vector (assuming sources in the far field) is

where $k=\omega /cw$ is the wavenumber, $\omega $ is the frequency in rad/s, $cw$ is the speed of sound in the water, $\theta qa\u2261\theta q(ta)$ is the azimuth angle of the *q*th source at time $ta$, and *d* is the array inter-element spacing. The population (i.e., clairvoyant) covariance for mutually uncorrelated sources is^{11}

where $Va=[v1av2a\cdots vQa]$ is a matrix with source replicas as columns and $\Psi =diag\u2009(\sigma 12\sigma 22\cdots \sigma Q2)$ is a *Q* × *Q* diagonal matrix of the source powers. The covariance can also be written in terms of *population* eigenvalues $\lambda ia$ and eigenvectors $uia$ as

where $Q\u0302\u2264Q$ is the effective number of sources determined by the actual number of sources located at distinct azimuths. Equation (3) is partitioned into eigenpairs containing signal information versus those associated to the background noise. At time $ta$ an estimate of $Ra$ is obtained by averaging the outer product of $M$ snapshots as

where $\lambda \u0302na$ and $u\u0302na$ are the *sample* eigenvalues and eigenvectors, respectively. The number of snapshots *M* available for estimation of signal statistics is determined by the largest time interval over which $ym$ can be considered locally stationary. Testing whether the underlying population covariance structure has changed from that at time $ta$ to that at time $tb$ can be accomplished by monitoring quantities such as the (non-observable) population eigenvalues and eigenvectors at consecutive times, based on the variability of the (observable) sample eigenvalues and eigenvectors. For the simulated scenario in Fig. 1(a) the population eigenvector configuration is strongly affected by target movement. In Sec. III a statistically justified metric to decide between hypothesis $H0$: $Ra=Rb$ versus $H1$: $Ra\u2260Rb$ based on statistics of the sample eigenvectors is described.

## III. Probability distribution of consecutive sample eigenvectors

Consider the *signal eigenspace* spanned by the eigenvectors $u1a$ and $u2a$ at $ta$ in Fig. 1(c), and how it relates to the eigenspace at a later time: In the case of signal stationarity the population eigenvectors do not change (e.g., $u1a=u1a+2$ and $u2a=u2a+2$ in this example), although the direction of the sample eigenvectors varies from realization to realization. On the other hand if the signal is not stationary due to source movement then the population eigenvectors at $tb$ point to different directions in relation to those at $ta$. Since the true eigenvectors are not observable quantities, the decision of whether or not the signal eigenspace has changed must be made based on the observable (yet inaccurate) sample eigenvectors ($u\u03021a,u\u03022a,u\u03021b$, and $u\u03022b$ in this example). This section investigates the pdf of $|u\u0302naHu\u0302nb|2$ (i.e., the squared cosine of the angle between consecutive sample eigenvectors) under the $H0$ hypothesis. Previous work in RMT (Refs. 4 and 5) has shown that for dominant eigenvalues, the corresponding sample eigenvectors are constrained to lie on a cone centered on the population eigenvectors, as illustrated in Fig. 1(c). Due to this constraint, a relation can be expected between the sample eigenvectors at $ta$ and $tb$ under the condition of data stationarity. To obtain a full probability distribution for $|u\u0302naHu\u0302nb|2$ under hypothesis $H0$, each sample eigenvector is written as the sum of projections parallel $(p\u0302na)$ and transverse $(g\u0302na)$ to the signal subspace:^{10}

Under hypothesis $H0$, $uqa=uqb$ for all *q* so $p\u0302naHg\u0302nb=0$. For simplicity, the case $Q\u0302=1$ (i.e., one dominant interferer) is considered, so

According to RMT, for $\lambda 1a>\sigma w2(1+N/M)$ the transverse component $g\u03021a$ is a vector with direction uniformly distributed on an *N* − 1 dimensional sphere, making $|g\u03021aHg\u03021b|\u21920$. For HDLSS, the asymptotic regimes^{10} for eigenvector convergence depend on the magnitude of $\lambda 1a$ compared to $N$. In this paper we focus on the HDLSS regime for which $\lambda 1a\u223cN$ (i.e., loud interferers if $N$ is large), for which $|g\u03021aHg\u03021b|\u21920$ is still valid since the RMT condition $N>\sigma w2(1+N/M)$ is met for $\sigma w2/N\u226a1$ (as expected for usual background noise levels). Notice that from Eq. (5) with $u1a=u1b,p\u03021aHp\u03021b=((u\u03021aHu1a)u1a)H$$((u\u03021bHu1a)u1a)$. Finally,

where $Xa=|u\u03021aHu1a|2$ and $Xb=|u\u03021bHu1a|2$ are independent random variables and * indicates the complex conjugate operator. It has been shown^{10} that $Xa$ (or $Xb$) is distributed as

where $\chi M2(\u2009\xb7\u2009)$ is the Chi-square distribution with *M* degrees of freedom. Then, the pdf of $z=|p\u03021aHp\u03021b|2$ under hypothesis $H0$ is given by^{12}

The integral in Eq. (9) can be computed numerically for any value of $z$ by discretizing the integration variable $xa$. For $M\u226aN$ snapshots, the test provides a quantitative expression to distinguish between sample eigenvector variability due to true changes on the underlying signal statistics (i.e., non-stationarity) versus variability due to the lack of snapshots. Application of Eq. (9) to data segmentation is illustrated in Sec. IV with an example.

To conclude this section, it is important to comment on the extension of the proposed stationarity metric to the case of multiple interferers. For $Q\u0302>1$ the stationarity metric is a function of the scalar products $|u\u0302naHu\u0302mb|2$ where $m,n=1,\u2026,Q\u0302$. Similar to the procedure described in this section, expressing each sample eigenvector according to Eq. (5) allows defining the metric function in terms of random variables $|unaHu\u0302mb|2$ (i.e., population vs sample eigenvector), the statistics of which are described in the literature^{10} and can be used to obtain the first and second moments of the stationarity metric.

## IV. Numerical example

This section considers the case of a fast maneuvering quiet source (received signal strength of $\sigma 22$ = 5 dB) initially located at $\theta 2=0\xb0$ azimuth in the presence of a slow and loud interferer ($\sigma 12$ = 18 dB) initially at $\theta 1=3.5\xb0$. For this simulation $cw$ = 1500 m/s, *f* = 500 Hz, *N* = 120 sensors, *d* = 1.5 m, and the background noise is set to $\sigma w2$ = 0 dB. Tracking of the source azimuth is accomplished by applying minimum variance distortionless response beamforming to the data covariance matrix. Ideally, if the clairvoyant covariance is known,^{11}

where $v(\theta )$ is the replica vector for a plane wave impinging the array with angle *θ*. The beamformer output corresponding to Eq. (10) is shown in Fig. 2(a) and it provides a benchmark for beamformer simulations in which $Ra$ must be estimated. The results are plotted with time in the vertical axis and azimuth in the horizontal axis, showing the time-varying azimuth for the fast source and the loud interferer. In realistic situations, the covariance matrix must be estimated from limited samples and the beamformer becomes

where $R\u0303a=\Sigma q=1Q\u0302\u2009\lambda \u0303qa\u2009u\u0303qa\u2009u\u0303qaH+\u03f5I$ is a modified sample covariance matrix^{8,13} (note the $\u2009\u0303$ indicator) that retains the dominant eigenvectors. The ϵ loading is used to stabilize the inverse $R\u0303a\u22121$. The results of $BFMa(\theta )$ corresponding to *M* = 5 snapshots per covariance are shown in Fig. 2(b). Due to the lack of snapshot support the power of the interferer is underestimated at all times and the quiet source is not detected at some time instances. Figure 2(c) shows the beamformer output using improved weights, obtained from sample covariance matrices with increased snapshot support determined by data segmentation based on Eq. (9). For example, between 0 and 25 s the interferer azimuth remains constant, so the dominant eigenvalue/eigenvector ($\lambda \u03031a$ and $u\u03031a$, respectively) can be estimated with all the data snapshots for this time interval as opposed to just *M* = 5 snapshots. A similar situation is observed between 50 and 65 s, where the variation in azimuth for the interferer is slow enough to consider the data as stationary within this interval. The importance of Eq. (9) is that it allows objective data segmentation based on the observable sample eigenvectors. This is shown in Fig. 2(d), where the theoretical $PZ(z|H0)$ is shown in the background indicating the probability of random realizations of $|u\u03021aHu\u03021b|2$ under hypothesis *H*_{0}. In this panel, Monte Carlo realizations of $|u\u03021aHu\u03021b|2$ at each time step are over-plotted as a solid white line. Segmentation bounds indicated as horizontal dashed lines are placed whenever $|u\u03021aHu\u03021b|2<0.32$ or $|u\u03021aHu\u03021b|2>0.67$, selected to exclude values of *z* with low probability. Notice that the theoretical $PZ(z|H0)$ was computed assuming knowledge of the (unobservable) ratio $\sigma w2/\lambda 1a$. While estimates of $\lambda 1a$ and $\sigma w2$ based on sample eigenvectors for $N\u226bM$ have been suggested in the literature,^{4} the impact of using such estimates in $PZ(z|H0)$ is subject of future research.

In general, slow variations of the interferer azimuth cause $z=|u\u03021aHu\u03021b|2$ to remain around the value where $PZ(z|H0)$ peaks. Deviations from this peak can be interpreted as statistical evidence to select $H1$ over $H0$, thereby providing a data-driven segmentation approach. The advantage of setting *M* according to the segmentation in Fig. 2(d) can be quantified by comparing the scalar product between the first (and dominant) population eigenvector to the sample eigenvectors $u\u03021a$ (obtained with *M* = 5) and $u\u03031a$ (obtained with *M* ≥ 5, according to stationarity bounds). This is shown in Fig. 2(e), which demonstrates the improvement achieved with the segmentation procedure on estimating the dominant eigenvector: In all cases, $|u1aHu\u03031a|\u2009\u2009>\u2009\u2009|u1aHu\u03021a|$, approaching the ideal value of 1 for time intervals with slow or no interferer motion.

A comparison of Figs. 2(b) and 2(c) highlights the advantage of using the improved weights in the beamformer: For Fig. 2(c), the power of both sources is estimated more accurately compared to Fig. 2(b), and in some cases it also improves the detection of the quiet source, such as at times 23.8 and 104.72 s, for example. This comparison is shown in Fig. 3, in which the improvement in detecting the quiet source is evident.

## V. Summary

An eigenvector-based test for data stationarity, suitable for snapshot-starved experimental scenarios with large arrays in the presence of maneuvering targets, has been described in this work. The test gives an ability to estimate from the data the number of snapshots that should be used to compute a sample covariance matrix where stationarity is ensured. This suggests applications to beamforming techniques with a variable number of snapshots according to the dynamics of dominant sources in the water column. Application of the test for improvement of the detection levels of targets and interferers using Dominant Mode Rejection (DMR) beamforming was illustrated with an example that includes a fast-maneuvering target in the presence of a loud interferer.

## Acknowledgment

The authors gratefully acknowledge the support of the U.S. Office of Naval Research, ONR Undersea Signal Processing.