The authors consider the retrieval of Green functions G from the correlations of non-stationary non-fully diffuse noise incident on an array of sensors. Multiple schemes are proposed for optimizing the time-varying weights with which correlations may be stacked. Using noise records created by direct numerical simulation of waves in a two-dimensional multiply scattering medium, cases are shown in which conventional stacking does a poor job and for which the proposed schemes substantially improve the recovered G, rendering it more causal and/or more symmetric, and more similar to the actual G. It is found that the schemes choose weights such that the effective incident intensity distribution is closer to isotropic.

## I. INTRODUCTION

Diffuse noise fields have attracted immense attention over the last 15 years, driven in large part by the promise of noise correlations to reveal seismic velocities and to monitor changes in the Earth's elastic properties. Key to this were the arguments and ultrasonic demonstrations (Weaver and Lobkis, 2001; Lobkis and Weaver, 2001) that fully equipartitioned diffuse waves permit, on cross-correlating the apparent noise, retrieval of the Green function, where the Green function is the signal one would have at one receiver if the other were replaced with a concentrated impulsive force. Further theoretical arguments (Roux *et al.*, 2005b; Snieder, 2004; Wapenaar, 2004; Weaver and Lobkis, 2004) and laboratory demonstrations (Derode *et al*., 2003; Larose *et al*., 2004; Malcolm *et al.*, 2004) in support of this conclusion quickly appeared.

Applications in long-period seismology have been extensive. Tomographic maps of Rayleigh wave velocity have been constructed (Lin *et al.*, 2008; Sabra *et al.*, 2005; Shapiro *et al.*, 2005). The literature also reports retrieval of waveforms due to Love waves, and bulk waves (Roux *et al.*, 2005a), and even measures of Rayleigh wave anisotropy (Lin *et al.*, 2009). Dense large arrays consisting of up to hundreds of seismic stations permit high resolution; the world-wide availability of their daily records for periods of years lends itself naturally to the necessary kind of data processing.

The original theoretical basis for the relation is restrictive; ambient seismic noise fields are rarely fully diffuse (Mulargia, 2012). This has long been recognized. Long-period seismic noise fields are directional, with greater intensity in some directions than others. When that directionality is smooth and when there are no scatterers, asymptotic arguments (Snieder, 2004; Weaver *et al.,* 2009; Godin, 2009) have shown why the non-isotropy has proved unproblematic. As Snieder *et al.* (2008), Snieder and Fleury (2010), and Yoritomo and Weaver (2016) have emphasized, however, non-fully-diffuse ambient noise, even when its directionality is smooth, generates spurious features in the cross correlations when scatterers are present. These features do not diminish with greater integration time and do not form part of the Green function; they thereby obscure desired information, in particular distorting the main arrival. This may be part of the reason that efforts to retrieve amplitude information and attenuation have proved difficult (Zhang and Yang, 2013).

Several more or less complicated signal processing methods have been shown to improve Green function retrieval by increasing the convergence rate. Bensen *et al.* (2007) discuss many of the approaches. Here, we propose what we believe are new methods, designed to take advantage of noise fields that, while nonisotropic, have their non-isotropy varying in time. By combining cross correlations from different blocks of time, it is found that the net anisotropy is reduced and retrieval enhanced; signal to noise ratios are improved.

There are other recent proposals for enhancement with which our approach may be contrasted. Curtis and Halliday (2010) sought “directional balancing” for the correlations in a non-isotropic but fixed incident field. They use a dense array of sensors and, assuming no spurious features, minimize the differences between a subset of the correlations and the actual measured G by fitting a direction-dependent incident intensity. This simultaneously finds the unknown incident directionality and provides information with which to correct all the other correlations. Chaput *et al.* (2016) chose to correlate optimal coda windows from each of many events and stacked them to construct empirical Green functions G. Seydoux *et al.* (2017) and Menon *et al.* (2012) have proposed a kind of spatial filtering by reweighting the spectral decomposition of the N × N matrix of correlations (i.e., adjusting the eigenvalues but not the eigenvectors) over an array of N sensors, and thereby manage to de-emphasize contributions from strong point sources. Closely related is the work of Melo *et al.* (2013), who do an SVD of the correlations for each of a set of point sources of noise. They show, for a nonscattering medium, that the leading singular vectors can do a superior job at representing G, as this selects a subset of sources over which the correlations are most coherent. As such it resembles the phase-coherence-weighted approach of Schimmel *et al.* (2011). Wapenaar *et al.* (2011) review multi-dimensional deconvolution, an approach that, like that of Curtis and Halliday (2010), attempts to correct for imperfect illumination.

In the following it is useful to draw a distinction between two of the sources of difference between correlations and Green functions. There is finite-T noise from averaging over finite times, which vanishes if one has access to unlimited noise records. This contribution is quantified in the Appendix. There is also an apparent noise, a contribution to the discrepancy between G and correlation which does not diminish with integration time T. This contribution includes the spurious features discussed above; it takes its origin from failure of the conditions for the theorem, i.e., from having incident noise fields that are not fully diffuse. Empirically one observes differences between G and correlations, differences that may look like noise. It is not always obvious what their source is.

Section II describes a few theoretical preliminaries. Section III proposes several schemes by which the correlations obtained in different blocks of time may be weighted and summed into a stack. Each scheme optimizes a different measure of stack quality. Section IV describes numerical experiments used to test the schemes, and Sec. V discusses the results of those experiments. We end in Sec. VI with a short discussion and conclusion.

## II. THEORETICAL PRELIMINARIES

We posit an array of sensors *i* = 1, 2,…, each with a time-record *ψ*_{i}(t) from each of several blocks of time *d* = 1, 2,…, *D* that we call “days,” though their length T_{d} is arbitrary. For simplicity here we take each block to have equal length T_{d} = T.

Bensen *et al.* (2007) suggest various signal processing and filtering operations to apply at this point. Chief amongst them is a temporal regularization in which each record ψ_{i} is normalized by a local-in-time-and-space root mean square; the normalization may thus differ amongst the several sensors. (A severe version of this is to replace each record by its sign, called one-bit processing.) Here, we eschew such local-in-space normalizations, as they rescale each sensor differently, making it difficult to interpret relative amplitudes in the correlation waveforms (although they seem not to distort arrival times). It has been argued (Weaver, 2011, 2013; Zhang and Yang, 2013; Bowden *et al.*, 2015, 2017) that a re-scaling that applies equally to all sensors in an array is required for accurate retrieval of amplitudes. Band pass filtering or whitening can be applied at this stage, or after the correlations are constructed.

The next step is to construct the empirical correlations from each day

the integral being over the day, of length T, which is far longer than any lapse-time scales τ of interest. A stack is typically constructed by simply summing all the *c*'s

It is this quantity that may correspond to the band-passed, anti-differentiated (i.e., indefinitely integrated) and scaled Green function. As discussed above, the correspondence is in practice often imperfect. What is perhaps occasionally neglected is the recognition that the sum may alternatively be constructed with arbitrary weights that are constant across the sensor array.
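In symbols, the daily correlation, the conventional stack, and the arbitrarily weighted stack may be sketched as follows (a reconstruction in the text's notation; the absence of a 1/T prefactor, and the equation numbers, which follow the text's cross-references, are assumptions):

```latex
c^{d}_{ij}(\tau) = \int_{0}^{T} \psi^{d}_{i}(t)\,\psi^{d}_{j}(t+\tau)\,dt , \tag{1}

X_{ij}(\tau) = \sum_{d=1}^{D} c^{d}_{ij}(\tau) , \tag{2}

X_{ij}(\tau) = \sum_{d=1}^{D} w_{d}\, c^{d}_{ij}(\tau) . \tag{3}
```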

Thus the question arises as to what set of weights is optimal. Weaver (2011, 2013) has argued (also see the Appendix) that a choice in which each weight is inversely proportional to the day's total energy on the array may be best, as it minimizes the finite-T noise from incomplete time-averaging. Melo *et al.* (2013), though they do not use this language, implicitly suggest a set of weights that emphasize coherence amongst the several correlations c that constitute the stack. Here, we propose several other schemes by which the weights may be chosen, each designed to optimize a different measure χ of how satisfactory the resulting X will be.

In the following analysis we find it convenient to normalize the daily correlations to unit array energy,

*E*_{d} is the sum over all the array's sensors of their integrated square noise on that day. It is thus representative of the energy incident upon the array that day. The amplitude of the day's correlations $c^{d}_{ij}(\tau)$ scales with the day's incident noise intensity *E*_{d}. Bowden *et al.* (2017) define their *E*_{d} slightly differently. The *C*^{d} may be termed normalized correlations. We then define our weighted stack as
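Sketching the normalization and the weighted stack (the precise form of $E_{d}$ and the equation number are assumptions consistent with the surrounding text):

```latex
E_{d} = \sum_{i} \int_{0}^{T} \left[ \psi^{d}_{i}(t) \right]^{2} dt ,
\qquad
C^{d}_{ij}(\tau) = \frac{c^{d}_{ij}(\tau)}{E_{d}} ,
\qquad
X_{ij}(\tau) = \sum_{d=1}^{D} \lambda_{d}\, C^{d}_{ij}(\tau) . \tag{4}
```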

We will seek weights λ that minimize the noise in X. (In practice this may need to be done separately for each frequency range of interest. Here we examine only one such.) To do so we need to define a few measures of noise, and two measures of the signal strength in X with which to compare it.

We define two measures of X's strength
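Plausible forms for these two measures, consistent with how they are used later in the text (the range of the τ integration is an assumption):

```latex
E_{X} = \sum_{ij} \int X_{ij}(\tau)^{2}\, d\tau ,
\qquad
A_{X} = \sum_{i} X_{ii}(0) = \sum_{d} \lambda_{d} = \mathbf{1}^{T}\bar{\lambda} .
```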

where $E_{X}$ is a measure of the total energy in the array X; it scales with the square of the weights λ. $A_{X}$ is a measure of signal amplitude in the stack; it scales linearly with the weights, and its *square* $A_{X}^{2}$ gives an alternate measure of the energy in the stack. If i = j is excluded from the sum for $E_{X}$ then, unlike $A_{X}$, it will not be contaminated by non-seismic and local sensor noise (if we assume such noise on different sensors is not correlated). $\bar{\lambda}$ is a column vector with entries *λ*_{d}; $\mathbf{1}$ is a column vector of 1's.

We also define three measures of noise in X. One such is the noise due to having averaged over only finite times T in Eq. (1), leaving residual fluctuations with their origin in the randomness of the incident fields. The Appendix quantifies this noise, proposes a global measure over the sensor array as the variance of X_{ij} summed over all the pairs ij, Eq. (A14), and shows that it is proportional to the sum of the *λ*_{d}^{2} (and, as is well known, inversely proportional to integration time T).
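In symbols, this finite-T noise measure scales as (a sketch consistent with the text's description of Eq. (A14); the proportionality constant is suppressed):

```latex
\mathrm{var}\,X \;\equiv\; \sum_{ij} \mathrm{var}\,X_{ij}
\;\propto\; \frac{1}{T} \sum_{d} \lambda_{d}^{2}
\;=\; \frac{\bar{\lambda}^{T}\bar{\lambda}}{T} .
```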

We propose two further measures of discrepancy between X and G. The first is based on knowing that the perfect X is symmetric. Thus, we define an antisymmetry energy, or antisymmetry noise,

the integral being over a regime of interest in τ that includes the main arrival and coda if any. (In the following, we take the integral to cover all times from zero to long after the main arrival.) $S$ is quadratic in the λ:
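A sketch of the antisymmetry noise and its quadratic form (the exact form of the matrix $M^{S}$ is an assumption consistent with its use in the schemes below):

```latex
S = \sum_{ij} \int \left[ X_{ij}(\tau) - X_{ij}(-\tau) \right]^{2} d\tau
  = \bar{\lambda}^{T} M^{S} \bar{\lambda} ,
\qquad
M^{S}_{de} = \sum_{ij} \int \left[ C^{d}_{ij}(\tau) - C^{d}_{ij}(-\tau) \right]
                           \left[ C^{e}_{ij}(\tau) - C^{e}_{ij}(-\tau) \right] d\tau .
```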

Acausality in X is an alternate sign of discrepancy between G and X. Thus, we define acausality energy (or acausality noise) as an integral of X^{2} over precausal times

The integral is over times between estimates of the anti-causal arrival and the causal arrival. This interval is different for different sensor pairs. This is a measure that, when minimized, will tend to diminish spurious features due to scattering and poorly partitioned noise fields. $C$ is quadratic in the λ.
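The corresponding quadratic form may be sketched as (the exact form of $M^{C}$ is an assumption, paralleling that of $M^{S}$):

```latex
C = \sum_{ij} \int_{\text{precausal}} X_{ij}(\tau)^{2}\, d\tau
  = \bar{\lambda}^{T} M^{C} \bar{\lambda} ,
\qquad
M^{C}_{de} = \sum_{ij} \int_{\text{precausal}} C^{d}_{ij}(\tau)\, C^{e}_{ij}(\tau)\, d\tau .
```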

## III. THE SCHEMES

We now propose eight distinct ways to choose the weights *λ*. Except for the first, each can be construed as minimizing one of the above measures of noise, with different choices for the definition of noise and different constraints on that minimization. Scheme I is the choice *λ*_{d} = E_{d}. This simply undoes the normalization. The resulting X(*τ*) is the conventional stack [Eq. (2)] without normalization and with no optimization.

Scheme II, called “temporal flattening,” is the choice λ_{d} = 1, corresponding to the suggestion of Weaver (2011, 2013). This scheme for choosing the λ may be understood as minimizing a quantity *χ*_{II} proportional to global noise-to-signal ratio, if by “noise” is meant the variance due to finite T and if by “signal” is meant the square of A_{X} [$\chi_{II} \propto \bar{\lambda}^{T}\bar{\lambda}/(\mathbf{1}^{T}\bar{\lambda})^{2}$; see Eq. (A15)].

This is minimized by equal *λ*_{d}. Any deviation from equality amongst the *λ*_{d} will lead to a higher value of χ_{II} and greater finite-T-noise to signal ratio in X(*τ*).

Scheme III: Choosing a vector λ that minimizes antisymmetry noise $S$ requires a constraint of some kind. (Otherwise $S$ is minimized by the trivial solution $\bar{\lambda} = 0$.) The most obvious constraint is perhaps the familiar normalization condition $\bar{\lambda}^{T}\bar{\lambda} = 1$, fixing the sum of the squares of the *λ*_{d}. This leads to a choice of weights *λ*_{d} proportional to the components of the eigenvector of *M ^{S}* with the lowest eigenvalue. It is not hard to see that this way of choosing the *λ* is equivalent to minimizing the quantity $\chi_{III} = \bar{\lambda}^{T}M^{S}\bar{\lambda}/\bar{\lambda}^{T}\bar{\lambda}$, which is proportional to a ratio of antisymmetry noise $S$ to finite-T-noise. As such it is a noise-to-noise ratio. Minimizing such a ratio may be of limited value. Nevertheless, we consider it because of its simplicity.

Scheme IV seeks to minimize acausality while imposing the constraint $\bar{\lambda}^{T}\bar{\lambda} = 1$. The minimal $C$ is then found for a $\bar{\lambda}$ that is proportional to the eigenvector corresponding to the lowest eigenvalue of *M ^{C}*. Scheme IV is equivalent to minimizing the quantity $\chi_{IV} = \bar{\lambda}^{T}M^{C}\bar{\lambda}/\bar{\lambda}^{T}\bar{\lambda}$, interpretable as a ratio of acausality noise to finite-T noise. As with scheme III, we speculate that a noise-to-noise ratio may be of limited value.

Scheme V: We again seek to minimize $S$ as defined above, but now impose the constraint of a fixed A_{X} = Σ_{i}X_{ii}(0) = Σ_{d}λ_{d}, i.e., that the energy $A_{X}^{2}$ in the stack X is specified. The $\bar{\lambda}$ that achieves this minimization is proportional to the solution of the matrix equation $M^{S}\bar{\lambda} = \mathbf{1}$. This scheme is equivalent to minimizing the quantity $\chi_{V} = \bar{\lambda}^{T}M^{S}\bar{\lambda}/(\mathbf{1}^{T}\bar{\lambda})^{2}$, which is interpretable as the ratio of the energy of antisymmetry to signal energy $A_{X}^{2}$. It is, unlike χ_{III} and χ_{IV}, a noise-to-signal ratio. As such we speculate that minimizing it will be more useful than minimizing χ_{III} or χ_{IV}.

Scheme VI: We minimize acausal energy under the constraint of fixed $A_{X}^{2}$ and so seek a $\bar{\lambda}$ that solves $M^{C}\bar{\lambda} = \mathbf{1}$. The scheme is equivalent to minimizing $\chi_{VI} = \bar{\lambda}^{T}M^{C}\bar{\lambda}/(\mathbf{1}^{T}\bar{\lambda})^{2}$. As with χ_{V} it is a noise-to-signal ratio, being interpretable as the ratio of the acausality energy to signal energy $A_{X}^{2}$.

Scheme VII: Again, we minimize the antisymmetry noise $S$, and as in scheme V use a constraint of specified energy in the waveforms of X. But here we take the alternative definition for that energy, $E_{X} = \bar{\lambda}^{T}M^{E}\bar{\lambda}$. This leads to the generalized eigenvalue problem $M^{S}\bar{\lambda} = \Lambda M^{E}\bar{\lambda}$, for which we choose the eigenvector with the lowest eigenvalue Λ. This is equivalent to minimizing $\chi_{VII} = \bar{\lambda}^{T}M^{S}\bar{\lambda}/\bar{\lambda}^{T}M^{E}\bar{\lambda}$, which is interpretable as a ratio of antisymmetry noise to signal energy as measured by $E_{X}$.

Scheme VIII: We minimize acausality using as constraint the energy measure $E_{X}$. This leads to choosing the eigenvector with lowest eigenvalue Λ of the generalized eigenvalue problem $M^{C}\bar{\lambda} = \Lambda M^{E}\bar{\lambda}$. It is equivalent to minimizing $\chi_{VIII} = \bar{\lambda}^{T}M^{C}\bar{\lambda}/\bar{\lambda}^{T}M^{E}\bar{\lambda}$, interpretable as the ratio of acausality energy to signal energy as measured by $E_{X}$.

Each of the above schemes II–VIII chooses a set of weights λ_{d} by minimizing its measure of noise in X, and constructs the corresponding weighted correlation X(*τ*), Eq. (4). Schemes II and V–VIII minimize various measures of noise-to-signal ratio. Schemes III and IV minimize ratios of different contributions to noise.
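The linear-algebraic content of schemes III–VIII can be illustrated with a short numerical sketch. The matrices below are synthetic positive-definite stand-ins for $M^{S}$, $M^{C}$, and the quadratic form of $E_{X}$; this is not the paper's code, only a demonstration of the three minimization structures:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # number of "days"

def random_spd(n):
    # synthetic symmetric positive-definite stand-in for M^S, M^C, or M^E
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

MS, ME = random_spd(D), random_spd(D)
ones = np.ones(D)

# Schemes III/IV: minimize lam^T M lam subject to lam^T lam = 1
# -> eigenvector of M with the lowest eigenvalue (eigh returns ascending order)
evals, evecs = np.linalg.eigh(MS)
lam_iii = evecs[:, 0]

# Schemes V/VI: minimize lam^T M lam subject to a fixed sum of weights
# -> lam proportional to the solution of M lam = 1 (Lagrange multiplier absorbed)
lam_v = np.linalg.solve(MS, ones)
lam_v *= D / lam_v.sum()  # normalize weights to sum to D, as in the paper's tables

# Schemes VII/VIII: minimize lam^T MS lam / lam^T ME lam
# -> generalized eigenproblem MS lam = Lambda ME lam, reduced via Cholesky
L = np.linalg.cholesky(ME)
Linv = np.linalg.inv(L)
gevals, U = np.linalg.eigh(Linv @ MS @ Linv.T)
lam_vii = Linv.T @ U[:, 0]
```

Each branch returns the stationary point of its Rayleigh quotient or constrained quadratic; with the true matrices in place of the synthetic ones, the resulting vectors would be the schemes' weights.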

Taking each day to have a distribution of incident noise p_{d}(θ) (here called ponderosity for lack of a concise and unique alternative), the resulting X may be identified with the correlation one would have had in a single day with ponderosity P(θ) = Σ_{d}(*λ*_{d}/E_{d})p_{d}(θ). We will find that the optimization schemes III–VIII that seek symmetry or causality choose weights *λ* corresponding to smaller relative variances in P [defined as P relvar = 2*π*∫P^{2}(*θ*)d*θ*/(∫P(*θ*)d*θ*)^{2} − 1]. Thus, they generate better approximations to fully diffuse incident isotropic noise than do a conventional stack (*λ*_{d} = E_{d}) or flattened stack (*λ*_{d} = 1).
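On a uniform angular grid the relative variance just defined reduces to a one-line statistic. A sketch with a hypothetical smooth directional ponderosity P(θ) = 1 + a cos θ, for which the relative variance is analytically a²/2:

```python
import numpy as np

def relvar(P):
    # 2*pi*int(P^2 dtheta)/(int P dtheta)^2 - 1 on a uniform grid over [0, 2*pi)
    # reduces to mean(P^2)/mean(P)^2 - 1; it vanishes iff P is constant (isotropic)
    P = np.asarray(P, dtype=float)
    return np.mean(P**2) / np.mean(P)**2 - 1.0

theta = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
a = 0.5
P = 1.0 + a * np.cos(theta)  # hypothetical smooth directional ponderosity
print(relvar(P))             # a**2/2 = 0.125 for this P
```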

## IV. NUMERICAL EXPERIMENTS, DESIGN

Numerical simulations to test the proposed schemes are carried out on a discrete 271 × 271 mesh representing a square two-dimensional domain. The wave equation is solved using central differences

where the vector index $r\u2192$ indicates a site of the 271 × 271 mesh ($r\u2192$ may be represented by an ordered pair of integers between 0 and 272), *f* is a time- and space-dependent forcing, and σ represents a spatially varying anelasticity. *s _{r}* represents a stiffness at site $r\u2192$. The sum is over the four nearest neighbor sites $r\u2192\u2032$ to $r\u2192$. Dirichlet boundary conditions are imposed: points identified by their having either index being 0 or 272 are held at *ψ* = 0.

In the absence of forcing and anelasticity σ, and with the approximation that *s* may be replaced by its spatial average ⟨*s*⟩ [Ziman (1979) calls this a virtual crystal approximation^{1}], the difference equations (22) admit plane wave solutions with an anisotropic dispersion relation. For small *k*, i.e., long wavelengths and low frequencies, this is approximately *ω*^{2} = ⟨*s*⟩ + *k*_{x}^{2} + *k*_{y}^{2}, corresponding to an isotropic medium with (at small ⟨*s*⟩) unit wave speed. If ⟨*s*⟩ is small, dispersion is slight.
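The difference equations and lattice dispersion relation referenced above can be reconstructed as follows (assuming the standard five-point nearest-neighbor stencil with unit coupling; the placement of the damping and forcing terms is an assumption consistent with the text's description):

```latex
\ddot{\psi}_{\vec{r}} = \sum_{\vec{r}\,'} \left( \psi_{\vec{r}\,'} - \psi_{\vec{r}} \right)
  - s_{\vec{r}}\,\psi_{\vec{r}} - \sigma_{\vec{r}}\,\dot{\psi}_{\vec{r}} + f_{\vec{r}}(t) ,
\qquad
\psi_{\vec{r}}(t) \propto \exp\!\left( i\,\vec{k}\cdot\vec{r} - i\,\omega t \right) ,

% inserting the plane wave into the undamped, unforced, virtual-crystal equations:
\omega^{2} = \langle s \rangle + 4 - 2\cos k_{x} - 2\cos k_{y}
           = \langle s \rangle + 4\sin^{2}(k_{x}/2) + 4\sin^{2}(k_{y}/2)
           \;\approx\; \langle s \rangle + k_{x}^{2} + k_{y}^{2} \quad (k \ll 1).
```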

The domain is illustrated in Fig. 1 where shading at the periphery indicates a graded distribution of anelasticity. The interior is without dissipation. Dark squares indicate the positions of nine sensors at which the wave ψ is recorded. Lighter dots indicate the positions of concentrated stiffnesses *s* that act as scatterers. The broad-band delta-correlated Gaussian noise sources *f* are chosen to have mean square strengths proportional to the local dissipation σ_{r}, and so are confined to the periphery; ⟨$f_r^2$⟩ = σ_{r} p(*θ*). Were p(*θ*) to be a constant such that source strength is exactly proportional to damping, basic theorems (Weaver, 2008; Snieder, 2007) tell us that correlations will exactly retrieve the Green function, contaminated only by a residual noise that diminishes with integration time T. In order to explore the effects of imperfectly diffuse incident noise, we choose ⟨$f_r^2$⟩ to be proportional, not just to σ_{r} but also to specified functions *p* of direction *θ* from the center that differ from day to day. Furthermore, because scattering is not too strong, the specified functions of θ approximately represent incident noise field ponderosity at the center of the array. Dissipation is chosen to have a magnitude sufficient to assure that little energy is reflected from the boundaries.

In all the cases to be presented noise was recorded on each of nine receivers, cross correlated and filtered over a pass band centered on 0.1 Hz corresponding to a wavelength of about 10 lattice spacings. Scatterers were randomly distributed with a number density *ρ* = 0.015 and a strength of *s* = 3. All other sites had zero stiffness s. Thus, the spatial average stiffness ⟨*s*⟩ was 0.045, and indeed small compared to *ω*^{2 }= 0.394, thus permitting the approximation discussed following Eq. (24). Theory (see, for example, Ziman, 1979) estimates that these isotropic point scatterers generate a mean field attenuation of about 0.0053 neper per lattice spacing.^{1}

In order to compare with seismic scales, it is convenient to identify one time unit of the simulation (3.33 time steps *dt*) with one second. Assuming a seismic speed of ∼3 km/s, one identifies the mesh spacings as 3 km, such that the studied domain has dimensions of 820 × 820 km. Noise field simulations are typically run for 2^{24} time steps of *dt* = 0.3 s each, thus a scaled duration of two months.
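The unit identifications above amount to simple arithmetic; a quick check (the 3 km/s speed follows the text, and the 820 km figure there is a rounding of the result below):

```python
# Identify one simulation time unit (3.33 steps of dt = 0.3 s) with one second,
# and (at ~unit nondimensional wave speed) one mesh spacing with 3 km.
steps = 2**24
dt = 0.3                                   # seconds per step
duration_days = steps * dt / 86400.0       # ~58 days, i.e., about two months
spacing_km = 3.0
domain_km = 272 * spacing_km               # 272 spacings across the held boundary
print(round(duration_days, 1), domain_km)  # ~58.3 days and 816 km (~820 km)
```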

We confirm the accuracy of the code and our understanding by examining Figs. 2 and 3. These show the result of a run (case O) with a single “day” of two months duration with fully diffuse incident intensity due to an isotropic source distribution, p(*θ*) = 1. Figure 2 displays a staggered set of retrieved correlations c_{ij}(*τ*) (called a “waveform section” or “record section” in the seismology literature), each offset vertically by an amount proportional to the distance between sensors i and j. Figure 3 shows c_{26} and the deterministic response G_{26} at sensor 6, due to a unit concentrated initial displacement at sensor 2, as filtered by the same bandpass filter, anti-differentiated and scaled. This represents the effective Green function between sensors 2 and 6. The correspondence with c_{26} is excellent, in accord with theory that random sources with strength proportional to dissipation generate fully diffuse fields. An estimate [see Eq. (A7)] for the root-mean-square (rms) of the residual noise in *c* (due to having integrated for only two months) is also indicated. The figure suggests that the noise prediction is a slight overestimate, but detailed analysis (not discussed here) shows that it correctly predicts the observed rms difference between G and the correlation.

## V. NUMERICAL EXPERIMENTS, RESULTS

Having established that the code generates correlations c(τ) in agreement with deterministic responses G when the incident field is fully diffuse, and whose residual finite-T-noise is in accord with predictions from the expression in the Appendix, we now turn to a scrutiny of the proposed schemes I-VIII for treating non-stationary and non-isotropic noise. We present a number of examples, with different choices for daily ponderosity {p_{d}(θ)}, and examine the weights λ chosen by each of the schemes and the quality of the resulting stacks. The matrix equations and leading eigenvalue problems derived in Sec. III were solved using codes from Numerical Recipes (Press *et al.,* 1992).

In case A we let there be two days (each of equal length, 2^{23} time steps, i.e., about 4 weeks), and choose the corresponding p_{1}(θ) and p_{2}(θ) to be such that neither is isotropic, but such that there is a linear combination which is isotropic. Figure 4 displays those two ponderosities.

Table I summarizes the schemes' results for case A. The daily weights *λ* are given in the second column (and normalized to sum to the number of days, i.e., 2). The last column is the relative variance of the implied net ponderosity P(*θ*).

P's relative variance is a measure of its anisotropy. A small value for the relative variance indicates a constant P. It is apparent that all six optimization schemes III–VIII that consider symmetry and causality are equally able to identify the unique optimal set of weightings, and that that weighting does indeed correspond to isotropy. The third through fifth columns of the table give the measures χ that the various schemes seek to minimize. The third column gives χ evaluated for weights λ_{d} = E_{d} corresponding to the conventional stack of unnormalized correlations; column four gives χ for the weights λ_{d} = 1 corresponding to “flattening.” Column 5 gives the value obtained by minimization, i.e., using the weights given in column two. We see that the figures of merit χ of all the schemes are improved—most of them by factors of order 100—by the minimization procedures. Flattening (scheme II) has improved χ_{II}, a measure proportional to the ratio of finite-T-noise to signal $AX2$, by a small amount.

| Scheme $S$ | Weights $\bar{\lambda}^{S}$ | $\chi^{S}(\bar{\lambda}_{I})$ | $\chi^{S}(\bar{\lambda}_{II})$ | $\chi^{S}(\bar{\lambda}^{S})$ | P relvar |
|---|---|---|---|---|---|
| I | 1.603, 0.397 | | | | 1.17 |
| II | 1.000, 1.000 | 0.68 | 0.50 | 0.50 | 0.17 |
| III | 0.578, 1.422 | 0.257 | 0.060 | 0.0013 | 0.0000086 |
| IV | 0.575, 1.425 | 0.029 | 0.0055 | 0.00024 | 0.0000016 |
| V | 0.580, 1.420 | 0.175 | 0.030 | 0.00076 | 0.000021 |
| VI | 0.579, 1.421 | 0.0156 | 0.00276 | 0.00014 | 0.000017 |
| VII | 0.580, 1.420 | 0.597 | 0.150 | 0.00426 | 0.000028 |
| VIII | 0.579, 1.421 | 0.053 | 0.0137 | 0.00079 | 0.000015 |


Figures 5–7 display the correlations X_{26} between sensors 2 and 6 as obtained by three of the schemes in case A. The recovered X(*τ*) from scheme VI [all optimizing schemes III–VIII have about the same weights and thus have X(τ) that look about the same] is compared with that from schemes I (conventional stack) and II (flattened). The horizontal lines indicate the estimated root-mean-square fluctuations ([var X_{ij}]^{1/2}; see Eq. (A12)) due to finite averaging time T. Attention is drawn to how, in schemes I and II, whose effective ponderosities P are not isotropic, the random deviations are greater than predicted by the finite-T calculation and therefore would not improve with greater T, while in the optimizing schemes, whose P are isotropic, the random deviations are well predicted by the finite-T expression. Further improvement would be possible with more integration time.

In case B we let there be four days (each of equal length, 2^{22} time steps, i.e., about 2 weeks), and choose the p_{d}(θ) such that none are isotropic, and for which there is no linear combination that *is* isotropic. The ponderosities each day for case B are displayed in Fig. 8.

Table II summarizes the schemes' results for case B. The weights chosen by the optimizations are all very similar; the relative variance of their implied ponderosities P are all roughly equal, and none of them much higher than the best possible (0.1074 based on weights 1.29, 0.996, 1.83, −0.12). The optimizing schemes III–VIII all see improvements in their measures χ by a factor of 6 to 10. This is less than the factor ∼100 seen in case A, but still an improvement. It should not be a surprise that perfect improvement was not achieved, as there is *no* linear combination P of daily ponderosities that is isotropic.

| Scheme $S$ | Weights $\bar{\lambda}^{S}$ | $\chi^{S}(\bar{\lambda}_{I})$ | $\chi^{S}(\bar{\lambda}_{II})$ | $\chi^{S}(\bar{\lambda}^{S})$ | P relvar |
|---|---|---|---|---|---|
| I | 2.56, 0.54, 0.41, 0.48 | | | | 0.85 |
| II | 1.0, 1.0, 1.0, 1.0 | 0.45 | 0.25 | 0.25 | 0.26 |
| III | 1.30, 0.72, 2.42, −0.44 | 0.27 | 0.14 | 0.026 | 0.133 |
| IV | 1.45, 0.93, 2.08, −0.46 | 0.025 | 0.014 | 0.0035 | 0.119 |
| V | 1.27, 1.04, 1.86, −0.17 | 0.122 | 0.035 | 0.0115 | 0.109 |
| VI | 1.28, 1.18, 1.62, −0.075 | 0.014 | 0.0035 | 0.0014 | 0.111 |
| VII | 1.27, 1.04, 1.86, −0.18 | 0.46 | 0.17 | 0.061 | 0.108 |
| VIII | 1.28, 1.19, 1.59, −0.06 | 0.043 | 0.017 | 0.0074 | 0.112 |


Figure 9 displays the several effective ponderosities P(θ) [Eq. (25)] as implied by the various schemes' choices for the *λ*. We see that (except for the black and red lines corresponding to the conventional and flattened stacks) they are all roughly the same. The six optimization schemes III–VIII all have about the same ponderosity P, with all of them being more isotropic than those of I and II, but still imperfect.

Figure 10 displays the staggered correlation waveforms from schemes I and VII, VII's being substantially cleaner looking than I's.

Figures 11–13 show the correlation waveforms between sensors 2 and 6 in more detail, where again we can see the improvement when an optimized weighting is used. The improvement is not as complete as it was in case A, because no weighting could have been perfect. Even in scheme VIII (Fig. 13), we see that fluctuations at precausal times exceed those associated with finite T; the correlation remains contaminated by spurious features.

Figure 14 shows a comparison of G_{26} and X_{26} as obtained from scheme VII, in which it is apparent that the correspondence is better at times after the main arrival than before. Indeed, the differences at the latest times [see Fig. 14(b)] seem to be on the order of the expected residual fluctuations (with rms 0.0005) from finite averaging time T, while the differences at precausal times are larger than that, in accord with expectations of spurious arrivals at short times (Snieder *et al.,* 2008; Yoritomo and Weaver, 2016) when the ponderosity is not uniform.

Case C consists of four days with ponderosities p(*θ*) as illustrated in Fig. 15. They are such that there is an isotropic linear combination; however, that linear combination requires significant negative weights. Days 1 and 2 are each dominated by intensity from the east; we will find that day 1's contribution to X is best mostly subtracted from day 2's in order to deemphasize the dominant direction.

Table III summarizes the scheme weights (normalized to sum to 4) and *χ*'s for case C. The results of the optimizing schemes fall into two classes. Schemes III and IV both achieve very small relative variance in their P, and both achieve significant improvements in their respective figures of merit *χ*. Schemes V–VIII choose weights that generate larger (though still smaller than I and II) relative variances; they too are attempting isotropic P. It may be particularly noted that III and IV call for very large weights with positive and negative signs, while the others do not. Interestingly, only scheme IV—and to a lesser extent III—finds weights close to those that would produce perfect isotropy (−13.85, 17.85, 0, 0). From the table one might suppose that schemes III and IV were the best, as they produce the best improvements in their *χ* and the best relative variances in P. However, examination of Figs. 16 and 18–21 shows that this is not the whole story.

| Scheme $S$ | Weights $\bar{\lambda}^{S}$ | $\chi^{S}(\bar{\lambda}_{I})$ | $\chi^{S}(\bar{\lambda}_{II})$ | $\chi^{S}(\bar{\lambda}^{S})$ | P relvar |
|---|---|---|---|---|---|
| I | 2.31, 0.89, 0.38, 0.43 | | | | 1.15 |
| II | 1.0, 1.0, 1.0, 1.0 | 0.40 | 0.25 | 0.25 | 0.27 |
| III | −10.1, 13.4, 0.54, 0.12 | 0.4274 | 0.1739 | 0.0024 | 0.0051 |
| IV | −13.4, 17.6, 0.298, −0.43 | 0.0378 | 0.0158 | 0.0004 | 0.0062 |
| V | −0.97, 2.55, 1.93, 0.49 | 0.1718 | 0.0435 | 0.0069 | 0.062 |
| VI | −0.48, 1.93, 1.87, 0.68 | 0.0152 | 0.0040 | 0.0011 | 0.068 |
| VII | −0.98, 2.6, 1.93, 0.47 | 0.5928 | 0.2090 | 0.0375 | 0.062 |
| VIII | −0.47, 1.93, 1.87, 0.67 | 0.0524 | 0.0190 | 0.0060 | 0.068 |

Figure 16(c) shows that scheme III has failed to capture what we truly want (scheme IV's stack is similar); its noise level is high. We ascribe this to the denominators of the criteria χ_{III} and χ_{IV} scaling with the sum of the squares of the λ, in contrast to the denominators of the other schemes, which scale more or less with the square of the sum. Inasmuch as the estimated variance of the finite-T noise [see Eq. (A15)] scales with the sum of the squares of the weights λ, while the signal X itself scales with the sum of the λ, we see that schemes III and IV may, by increasing their denominators, inappropriately seek to increase that noise at the expense of signal. That is what has happened here. Generically we can expect this to be an issue whenever the weights λ include significant negative values; it is then that the sum of the squares differs most from the square of the sum. It is, however, striking that schemes III and IV, while producing X with excessive finite-T noise, nevertheless do the best job of finding weights that minimize P's relative variance.
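This scaling argument can be made concrete with a short numerical check. The sketch below (the helper `noise_amplification` is ours, not the paper's) compares, for weights copied from Table III, the rms finite-T noise of a weighted stack relative to a uniform stack carrying the same total weight: the noise rms scales as the root of the sum of squared weights, while the signal scales as the plain sum.

```python
import math

def noise_amplification(weights):
    """Rms finite-T noise of a weighted stack, relative to a uniform
    stack with the same total weight (and hence the same signal level).
    Noise rms scales as sqrt(sum of lambda^2); a uniform stack with the
    same sum of weights has rms (sum of lambda)/sqrt(D)."""
    D = len(weights)
    total = sum(weights)
    return math.sqrt(D * sum(w * w for w in weights)) / total

# Weights from Table III (case C, each set summing to approximately 4):
scheme_II = [1.0, 1.0, 1.0, 1.0]          # conventional stack
scheme_VIII = [-0.47, 1.93, 1.87, 0.67]   # modest negative weight
scheme_IV = [-13.4, 17.6, 0.298, -0.43]   # large weights of both signs

for name, w in [("II", scheme_II), ("VIII", scheme_VIII), ("IV", scheme_IV)]:
    print(f"scheme {name}: noise amplification {noise_amplification(w):.2f}")
```

Under this measure scheme IV pays roughly an order of magnitude in extra finite-T noise, while scheme VIII pays only tens of percent, consistent with the stacks shown in Figs. 16 and 18–21.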

Also noteworthy is that the weights chosen by the two schemes V and VII that emphasize symmetry are very nearly the same, while the weights chosen by VI and VIII which emphasize causality are very nearly the same. All four sets are fairly close, and result in very similar net ponderosities P (see Fig. 17).

This picture is supported by the more detailed Figs. 18–21, which show X_{26} as obtained using schemes I, II, VIII, and III, respectively. We find that scheme VIII does a good job of eliminating acausal noise (as do schemes V–VII); predicted finite-T fluctuations are comparable to the observed levels. This is ascribable to their having converged on a set of weights that makes P nearly constant (see Fig. 17). Interestingly, schemes III and IV also choose weights that give a low relative variance to P (even lower than V–VIII; see Table III), but they yield plots [see Fig. 16(c) or Fig. 21] with large noise, a level ascribable to finite-T fluctuations. They have chosen good P, but at the cost of large finite-T residual noise.

This confirms the idea that for this case, and in spite of their minimal χ, schemes III and IV do a poor job of improving total signal to noise. The signature of this failure is the large negative weights, as Eq. (A15) predicts a poor ratio of signal to finite-T noise in that case. Because the criteria of schemes V–VIII are proper signal-to-noise ratios, capturing what the eye demands more closely than the noise-to-noise ratios of III and IV, they strike a more proper balance between minimizing finite-T noise and minimizing acausal or antisymmetric noise. Nevertheless, because schemes V–VIII require some negative weights, they increase the contribution of finite-T noise over that of conventional stacks, even as they decrease total noise.

## VI. CONCLUSIONS AND DISCUSSION

It is found that the proposed optimization schemes do indeed improve the quality of the resulting stacks. It is particularly noteworthy that they choose daily weights that generate more isotropic P. They do so despite not having been provided information on ponderosities, only on the raw correlations. Furthermore, whether we optimize on causality or symmetry, we find similar weights *λ* and similar stacks X_{ij}(*τ*) and similar P.

We have found that schemes III and IV are less reliable than the other optimizing schemes (V–VIII). The denominator of their criteria *χ*, $\bar{\lambda}^{T}\bar{\lambda}$, is proportional to finite-T noise [Eq. (A15)], so minimizing these χ can favor large finite-T noise. In case C above it was seen how this proved very unhelpful. Nevertheless, it is remarkable that even when they produce an X with excessive finite-T noise, they choose weights λ that correspond to the most isotropic P. The other schemes (V–VIII) have been found to do a good job in all three cases examined. They call for weights that correspond to effective total ponderosities P that are close to isotropic, while not excessively increasing finite-T noise. Whether optimizing on symmetry or causality, they choose similar weights and achieve similar empirical signal to noise ratios. One notes in particular that schemes VI and VIII minimize the ratio of empirical noise, as ascertained by examining fluctuations in X at pre-causal times ($C$), to signal, as represented by the proxy of either the squared array amplitude $A_X^2$ or the total array energy $E_X$. Similarly, schemes V and VII can be understood as minimizing the ratio of empirical noise (as measured using X's antisymmetry) to signal. As such, any of the schemes V–VIII is arguably an appropriate way to minimize the empirical noise to signal ratio.
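As a sketch of the kind of criterion schemes VI and VIII minimize (the function name and synthetic data below are hypothetical; the paper's actual definitions are Eq. (9) and the energy proxies of Sec. III), one can take the ratio of the stack's energy at pre-causal lapse times to its total energy:

```python
import math, random

def acausality_ratio(stack, taus, t_pre):
    """Ratio of stack energy at precausal lapse times |tau| < t_pre
    (a window chosen to end before the earliest physical arrival) to
    total stack energy; a stand-in for the noise-to-signal criteria
    of schemes VI and VIII."""
    pre = sum(x * x for x, t in zip(stack, taus) if abs(t) < t_pre)
    tot = sum(x * x for x in stack)
    return pre / tot

# Synthetic symmetric stack: arrivals at lapse times +/- 3 on a grid [-5, 5]
random.seed(0)
taus = [i * 0.01 - 5.0 for i in range(1001)]
clean = [math.exp(-8.0 * (abs(t) - 3.0) ** 2) for t in taus]
noisy = [x + 0.05 * random.gauss(0, 1) for x in clean]

print(acausality_ratio(clean, taus, 2.0))   # essentially zero
print(acausality_ratio(noisy, taus, 2.0))   # raised by precausal noise
```

A scheme minimizing such a ratio over the weights λ suppresses exactly the precausal fluctuations that the eye objects to, which is why we regard V–VIII as proper noise-to-signal criteria.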

Depending on interest, one could consider modifications. One could weight different ranges of lapse time more or less in $C$ and $S$, or weight some of the sensor pairs (i, j) more heavily than others. If significant coherent body wave amplitude is expected to arrive before the surface waves, the definition [Eq. (9)] of acausality noise $C$ may need to be rethought. One might also consider symmetrizing the waveforms c(τ) before constructing X, and then optimizing on causality. Finally, if one is willing to forgo the ability to make meaningful comparisons of amplitudes on different sensors, and in particular to forgo retrieval of attenuation, one could apply the suggestion of Bensen *et al.* (2007) of normalizations local in time and space, and/or allow the weights λ themselves to vary with sensor or sensor pair as well as with time.

There is a choice of the number D of blocks of time into which a months-long record could be subdivided. Choosing a large number D of short blocks, or even making λ continuous, might seem advantageous, as it would allow the schemes to accommodate ponderosities that change rapidly in time. If D is large enough, however, the correspondingly large number of degrees of freedom in $\bar{\lambda}$ could generate almost any X(*τ*) consistent with the range of frequencies passed by the filters; in consequence there would be no reason to respect a scheme's stack, regardless of how symmetric it was in *τ* or how clean it looked at pre-causal times. One therefore asks: what limits on D will prevent such meaningless minimizations? The issue has not been explored in our numerical experiments, but there is room for some speculation. The number of degrees of freedom in the precausal times of a stack used to evaluate $C$ may be estimated as the total precausal time (the average sensor separation times the number of ordered pairs, divided by the wave speed) divided by the duration Δ of the basic wavelet, equal to the inverse of the filter bandwidth [see Eq. (A7)]. This ratio is about 200 in the above numerical experiments. The number of degrees of freedom in the antisymmetric part of X used to evaluate $S$ is somewhat greater. It is speculated that any choice of D significantly less than these numbers will serve, leaving the schemes too little freedom with which to generate meaningless minimizations of acausality $C$ or antisymmetry $S$. The choice of D is unlikely to be limited by the computational burden of solving matrix equations such as Eqs. (14) and (16), or of the search for the leading eigenvalue or generalized eigenvalue of the D × D matrices *M^{S}* or *M^{C}*; such operations are straightforward, and their burden scales only with the square of the number of days D.
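The counting argument can be written out as a one-line computation. The numbers below are hypothetical stand-ins chosen only to reproduce the quoted ratio of about 200; the actual simulation parameters are not restated here:

```python
# Degrees of freedom in the precausal part of a stack: total precausal
# time = (average sensor separation / wave speed) * number of ordered
# sensor pairs; dividing by the wavelet duration Delta (the inverse
# filter bandwidth) counts independent samples.
def precausal_dof(avg_separation, speed, n_ordered_pairs, bandwidth):
    t_pre_total = (avg_separation / speed) * n_ordered_pairs
    delta = 1.0 / bandwidth  # wavelet duration, cf. Eq. (A7)
    return t_pre_total / delta

# Hypothetical nondimensional values giving the quoted ratio of ~200:
n_dof = precausal_dof(avg_separation=10.0, speed=1.0,
                      n_ordered_pairs=20, bandwidth=1.0)
print(n_dof)  # the number of blocks D should be well below this
```

The speculation in the text is then that any D much smaller than `n_dof` leaves the optimizer too few degrees of freedom to fabricate a spuriously clean stack.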

We make further comments in regard to application to long-period seismology. The cases examined here were all for a multiply scattering medium. Such media are especially sensitive to spurious arrivals due to non-isotropic illumination. The performance of the proposed schemes in non-scattering media remains to be explored. The observation is well established that scatterers generate non-causal and nonsymmetric spurious features if the incident noise field is not isotropic. However, even a fully diffuse incident field can generate spurious features if the scatterers are absorbing. Inasmuch as a scatterer of seismic surface waves will likely scatter not only into other surface waves but also into body waves that mimic losses, one might expect that even an effective isotropic ponderosity P can generate spurious features. This will degrade the quality of the stacks X resulting from the above schemes. Whether this will prove to be a serious issue in seismic practice remains to be seen. It is further noted that long-period ambient seismic noise is usually highly anisotropic, as in case C with different days' intensities having similar dominant directions. While p(θ) can be expected to vary over the seasons, it may well be that generating a near isotropic P requires that some blocks of time contribute with large negative weights *λ*. This will augment the finite-T noise and limit the quality of the resulting stacks X, unless exceptionally long T is available. Exploration of these issues will require further numerical experiments and application to actual Earth data.

## ACKNOWLEDGMENTS

We thank Xiaodong Song for helpful discussions. This work was supported in part by AFRL Contract No. FA9453-15-C-0067.

### APPENDIX

We consider the finite-T contribution to error in the X(t) as defined in Eq. (4). This is not an analysis of the difference between X and G, but rather between an X as constructed using finite time T and the X one would have if T could be taken to infinity. We start with the definition of one day's unnormalized empirical correlation [Eq. (1)]

This has an expectation

where Ψ is the expectation ⟨*ψψ*⟩. We quickly derive, using Eq. (3), that E^{d} has expectation T_{d}Σ_{i}Ψ_{ii}^{d}(0).
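The equations missing here can be plausibly reconstructed from the surrounding definitions; the following is a hedged sketch in our own notation (not necessarily the paper's exact form or numbering):

```latex
% One day's unnormalized empirical correlation [cf. Eq. (1)]:
c^{d}_{ij}(\tau) \;=\; \int_{0}^{T_d} \psi_i(t)\,\psi_j(t+\tau)\,dt ,
% whose expectation, for noise stationary over the day, is
\left\langle c^{d}_{ij}(\tau) \right\rangle \;=\; T_d\,\Psi^{d}_{ij}(\tau),
\qquad
\Psi^{d}_{ij}(\tau) \;\equiv\; \langle \psi_i(t)\,\psi_j(t+\tau)\rangle .
```

Setting i = j and τ = 0 and summing over sensors is consistent with the quoted expectation T_{d}Σ_{i}Ψ_{ii}^{d}(0) for the day's energy E^{d}.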

The day's correlation c^{d} has a variance

In the usual procedure for centered stationary Gaussian processes we may evaluate the fourth order moment using cumulants, with first, third, and fourth order cumulants vanishing:

The first and fourth terms cancel. This leaves

Unless i = j, the second term involving cross correlations is in practice much less than the first involving autocorrelations. We therefore neglect it and obtain

The integral converges in a time of order one inverse bandwidth. The autocorrelations $\Psi_{ii}^{d}$(*τ*) may be written as $\epsilon_{i}^{d}$g(*τ*), where g is the inverse Fourier transform of the filter, normalized to unity at *τ* = 0, and $\epsilon_{i}^{d}$ is the i-th sensor's contribution to E_{d}: E_{d} = T_{d}Σ_{i}$\epsilon_{i}^{d}$. g is a tone burst with a central frequency specified by the filter and a duration proportional to the inverse of the filter's bandwidth. We define the filter duration Δ = ∫g^{2}(*τ*)d*τ*. Then
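As an illustration of the definition Δ = ∫g²(τ)dτ, the sketch below evaluates Δ numerically for a hypothetical Gaussian-windowed tone burst (not the paper's actual filter) and compares it with the closed form (σ√π/2)(1 + e^{−4π²f₀²σ²}) ≈ σ√π/2, confirming that Δ is of order the envelope duration, i.e., the inverse bandwidth:

```python
import math

def wavelet(tau, sigma=2.0, f0=1.0):
    """Gaussian-windowed tone burst normalized to g(0) = 1."""
    return math.exp(-tau * tau / (2.0 * sigma * sigma)) * math.cos(2.0 * math.pi * f0 * tau)

def filter_duration(sigma=2.0, f0=1.0, half_width=20.0, dt=1e-3):
    """Delta = integral of g^2 over tau, by the trapezoidal rule."""
    n = int(2 * half_width / dt)
    total = 0.0
    for k in range(n + 1):
        tau = -half_width + k * dt
        w = 0.5 if k in (0, n) else 1.0
        total += w * wavelet(tau, sigma, f0) ** 2
    return total * dt

sigma = 2.0
delta = filter_duration(sigma=sigma)
closed_form = sigma * math.sqrt(math.pi) / 2.0  # oscillatory correction negligible
print(delta, closed_form)
```

For a many-cycle burst the cos² term averages to 1/2, so Δ is simply half the energy-weighted envelope duration, as the closed form shows.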

This provides an estimate of the mean square deviation between an empirical c_{ij} [Eqs. (A1) and (A2) above] and its T → ∞ limit. The expression is applicable regardless of whether or not Ψ_{ii} is dominated by local noise. It does, however, depend on the assumption that the day's noise intensity was stationary. Clearly the ratio of Signal [Eq. (A2)] to (this contribution to) Noise [Eq. (A7)] scales with $\sqrt{T_{d}}$. Summing over all i and j (and neglecting that the expression at i = j is somewhat erroneous) we find

The square root of this is a measure of rms global Noise associated with finite T.

We can similarly define a global measure of correlation amplitude as

Thus a single day's ratio of global Signal to global Noise is [T_{d}/Δ]^{1/2}. This derives the familiar $\sqrt{T}$ scaling of the amplitude signal to noise ratio.
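The √T scaling can be checked with a tiny Monte Carlo (independent Gaussian noise in place of the simulated wavefields; all parameters hypothetical): the zero-lag correlation of two independent noise records has zero mean and an rms that grows like √T, while a genuinely correlated signal grows like T, which is the origin of the [T_{d}/Δ]^{1/2} amplitude signal-to-noise ratio.

```python
import math, random

def corr_rms(T, trials=1000, seed=0):
    """Rms of the zero-lag correlation of two independent unit-variance
    Gaussian records of length T; expected to scale as sqrt(T)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        c = sum(rng.gauss(0, 1) * rng.gauss(0, 1) for _ in range(T))
        acc += c * c
    return math.sqrt(acc / trials)

r1, r4 = corr_rms(100), corr_rms(400)
print(r4 / r1)  # close to sqrt(4) = 2
```

Quadrupling the record length doubles the noise rms but quadruples a coherent signal, for a net factor-of-two gain in amplitude signal to noise.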

We now turn our attention to the signal to noise ratio in the weighted stack X, as defined in terms of the weights *λ* and the unnormalized noise signals *ψ*,

It has expectation

It has variance (where we use the lack of correlations between different days' finite-T fluctuations)

As in Eq. (5) we define a global measure of the signal level in X as its expected amplitude

We define a global measure of the square of the noise by

The global ratio of signal energy $A_{X}^{2}$ to the variance of finite-T noise is then

It is straightforward to show that this is maximal when the *λ*_{d} are proportional to the T_{d}. In the special case considered in the numerical simulations above, in which each day has the same length, we conclude that this signal to noise ratio is maximal when all the *λ* are equal; hence the assertion in Sec. III that temporal flattening minimizes the finite-T-noise to signal ratio. Of course, other contributions to the differences between X and G, such as spurious features, may be minimized by different choices of the *λ*.
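The optimality claim follows from the Cauchy–Schwarz inequality. A hedged sketch, under one plausible reading of the scalings quoted above, namely that the signal amplitude goes as Σ_{d}λ_{d}T_{d} while the finite-T noise variance goes as Σ_{d}λ_{d}² [cf. Eq. (A15)]:

```latex
% Signal-energy-to-noise-variance ratio under the assumed scalings:
\frac{A_X^{2}}{\operatorname{Var}} \;\propto\;
\frac{\left(\sum_d \lambda_d\, T_d\right)^{2}}{\sum_d \lambda_d^{2}}
\;\le\; \sum_d T_d^{2},
% with equality, by Cauchy--Schwarz, if and only if
\lambda_d \;\propto\; T_d .
```

For days of equal length this reduces to equal weights, i.e., the temporal flattening of Sec. III.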

^{1}

See, in particular, Sec. 9.3 on the Virtual Crystal and Average T matrix Approximations. For small s and long wavelength the ATA predicts amplitude attenuation of a mean plane wave to be *α* = *ρ*s^{2}/8|*ω*|.