Successful synthetic aperture sonar target classification depends on the “shape” of the scatterers within a target signature. This article presents a workflow that computes a target-to-target distance from *persistence diagrams*, since the “shape” of a signature informs its persistence diagram in a structure-preserving way. The target-to-target distances derived from persistence diagrams compare favorably against those derived from spectral features and have the advantage of being substantially more compact. While spectral features produce clusters associated to each target type that are reasonably dense and well formed, the clusters are not well-separated from one another. In rather dramatic contrast, a distance derived from persistence diagrams results in highly separated clusters at the expense of some misclassification of outliers.

## I. INTRODUCTION

This article discusses target classification from synthetic aperture sonar collections in various clutter contexts by analyzing target echo structure through the lens of topological signal processing. Previous work, as early as 1996,^{1} suggests that successful classification depends on the “shape” of the scatterers within a target signature. Although “shape” is something of an amorphous concept, it is generally agreed^{2} that multiple feature sets are necessary to provide adequate classification performance. Apart from the statistical necessity of class-specific feature sets,^{3} we hypothesize that the “shape” of a target is a combination of the topology and geometry of the *space of individual pulse echos*. We might therefore study pulse *topological subspaces* directly, rather than forming images or projecting onto feature vector spaces. We present a method that uses *persistence diagrams* (Sec. III B 2) derived from the space of echos and two distance metrics between persistence diagrams (Sec. III B 3) to provide a target-to-target distance that is informed by the shape of their signatures.

The topological target-to-target distance is large for very different targets but small for targets that are closely related (Sec. IV B). The topological target-to-target distance compares favorably against the Euclidean and correlation pseudodistance on spectral features and has the advantage of being substantially more compact. While spectral features produce clusters associated to each target type that are reasonably dense and well formed, the clusters are not well-separated from one another. The topological target-to-target distance results in highly separated clusters at the expense of some misclassification of outliers. These intuitions are confirmed by hierarchical clustering analysis in Sec. IV C. We found that while the impact of topology and geometry is decisive in clustering target types correctly, the necessary topology is rather simple in nature. To a good approximation, the *dendrogram of the sonar pulses themselves in a single collection* of a target appears to be characteristic of that target.

The literature on sonar target classification techniques is large, and state-of-the-art feature extraction algorithms are quite specialized (see Sec. II A for a relevant summary). We chose to perform our comparison with a simple, easy-to-characterize methodology based on Fourier methods rather than a newer method that is more dependent on careful training. We discovered several qualitative and methodological differences between topological and traditional methods that are worthy of note.

- Topological methods appear to be less susceptible to overfitting. They provide better hierarchical clustering results (Table III in Sec. IV C), but this comes at the expense of somewhat lower overall classification accuracy (Table I in Sec. IV B).
- Topological methods are substantially more complicated to implement (Sec. III B), though good software packages are available that encapsulate much of this complexity.
- Topological features are more awkward to manipulate once computed, since instead of being vectors in a feature space, they *are vector spaces* themselves (Sec. III B 3).
- Statistical inference with topological methods is currently immature, so we are limited to simpler cluster separation analysis.

| Group | Spectral L^2 | Spectral corr. | Tucker and Srivastava [Ref. 12, Fig. 5(b)] | Alg. 1, H_0, L^2 | Alg. 1, H_1, L^2 | Alg. 1, H_0, corr. | Alg. 1, H_1, corr. | Usual, H_0, L^2 | Usual, H_1, L^2 | Usual, H_0, corr. | Usual, H_1, corr. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TARGETA1 | 0.086 | 0.163 | 0.489 | 0.373 | 0.082 | 0.029 | 0.024 | 0.134 | 0.165 | 0.015 | 0.073 |
| TARGETA2 | 0.091 | 0.114 | 0.678 | 0.158 | 0.058 | 0.005 | 0.010 | 0.055 | 0.054 | 0.002 | 0.073 |
| TARGETA3 | 0.086 | 0.224 | 0.530 | 0.239 | 0.087 | 0.147 | 0.106 | 0.079 | 0.073 | 0.003 | 0.301 |
| TARGETB1 | 0.101 | 0.224 | 0.595 | 0.241 | 0.034 | 0.443 | 0.326 | 0.364 | 0.082 | 0.013 | 0.299 |
| TARGETB2 | 0.091 | 0.114 | 0.545 | 0.374 | 0.040 | 0.086 | 0.076 | 0.128 | 0.054 | 0.004 | 0.059 |
| TARGETB3 | 0.213 | 0.253 | 0.549 | 2.914 | 0.026 | 0.086 | 0.135 | 0.079 | 0.040 | 0.027 | 0.276 |
| TARGETB4 | 0.206 | 0.253 | 0.557 | 1.661 | 0.026 | 0.129 | 0.076 | 0.089 | 0.050 | 0.005 | 0.059 |
| TARGETC1 | 0.157 | 0.336 | 0.686 | 1.239 | 0.181 | 0.129 | 0.206 | 0.203 | 0.152 | 0.032 | 0.515 |
| TARGETC2 | 0.266 | 0.300 | 0.527 | 1.374 | 0.180 | 0.272 | 0.201 | 0.134 | 0.076 | 0.019 | 0.638 |
| TARGETD1 | 0.086 | 0.258 | 0.598 | 0.157 | 0.040 | 0.005 | 0.013 | 0.055 | 0.040 | 0.003 | 0.096 |
| TARGETD2 | 0.136 | 0.158 | 0.496 | 0.158 | 0.059 | 0.030 | 0.037 | 0.057 | 0.064 | 0.002 | 0.290 |
| Group A | 0.086 | 0.110 | 0.413 | 0.007 | 0.008 | 0.001 | 0.002 | 0.033 | 0.034 | 0.002 | 0.044 |
| Group B | 0.091 | 0.110 | 0.462 | 0.011 | 0.006 | 0.024 | 0.025 | 0.048 | 0.026 | 0.004 | 0.056 |
| Group C | 0.157 | 0.287 | 0.446 | 0.058 | 0.030 | 0.024 | 0.039 | 0.080 | 0.049 | 0.019 | 0.237 |
| Group D | 0.086 | 0.151 | 0.420 | 0.007 | 0.006 | 0.001 | 0.002 | 0.033 | 0.026 | 0.002 | 0.044 |
| Overall types | 0.086 | 0.114 | 0.489 | 0.157 | 0.026 | 0.005 | 0.013 | 0.055 | 0.040 | 0.002 | 0.059 |
| Overall groups | 0.086 | 0.110 | 0.413 | 0.007 | 0.006 | 0.001 | 0.002 | 0.033 | 0.026 | 0.002 | 0.044 |


## II. HISTORICAL CONTEXT

This article draws upon three separate threads of research in the literature: (1) sonar classification, (2) topological data analysis, and (3) cluster separation statistics. This article is apparently the first systematic study that explores how topological features compare with more traditional sonar features.

### A. Sonar classification

Sonar target classification has been discussed extensively in the literature for many years. The usual approach is to distill the sonar echos into a vector of *features* that lie in a vector space, and then to classify based on these features. Before briefly recounting the kinds of features that one might produce, we note that “the general consensus seems to be that there is no best feature for a problem.”^{4} The feature vectors may at least need to be aware of environmental effects, such as the interaction of a target with the bottom.^{2} For instance, shadows can enhance classification performance.^{5}

When a given sonar echo comes from one of two possible classes, one might hope that a single sufficient statistic would discriminate between them. But under realistic conditions, it can happen that “no single minimal sufficient statistic exists for testing between two hypotheses *H*_{1} and *H*_{2},” these being the two sonar target classes, “whereas PDF projection (then known as class-specific features) results in an optimal test.”^{3,6} One theoretically and practically sound perspective is therefore that one should employ several class-specific feature sets.^{3,6–9} Mathematically, class-specific features are an expression of sparsity within an overcomplete dictionary.^{10,11}

To baseline our performance against feature-based methods, there are many possible sets of features. Without claiming to be exhaustive, one might consider the following.

#### 1. Spectral methods

Good separation of target spectrograms can be afforded by slow-time warping.^{12} One may also use statistical signal processing of spectral data to identify resonances.^{13–16} Multiple waveforms^{17} or nonlinear processing^{18} can produce useful spectral information about a target. Various bio-inspired waveform-diverse methods have also been tried.^{19,20}

#### 2. Wavelets

#### 3. Neural networks

Surprisingly good results can be obtained using neural networks trained to classify targets from their sonar echos.^{28} It seems that “the [trained] network's generalization strategy is dependent upon the relative frequency and the stability of features in the input pattern.” Subsequent works^{29–31} explored various incarnations of neural networks. The interested reader should consult the detailed summaries.^{32,33}

#### 4. Image-derived features

Various authors have attempted to build classification methods around the spatial structure of a sonar target. For instance, one can build a model of a target based on learning the locations and reflectivities of prominent point scatterers. These parameters can be effective as features for a classifier.^{1} Various others^{34–40} have taken this idea further to classify based on a sub-image formed using backprojection, MUSIC, or some other method. It seems that all of these methods rely on the fact that the space of parameters one needs to learn is both generic and of sufficiently low dimensionality.^{41} Recently, image-derived features yielded good within-group classification amongst a set of different classes of targets.^{42,43}

### B. Topological data analysis

The use of topology to distill salient features from datasets grew out of dimension reduction and manifold learning methods.^{44–47} Although *homology*^{48} is easy to compute, it is not particularly robust. A key breakthrough was the discovery of a robust version,^{49} *persistent homology*, that could be computed efficiently.^{50} After this discovery, there was a burst of activity, summarized in several influential surveys.^{51–55} Recently, persistent homology has begun to penetrate into signal processing practice as well.^{41,56–63}

Although it appears that this article presents the first application of persistent homology to synthetic aperture sonar data, our workflow mirrors the one developed in Ref. 64 closely. We take the analysis further, though, and we perform a cross-validation study to measure classification performance. Since a synthetic aperture sonar collection can be realized as a geometric graph, we exploit the recent findings that a space of graphs can be endowed with various metrics derived from persistent homology.^{65–67}

### C. Cluster separation performance

In this article, we perform an analysis of cluster separation in Sec. IV B. Various measures of cluster separation have been studied extensively in the literature,^{68–73} and several informative surveys^{74–76} have been written about them. Nearly all known measures require the computation of a cluster center. Although cluster center computation is possible for persistence diagrams,^{77} it could introduce additional errors that are difficult to disentangle from the rest of our workflow. We therefore used the Dunn index,^{78} which appears to be the only standard cluster separation measure that relies *only* on metric information but not on cluster centers.
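Because the Dunn index uses only pairwise distances, it can be computed directly from a target-to-target distance matrix without ever forming cluster centers. The following is a minimal sketch, assuming a dense numpy distance matrix and integer cluster labels (the function name is ours):

```python
import numpy as np

def dunn_index(D, labels):
    """Dunn index from a pairwise distance matrix D and cluster labels.

    Uses only metric information: the ratio of the smallest
    between-cluster distance to the largest within-cluster diameter.
    """
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # Largest diameter over all clusters (max intra-cluster distance).
    max_diam = max(D[np.ix_(labels == c, labels == c)].max() for c in clusters)
    # Smallest pairwise distance between points in different clusters.
    min_sep = min(
        D[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_sep / max_diam
```

Larger values indicate tighter, better-separated clusters; a value above 1 means every cluster is farther from its neighbors than its own diameter.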

## III. METHODOLOGY

A sonar collection system (the *sensor*) emits a sequence of acoustic *pulses* toward a scene containing a single reflective object (the *target*) in some environmental context, from which the system later records echos. We assume that each imaging event of this reflective object consists of *n* pulses, each of which is discretized into a sequence of *m range samples*. When the same physical object appears in multiple scenes (often in a different location or orientation), we will say that the targets have the same *target type*. On the other hand, a related collection of target types will be called a *target group*. (The specific target types and groups present in the data described in this article are listed in Sec. IV.)

*Classification* refers to the task of assigning the correct target type or target group to a given target. Since the set of pulses collected by the system typically requires preprocessing before classification, it is useful to refer to the processed set of pulses as a *signature*. One usually desires signatures that are at least insensitive to *cross-range*, the position of the sensor along its track. In this article, two different kinds of signatures are studied: those from spectra and those from persistent homology, from which we derive several possible *signature pseudometrics*. Using such a pseudometric, the set of all pairwise distances between target signatures comprises a *target-to-target distance matrix*. It is important to recognize that a given target-to-target distance matrix depends both on the signature *and* its pseudometric. We will assess the performance of both by examining properties of their associated target-to-target distance matrix, by comparing distances between targets within the same target type (or group) and between targets of different types (or groups).

A typical target is shown in Fig. 1(a) in which the vertical axis corresponds to pulses and the horizontal axis corresponds to range samples. The brightness of a given pixel indicates its relative signal strength on a logarithmic scale, with white being the strongest return and black being the weakest. The particular sensor used in this article slides along a rail past the target, so the range to target depends on cross-range. The targets are normalized so that the point of closest approach is always at the middle pulse and the minimum range to the target is the same (10 m, occurring around range sample 1350).

### A. Baseline approach: Target-to-target distances from 2d power spectra

Sonar signatures should be invariant to range and cross-range, since targets of the same type at different locations in the scene should look the same to the classifier. Using 2d power spectra instead of pulse timeseries enforces translation invariance both in range and in cross-range. The spectrum of a typical target is shown in Fig. 1(b), in which the vertical axis corresponds to Doppler frequency and the horizontal axis corresponds to frequency within a pulse. The complete translation invariance afforded by power spectra comes at a cost, because it ignores the phase of the Fourier transform of the data. Our results (see Sec. V) indirectly suggest that phase may be useful in classifying sonar targets.

Cross-range alignment is particularly important since a typical target consists of a small subset of all aspect angles, and the sensor has no control over the placement of the object in the scene. For the sensor in our dataset, the pulses are only *approximately* equally spaced in angle. To correct for this, one really needs to perform a polar format, backprojection, or some other kind of motion compensation algorithm before computing the 2d power spectrum.^{79} However, since these algorithms require precise trajectory information that might not be available, Tucker and Srivastava^{12} proposed an autofocus-like approach to resample the pulses according to an optimization. Because we used the same dataset as Ref. 12 in our analysis in Sec. IV, we will exhibit results both with and without the compensation applied using the algorithm of Tucker and Srivastava.

The baseline approach constructs a *2d power spectrum* (briefly, *spectrum*) matrix for each target, which has *n* rows and *m* columns. Given two spectra *r* = (*r*_{j,k}) and *s* = (*s*_{j,k}) associated to different targets, consider two distinct pseudometrics to measure the distance between them, the *spectral L*^{2} *metric*

$$D_2(r,s) = \left( \sum_{j=1}^{n} \sum_{k=1}^{m} |r_{j,k} - s_{j,k}|^2 \right)^{1/2},$$

and the *spectral correlation pseudometric*

$$D_c(r,s) = 1 - \frac{\sum_{j,k} r_{j,k}\, s_{j,k}}{\|r\|_2\, \|s\|_2},$$

where $\|r\|_2 = D_2(r,0)$ is the $L^2$ norm.
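A sketch of these two spectral pseudodistances, assuming the standard Frobenius and normalized-inner-product forms implied by the text (function names are ours):

```python
import numpy as np

def spectral_l2(r, s):
    """Spectral L2 metric: Frobenius distance between two 2d power spectra."""
    return np.sqrt(np.sum((r - s) ** 2))

def spectral_corr(r, s):
    """Spectral correlation pseudodistance: 1 minus the normalized inner
    product of the two spectra, treated as large vectors."""
    return 1.0 - np.sum(r * s) / (np.linalg.norm(r) * np.linalg.norm(s))
```

Both functions treat each n-by-m spectrum as a single long vector, which is exactly how the baseline target-to-target distances are computed.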

### B. Proposed approach: Topological signatures from pulse geometry

The baseline approach in Sec. III A treats the set of pulses from a target as an atomic unit, which ignores geometric structure *amongst* the pulses. If we consider the geometry of the set of pulses within a target *as a signature*, distortions due to angular uncertainty can be avoided without resorting to complete invariance. To measure this geometry, we must select a *pulse-to-pulse pseudometric*. Although a pulse-to-pulse pseudometric directly supplies geometry on the pulses, we will use it to construct the pseudometric on the space of signatures. To avoid putting undue rigidity upon the space of signatures, we prioritize the *topology* of the collection of pulses within a target. However, we believe it is a mistake to focus on the topology of the target signature to the exclusion of geometry. *Persistence diagrams* form a distinct signature class that capture both geometric and topological features, and admit several useful signature metrics for a classifier.

Our methodology is summarized in the workflow diagram in Fig. 2, which shows two processes.

For each target:

- *Persistent homology* for feature extraction (Sec. III B 2). **Input:** a target, consisting of a set of pulses, as defined above in Sec. III. **Output:** a *persistence diagram*.

For each pair of target persistence diagrams:

- *Persistence distance* (Sec. III B 3). **Input:** persistence diagrams, one for each signature. **Output:** a target-to-target *distance matrix entry*.

#### 1. Pulse-to-pulse distances

In contrast to the approach described in Sec. III A, in which the *L*^{2} metric and correlation pseudometric are applied to pairs of target spectra, in persistent homology the metrics are applied to *individual pulses*. We assume that each sonar pulse consists of a vector in $\mathbb{R}^m$. We have analyzed two distinct pseudometrics for $p, q \in \mathbb{R}^m$, the *L*^{2} *metric*

$$d_2(p,q) = \left( \sum_{i=1}^{m} |p_i - q_i|^2 \right)^{1/2},$$

and the *correlation pseudometric*

$$d_c(p,q) = 1 - \max_{s} \frac{\langle p, R_s q \rangle}{\|p\|_2\, \|q\|_2},$$

where *R*_{s} rotates a vector by *s* components, and $\|p\|_2 = d_2(p,0)$ is the *L*^{2} norm. Which of these should be used depends on how the pulse data are formatted. If the data are formatted so that components of a vector $p \in \mathbb{R}^m$ are range samples, then the correlation pseudometric is range insensitive.

Given the timeseries of pulses for a target arranged as a matrix $s = (s_{i,j}) \in \mathbb{R}^{n \times m}$ with *n* rows (pulses) and *m* columns (samples in range or frequency), we can organize the set of all possible pulse-to-pulse distances in a square *pulse-to-pulse distance matrix*

$$D_{i,j} = d_k(s_{i,\bullet},\, s_{j,\bullet}),$$

where *k* = 2 or *c* and $s_{i,\bullet}$ represents the *i*th row of *s*.
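These pulse pseudometrics and the resulting distance matrix can be sketched as follows. The FFT-based search over all cyclic shifts is our reading of the rotation *R*_{s}; the function names are ours:

```python
import numpy as np

def d2(p, q):
    """L2 metric between two pulses p, q in R^m."""
    return np.linalg.norm(p - q)

def dc(p, q):
    """Correlation pseudometric: 1 minus the best normalized correlation
    over all cyclic shifts of q, which makes the distance range insensitive."""
    # Circular cross-correlation of p against every shift of q via the FFT.
    corr = np.fft.ifft(np.fft.fft(p) * np.conj(np.fft.fft(q))).real
    return 1.0 - corr.max() / (np.linalg.norm(p) * np.linalg.norm(q))

def pulse_distance_matrix(s, metric=d2):
    """Square matrix D with D[i, j] = metric(row i, row j) of pulse matrix s."""
    n = s.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = metric(s[i], s[j])
    return D
```

Note that `dc` of a pulse and a cyclically shifted copy of itself is zero, which is the range insensitivity described above.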

Figure 3 shows simulated data for two sonar scenarios, one with a spherical target of radius 0.1 m and one with an ellipsoidal target with major axis 1 m and minor axis 0.1 m. The sensor recorded 75 evenly spaced pulses at a sample rate of 44.1 kHz around one circular orbit of each target. Both targets were illuminated at a range of 12.5 m by sinc-shaped pulses with 8 kHz bandwidth.

Using the *L*^{2} metric on pulses, the pulse-to-pulse distance matrix for both simulated targets is shown in Fig. 4.

The pulse-to-pulse distance matrix endows the space of pulses with an abstract geometry that is embedded in a high dimensional space, whose dimension is the number of samples recorded in each pulse. Since this is difficult to visualize, Fig. 5 shows a principal components analysis (PCA) projection of the space of pulses for these two targets. It is immediately clear that they have rather different geometric structure; the goal of this article is to exploit these differences for classification.
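A projection of this kind can be sketched via the singular value decomposition of the mean-centered pulse matrix; this is a generic PCA sketch, not the authors' plotting code:

```python
import numpy as np

def pca_project(pulses, k=2):
    """Project pulses (rows of an n-by-m matrix) onto their top k
    principal components, via the SVD of the mean-centered data."""
    X = pulses - pulses.mean(axis=0)      # center each range sample
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T                   # n-by-k coordinates for plotting
```

Scatter-plotting the two returned columns reproduces the kind of low-dimensional view of the pulse space shown in Fig. 5.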

#### 2. Persistence diagrams

A *persistence diagram* is a scatter plot in the plane where each point represents a void in the space of interest. Persistence diagrams are *graded* by *degree*, which specifies the dimension of the voids under discussion. The coordinates of each point represent the distance thresholds over which the void is present. The horizontal axis shows its *birth*, the distance threshold at which it comes into existence, while the vertical axis shows its *death*, the threshold at which the void is no longer present. We used the persistent homology software called perseus^{80,81} to compute persistence diagrams in degree 0 and 1 for each target in this article. (More recent persistent homology tools are also available and may provide better computational performance than perseus, for instance, Refs. 82–84, though this list is not exhaustive.)
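The degree-0 diagram has a concrete description: its finite points are (0, death) pairs whose deaths are the merge heights of single-linkage clustering of the space, i.e., the minimum spanning tree edge lengths of the distance matrix. A minimal union-find sketch (an illustration only, not the perseus implementation used in the article):

```python
import numpy as np

def h0_diagram(D):
    """Degree-0 persistence diagram of a finite metric space.

    Each finite point is (0, death), where the deaths are the thresholds
    at which connected components merge (the MST edge weights of D).
    The single infinite bar, for the component that never dies, is omitted.
    """
    n = D.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Kruskal's algorithm: process candidate edges by increasing length.
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj                 # two components merge at w
            deaths.append(w)
    return [(0.0, d) for d in deaths]
```

For *n* points the diagram has exactly *n* − 1 finite points, one per merge event.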

Figure 6 shows persistence diagrams for the simulated spherical and ellipsoidal targets. The persistence diagrams associated to the spherical target are marked with circles, while the diagrams associated to the ellipsoidal target are marked with stars. The shaded regions indicate that the birth of a void always comes before its death. The left frame (labeled *H*_{0}) corresponds to 0-dimensional voids, which are connected components of the space. The right frame (labeled *H*_{1}) corresponds to 1-dimensional voids, which are loops in the space. (In our data, higher dimensional voids were not useful for classification.) Since the set of points corresponding to the spherical target (circles) is far from the set of points corresponding to the ellipsoidal target (stars), we should conclude that the two targets are indeed different. We quantify this difference in Sec. III B 3.

#### 3. Target-to-target distances from persistence diagrams

In order to use persistence diagrams in a classification algorithm, we must quantify the distance between the persistence diagrams. Figure 7 shows a typical comparison between the persistence diagrams of two targets, TARGETB2 and TARGETB3. (In some cases, the persistence diagrams for a given pair of targets are completely overlapped.) The figure uses different markers (dot, circle, cross, star, diamond) for the five different views of each target (see Sec. IV). It is immediately clear that the points in the persistence diagrams for a given target are localized to a definite portion of the plane, determined by the radius of the space of pulses. Although classifying based on radius alone clearly cannot result in good classification performance, radius is nevertheless an informative feature.

In this article, we consider two distances: (1) the *p*-Wasserstein metric in definition 1 and (2) a metric defined in algorithm 1 (based on the stability results proven in Ref. 44) that is more attuned to the variability we observed in acoustic data.

##### a. Definition 1.

(A slight modification of what appears in Ref. 77, which is itself an extension of Ref. 49.) The *p-Wasserstein metric* between two persistence diagrams *X*, *Y* (as multisets of points in the plane including the diagonal) is given by

$$W_p(X,Y) = \left( \inf_{\gamma : X \to Y} \sum_{x \in X} \| x - \gamma(x) \|_\infty^p \right)^{1/p},$$

where *γ*: *X* → *Y* ranges over all possible bijections. We interpret each point in a multiset that has a multiplicity greater than 1 to consist of multiple distinct copies of that point.

This distance between two diagrams can be interpreted as the smallest amount that one needs to thicken one diagram so that each point of the other diagram is inside one of the thickened points. The role of the bijection is to specify which pairs of points should be compared, so in practice, one must iterate over the set of bijections from one persistence diagram to the other.
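For very small diagrams, the infimum over bijections can be evaluated by brute force after augmenting each diagram with the diagonal projections of the other's points, so that unmatched points may be matched to the diagonal. This sketch assumes the ℓ∞ ground distance common in topological data analysis, and its cost is exponential in the diagram size:

```python
import itertools

def wasserstein_p(X, Y, p=1):
    """Brute-force p-Wasserstein distance between tiny persistence diagrams.

    Each diagram is a list of (birth, death) pairs. Each diagram is
    augmented with the diagonal projections of the other's points, and
    diagonal-to-diagonal matches cost nothing. Illustration only.
    """
    proj = lambda b, d: ((b + d) / 2.0, (b + d) / 2.0, True)
    Xa = [(b, d, False) for b, d in X] + [proj(b, d) for b, d in Y]
    Ya = [(b, d, False) for b, d in Y] + [proj(b, d) for b, d in X]

    def ground(u, v):
        if u[2] and v[2]:
            return 0.0   # two diagonal points match for free
        return max(abs(u[0] - v[0]), abs(u[1] - v[1]))   # l-infinity

    best = min(
        sum(ground(x, y) ** p for x, y in zip(Xa, perm))
        for perm in itertools.permutations(Ya)
    )
    return best ** (1.0 / p)
```

Practical implementations replace the permutation search with an optimal-transport solver, but the brute-force version makes the role of the bijection explicit.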

The *p*-Wasserstein metric is typically used in topological data analysis since it enjoys a robustness theorem^{49,77} for small perturbations in the distribution of possible pulses. To obtain this robustness, the theoretical results proven in Refs. 49 and 77 must assume that the diagonal is part of every persistence diagram. This is a very strong robustness result and is essentially too strong for our purposes, because an arbitrarily small perturbation in the distribution of pulses means that any echo (at all!) might be received, provided it does not happen too often. Sonar systems expect noiselike perturbations in the range samples, but not in the distribution of pulse echos.

We found that the *p*-Wasserstein metric did not perform well on our data (Sec. IV), in that the clustering performance was often weaker than state-of-the-art methods. The difficulty with the *p*-Wasserstein metric can be explained by referring to Fig. 7. All of the points in TARGETB2's diagram are closer to the diagonal than they are to the points in TARGETB3's diagram. Thus, the *p*-Wasserstein metric will preferentially match these points to the diagonal, resulting in a fairly small distance. This effectively makes the assumption that all of the points in both targets' diagrams are more likely due to noise than a true difference between the diagrams. However, as evidenced by the fact that Fig. 7 contains a superposition of *all* available data for both target types, this assumption is quite untrue! (This phenomenon is also visible in the simulated example, Fig. 6.)

To remedy this deficiency, we drew insight from the Critical Point Stability Theorem proved in Ref. 44 (theorem 3.4). In our context, this means that points in a persistence diagram are stable under small perturbations of the range samples within a pulse. Small perturbations cannot move points off of the diagonal of a persistence diagram unless the curvature of the underlying space is too large. This becomes a problem when the performance of the sonar system is limited by noise. The sonar system used in collecting the PONDEX10 data presented in Sec. IV was not limited by noise in this way.

Estimating the *p*-Wasserstein metric from a random sample of bijections is fraught with difficulty for two related reasons: (1) the set of bijections can be quite large, and (2) estimating the infimum of a set is quite difficult. Although it is possible to compute the *p*-Wasserstein metric geometrically,^{85} we found that these convergence issues did not arise in our processing when we used a sample of randomly selected bijections. Our Algorithm 1 therefore uses randomly selected bijections. Figure 8 shows that after about 20 bijections are used, the mean difference between the estimates of the algorithm 1 distance stabilizes. Additionally, we found that estimating either distance (the *p*-Wasserstein metric or algorithm 1) using 20 random bijections was considerably faster than the exact geometric computation. The sampling estimate ran in a matter of seconds while the exact calculation took minutes.

Given these observations, we developed a new distance that is computed according to the following algorithm.

##### b. Algorithm 1.

Assume that the persistence diagrams for two targets have *m* and *n* off-diagonal points, and without loss of generality that *m* < *n*. Select the minimum distance from *N* iterations of the following.

1. Construct a random permutation [*σ*_{1},…,*σ*_{n}] of *n* items.
2. For each *k* = 1,…,*m*, compute the distance between point *k* from the first diagram and point *σ*_{k} from the second.
3. For each *k* = *m* + 1,…,*n*, compute the distance from point *σ*_{k} in the second diagram to the diagonal.
4. Sum over the distances computed in steps (2) and (3).

To be quite clear, algorithm 1 is *not* the 1-Wasserstein distance of the two persistence diagrams, because algorithm 1 preferentially matches off-diagonal points to each other, rather than to the diagonal. When the two diagrams have the same number of off-diagonal points, then the distance computed by algorithm 1 is the 1-Wasserstein distance of the sets *excluding* the diagonals. This can be rather different than the 1-Wasserstein distance of the persistence diagrams, since distances can “short-cut” through the diagonal. Our results in Sec. IV include both the 1-Wasserstein distance and algorithm 1.
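The steps of algorithm 1 can be sketched as follows. The ℓ∞ ground distance and the half-persistence cost for matching a point to the diagonal are our assumptions, since the article does not fix them here:

```python
import numpy as np

def algorithm1_distance(A, B, n_iter=25, rng=None):
    """Random-bijection persistence distance sketched from algorithm 1.

    A and B are sequences of off-diagonal (birth, death) points.
    Off-diagonal points are preferentially matched to each other; only
    the excess points of the larger diagram are charged against the
    diagonal (half their persistence, an assumption).
    """
    rng = np.random.default_rng(rng)
    A, B = np.asarray(A, float), np.asarray(B, float)
    if len(A) > len(B):            # without loss of generality m <= n
        A, B = B, A
    m, n = len(A), len(B)
    best = np.inf
    for _ in range(n_iter):
        sigma = rng.permutation(n)
        # Step 2: match the m points of A to randomly chosen points of B.
        matched = B[sigma[:m]]
        total = np.abs(A - matched).max(axis=1).sum() if m else 0.0
        # Step 3: charge leftover points of B to the diagonal.
        leftover = B[sigma[m:]]
        total += (np.abs(leftover[:, 1] - leftover[:, 0]) / 2.0).sum()
        # Keep the minimum over the random bijections tried so far.
        best = min(best, total)
    return best
```

As noted above, this is not the 1-Wasserstein distance: points are never short-cut through the diagonal unless one diagram simply has more points than the other.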

## IV. RESULTS AND ANALYSIS

We tested the performance of the proposed topological processing approach on the ONR PONDEX10 sonar dataset,^{86,87} which consisted of synthetic aperture sonar echos of a variety of underwater targets. The general setup consisted of a long pond with a sandy bottom and a linear sonar sensor that moved along a rail transverse to its look direction. Each target was viewed in isolation and was proud on the sandy bottom at a range of 10 m. Each collection subtended about 60° of aspect angle with respect to each target. Each target was viewed at 0°, 20°, 40°, 60°, and 80° angles with respect to boresight, so there was typically ample (but not complete) angular overlap between collections. The data were provided as normalized, centered data files in which the pulse taken closest to the target was the first one listed in the file. The data files were supplied as a matrix, in which each row corresponded to a pulse and each column corresponded to a range sample. Each file consists of roughly 800 pulses and 1500 range samples.

The PONDEX10 dataset provides 11 different *target types* of four different *target groups*.

- **Target group A:** contains three target types, TARGETA1–TARGETA3,
- **Target group B:** contains four target types, TARGETB1–TARGETB4,
- **Target group C:** contains two target types, TARGETC1 and TARGETC2, and
- **Target group D:** contains two target types, TARGETD1 and TARGETD2.

We expect that under the metrics described in Sec. III, target signatures should be tightly clustered (small internal distances) within a target type, more loosely clustered within target groups, and that there ought to be definite separation between target groups. To test this hypothesis, we computed the following.

1. Spectral and persistence diagram signatures for all targets, without using knowledge of the target identities.
2. Target-to-target distance matrices (Sec. IV A) for all targets, again computed without knowledge of target identities.
3. Ratios of distances within and between target types and groups, and the Dunn index (Sec. IV B), to quantify how well the clusters are formed.
4. Mean numbers of target types and groups per cluster in average linkage hierarchical clustering for each matrix (Sec. IV C).

Figure 9 shows the persistence diagrams for a typical target. Observe that the spectral features consist of roughly 800 × 1500 pixel matrices, while the persistence diagrams consist of lists of roughly 200–300 points in two dimensions.

### A. Target-to-target distance matrices

Once the persistence diagrams and spectral signatures were computed for all targets, we computed the target-to-target distance matrices exhibited in Figs. 10–14. We computed distances between persistence diagrams using algorithm 1 and the 1-Wasserstein metric (using the Hera implementation proposed in Ref. 85). For the spectra (Fig. 10), we used the pseudometrics described in Sec. III A, which essentially treat each spectral image as a large vector. We note that Ref. 12 [Fig. 5(a)] recovers the same *L*^{2} distance matrix exhibited in Fig. 10(a) (with the exception that our plots show only the 10 m range). However, Ref. 12 [Fig. 5(b)] shows the same matrix after a slow-time warping was performed. We also used Ref. 12 [Fig. 5(b)] in Sec. IV B and found that this warping improves the within-target (but not within-group) distances somewhat.

Since algorithm 1 runs in time linear in the number of points in the larger diagram, we control the runtime by specifying the number of random bijections we construct. After trial and error, we set the number of trials to 25: beyond this point the differences between the resulting distances became marginal (see Fig. 8), while the runtime increased significantly. We computed target-to-target distance matrices for both the *L*^{2} pulse-to-pulse metric (Figs. 11 and 12) and the pulse-to-pulse correlation pseudometric (Figs. 13 and 14).
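The article's algorithm 1 is not reproduced here, but the general idea of bounding a matching distance by sampling random bijections can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: each diagram is padded with diagonal projections of the other's points (the usual persistence-diagram convention), random pairings are scored, and the cheapest is kept.

```python
import numpy as np

def random_matching_distance(dgm_a, dgm_b, trials=25, seed=0):
    """Upper-bound a 1-Wasserstein-style distance between two persistence
    diagrams by sampling random bijections between their points.

    Each diagram is an (n, 2) array of (birth, death) pairs.  Padding with
    the diagonal projection (b+d)/2 of the other diagram's points ensures
    a bijection always exists.
    """
    rng = np.random.default_rng(seed)
    diag_a = np.repeat(dgm_b.mean(axis=1, keepdims=True), 2, axis=1)
    diag_b = np.repeat(dgm_a.mean(axis=1, keepdims=True), 2, axis=1)
    pa = np.vstack([dgm_a, diag_a])
    pb = np.vstack([dgm_b, diag_b])
    best = np.inf
    for _ in range(trials):
        perm = rng.permutation(len(pb))
        cost = np.sum(np.abs(pa - pb[perm]))  # L1 cost of this bijection
        best = min(best, cost)
    return best
```

Because the result is a minimum over sampled bijections, increasing `trials` with the same seed can only tighten the bound, which matches the diminishing returns the article observes beyond 25 trials.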

### B. In/out-group distance analysis

We expect that

1. target-to-target distances between echos from the same target type (but different look angles) should be smallest,
2. target-to-target distances between echos from targets within the same target group should be larger, and
3. distances between echos from targets of distinct groups should be largest.

To quantify this experimental hypothesis for each classifier method, we computed two summary statistics for each of several target-to-target distances, using the true target types or groups to score the results.

The first summary statistic we computed was the well-known Dunn index.^{78} We selected the Dunn index as opposed to many other popular cluster performance measures (for instance, those described in Refs. 68 and 70–76) largely because it does not require the computation of centroids or means of persistence diagrams. The other methods require computing cluster means and therefore seem likely to add additional uncertainty into our analysis. We computed the overall Dunn index as well as a separate index for each target type or group using each of the target-to-target distances described in Sec. IV A. In the latter case, we scored the results as if there were only two classes: the target type or group in question versus all others. While the overall Dunn index gives an indication of the difficulty of separating any two given target types, the single target Dunn index gives a relative difficulty of separating that target from the others. The results of this analysis are shown in Table I. Each row of the table represents a target type or target group, while each column represents a target-to-target distance metric. The columns labeled “usual” refer to computations done using the 1-Wasserstein metric on persistence diagrams.
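Because the Dunn index needs only a distance matrix and class labels (no cluster centers), it can be computed directly. The helper below is a minimal sketch (names are our own, not from the article); the one-vs-rest variant used for the per-target scores reduces to relabeling.

```python
import numpy as np

def dunn_index(dist, labels):
    """Dunn index from a symmetric distance matrix and labels:
    (minimum between-cluster distance) / (maximum cluster diameter)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    min_between = np.inf
    max_diam = 0.0
    for i, ci in enumerate(classes):
        a = labels == ci
        # Cluster diameter: largest within-cluster distance.
        max_diam = max(max_diam, dist[np.ix_(a, a)].max())
        for cj in classes[i + 1:]:
            b = labels == cj
            min_between = min(min_between, dist[np.ix_(a, b)].min())
    return min_between / max_diam

def one_vs_rest_dunn(dist, labels, target):
    """Score one target type/group against all others combined."""
    return dunn_index(dist, (np.asarray(labels) == target).astype(int))
```

A value well above 1 indicates clusters that are tighter than they are close, which is the sense in which the index measures separation.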

Unfortunately, the Dunn index is extremely sensitive to outliers because it involves computing minima of distances. Because of this, most of the cluster separation measures described in the literature (see Refs. 74–76 for various comparisons of cluster separation measures and Ref. 70 in particular) were developed to mitigate the impact of outliers. All of these improved measures compare cluster centers instead of the individual points in a given pair of clusters. Although we could compute a center for each cluster using the notion of a mean persistence diagram,^{77} this seemed to us to add considerable uncertainty into the process, especially given our concerns about topological variability of the data. Therefore, we computed a second (apparently novel) cluster separation measure that is more robust to outliers and does not require computing cluster centers. This cluster separation measure is the median of all ratios of the form

*d*(*x*, *z*)/*d*(*x*, *y*),

where *d* is the target-to-target distance provided by the classifier (using either spectra or persistence distance and a given pulse distance), *x* is a given target, *y* is another target with the same target type or group, and *z* is a target not in that target type or group.

Assuming the target types or groups are correct, larger values of the median ratio indicate a classifier with better performance. We computed this median ratio on each of the target types individually and on target groups for each of several target-to-target distance metrics. The results are shown in Table II.
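The median ratio statistic can be computed directly from a distance matrix. The sketch below is a hypothetical helper, assuming the ratio takes the form *d*(*x*, *z*)/*d*(*x*, *y*) (a between-class distance over a within-class distance, consistent with larger values indicating better separation).

```python
import numpy as np

def median_ratio(dist, labels):
    """Median over all triples (x, y, z) of d(x, z) / d(x, y), where
    x and y share a label and z has a different label."""
    labels = np.asarray(labels)
    n = len(labels)
    ratios = []
    for x in range(n):
        same = [y for y in range(n) if y != x and labels[y] == labels[x]]
        other = [z for z in range(n) if labels[z] != labels[x]]
        for y in same:
            for z in other:
                if dist[x, y] > 0:          # skip degenerate zero distances
                    ratios.append(dist[x, z] / dist[x, y])
    return float(np.median(ratios))
```

Taking the median rather than a minimum is what makes this statistic robust to a few outlying targets, in contrast to the Dunn index.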

| Group | Spectral L^{2} | Spectral corr. | Tucker and Srivastava [Ref. 12, Fig. 5(b)] | Alg. 1 H_{0} (L^{2}) | Alg. 1 H_{1} (L^{2}) | Alg. 1 H_{0} (corr.) | Alg. 1 H_{1} (corr.) | Usual H_{0} (L^{2}) | Usual H_{1} (L^{2}) | Usual H_{0} (corr.) | Usual H_{1} (corr.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TARGETA1 | 1.929 | 1.402 | 1.589 | 42.747 | 5.833 | 64.850 | 5.671 | 6.432 | 2.760 | 3.356 | 2.583 |
| TARGETA2 | 1.085 | 1.119 | 1.679 | 34.864 | 4.340 | 39.228 | 6.871 | 5.593 | 1.657 | 2.034 | 3.349 |
| TARGETA3 | 1.670 | 1.624 | 1.717 | 173.878 | 16.451 | 29.491 | 4.972 | 8.652 | 3.740 | 2.631 | 2.410 |
| TARGETB1 | 1.211 | 1.332 | 1.978 | 12.048 | 2.641 | 23.818 | 7.413 | 3.083 | 1.822 | 1.581 | 3.348 |
| TARGETB2 | 1.146 | 1.138 | 2.118 | 30.023 | 2.715 | 54.616 | 4.885 | 4.502 | 1.490 | 3.710 | 2.816 |
| TARGETB3 | 1.036 | 1.199 | 1.538 | 364.953 | 15.743 | 6.558 | 3.175 | 10.663 | 2.512 | 1.621 | 3.753 |
| TARGETB4 | 1.117 | 1.331 | 1.596 | 271.188 | 12.327 | 17.815 | 4.636 | 10.947 | 2.637 | 1.923 | 2.548 |
| TARGETC1 | 0.675 | 1.105 | 1.407 | 15.695 | 3.294 | 5.977 | 1.720 | 3.646 | 1.473 | 1.757 | 3.235 |
| TARGETC2 | 0.749 | 0.886 | 1.833 | 26.762 | 6.836 | 3.203 | 2.024 | 3.719 | 2.324 | 1.093 | 2.852 |
| TARGETD1 | 1.629 | 1.647 | 3.00 | 46.649 | 7.170 | 45.056 | 6.128 | 14.545 | 5.019 | 2.160 | 2.820 |
| TARGETD2 | 1.198 | 1.242 | 3.355 | 38.331 | 4.916 | 82.953 | 8.490 | 4.542 | 4.298 | 3.279 | 3.483 |
| Group A | 1.652 | 1.159 | 1.066 | 4.173 | 1.246 | 10.277 | 4.154 | 1.410 | 1.170 | 2.070 | 1.989 |
| Group B | 0.997 | 1.111 | 1.077 | 2.502 | 2.646 | 3.099 | 2.231 | 0.965 | 0.939 | 1.144 | 1.499 |
| Group C | 0.815 | 0.858 | 1.028 | 6.422 | 3.910 | 3.166 | 1.951 | 1.456 | 1.281 | 1.083 | 1.896 |
| Group D | 1.286 | 1.216 | 1.455 | 20.076 | 6.170 | 28.635 | 6.223 | 2.121 | 2.740 | 1.844 | 2.052 |


### C. Hierarchical clustering analysis

To study the clustering performance of each of the target-to-target distances in finer detail, we performed average linkage hierarchical clustering.^{88–90} This produces a tree diagram, called a *dendrogram*, in which targets are included in a nested set of clusters. The results are shown in the left frames of Figs. 15–18. (For brevity, dendrograms associated to the usual 1-Wasserstein metric are omitted, but the summary statistics are shown in Table III.) Each individual target is written in the form “TARGETC1_40deg,” which means that the target type was “TARGETC1” and that the target was viewed from 40°. The horizontal axis in each dendrogram is the distance at which clusters merge. Comparing the structure of the dendrogram to the known target types and groups allows the performance of that target-to-target distance to be quantified. For a good target-to-target distance, we expect most intermediate clusters to contain exactly one target type or one target group per cluster. As the clusters become large, they will necessarily contain many target types and eventually many groups, but this ought to be minimized. The plots in the right frames of each of Figs. 15–18 are histograms showing the number of clusters with a given number of target types or groups. Our performance criterion is therefore the mean number of target types or groups per cluster, as shown in Table III.
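The average linkage clustering and the mean-types-per-cluster criterion can be sketched with SciPy's hierarchical clustering routines. This is a hypothetical helper evaluated at a single cut level (the article's criterion averages over the nested clusters of the dendrogram; the single-cut version below illustrates the counting step).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def mean_types_per_cluster(dist, types, threshold):
    """Average linkage clustering of a square distance matrix, then the
    mean number of distinct target types per cluster at one cut level."""
    condensed = squareform(dist, checks=False)   # linkage wants condensed form
    tree = linkage(condensed, method="average")
    assignments = fcluster(tree, t=threshold, criterion="distance")
    counts = [len({types[i] for i in np.flatnonzero(assignments == c)})
              for c in np.unique(assignments)]
    return float(np.mean(counts))
```

For a well-performing metric this value stays near 1 until the cut threshold is large, exactly the behavior summarized in Table III.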

| Metric | Target types | Target groups |
|---|---|---|
| Spectral L^{2} | 4.32 | 2.32 |
| Spectral correlation (corr.) | 3.58 | 2.19 |
| Tucker and Srivastava [Ref. 12, Fig. 5(b)] | 2.15 | 1.57 |
| Alg. 1 H_{0} with L^{2} | 1.47 | 1.28 |
| Alg. 1 H_{1} with L^{2} | 2.49 | 1.70 |
| Alg. 1 H_{0} with corr. | 1.62 | 1.30 |
| Alg. 1 H_{1} with corr. | 2.38 | 1.57 |
| Usual H_{0} with L^{2} | 1.91 | 1.36 |
| Usual H_{1} with L^{2} | 3.87 | 1.98 |
| Usual H_{0} with corr. | 4.75 | 2.38 |
| Usual H_{1} with corr. | 2.34 | 1.51 |


## V. CONCLUSION

Using the visualizations provided in Ref. 12 (Fig. 6), which examines the spectral features studied in this article, it appears that the clusters associated to each target type (not group) are reasonably dense and well formed but are not well-separated from one another. This suggests that relying on spectral features leads to a classifier that is susceptible to overfitting. In rather dramatic contrast, the persistence diagram features result in highly separated clusters. These intuitions are confirmed by the hierarchical clustering analysis in Sec. IV C. We found that *H*_{0} outperforms the others, which suggests that hierarchical clustering of the sonar pulses themselves is sufficient for good clustering.

Because most of the entries in Table II are greater than 1, we conclude that spectral features and persistence diagrams both cluster targets according to their respective types and groups. The increase in median ratios in the third column of Table II over the first two demonstrates that the warping process proposed in Ref. 12 [Fig. 5(b)] improves classification performance. The Dunn indices of Tucker's warping process indicate that the clusters of similar targets are considerably more “pure” than those produced by the other methods, because the third column of Table I is generally larger than the others. However, this comes at a cost: Tucker's warping places the clusters of targets of different groups at about the same distance from each other, which reduces its effectiveness in the hierarchical clustering analysis as compared to persistence diagrams in Table III.

Algorithm 1 performs well against all the other methods when using the median ratios, since it attains all of the largest values in Table II. It performs less well in terms of Dunn indices, as it attains the largest values for only 4 of the 11 target types and for none of the target groups in Table I. This indicates that the clusters produced by the persistence diagrams have some overlap but are well-separated on average, which is strongly confirmed by the hierarchical clustering results in Table III. A closer examination of the dendrograms in Figs. 17 and 18 shows that clusters associated to different groups and types are well separated. Figure 16 shows that the clusters produced by Tucker's warping method are of about the same “purity” at the level of target types, but abruptly merge into one large cluster with little distinction between target groups.

The usual 1-Wasserstein metric for persistence diagrams performs worse than our algorithm 1 in all cases except in three instances where its Dunn index is better (TARGETA3, TARGETD2, and the overall groups in Table I). This suggests that while topological features are important in classification, the size of the feature matters significantly.

The dendrograms in Fig. 17 allow us to derive the following conclusions about the signatures associated to different target types.

- TARGETD1 and TARGETD2 look similar to each other.
- There are essentially two subgroups of groups A and B whose members look similar to each other: (1) TARGETA3, TARGETB1, TARGETB3, and TARGETB4, and (2) TARGETA1, TARGETA2, and TARGETB2. (Note that the spectral features and Tucker's warped metric do not distinguish these classes.)
- The prominent dark streaks in the *H*_{1} distance matrix (Fig. 12) indicate that from some look angles, *H*_{1} is unable to identify certain targets. This is confirmed by a higher mean number of target types per cluster than for *H*_{0}.
- Both matrices exhibit low-distance blocks along the diagonals: targets look similar to themselves from different look angles.

In the case of the pulse-to-pulse correlation pseudometric (Fig. 14), we arrive at somewhat different conclusions.

- TARGETB1 looks unlike all other targets, *even* TARGETA3, which is its nearest neighbor.
- TARGETD1 and TARGETD2 look similar to each other, but are no longer confused with other targets except TARGETA2.
- All types of groups A and B look much more similar to each other than for the *L*^{2} pulse-to-pulse pseudometric.
- The dark streaks are now gone from the *H*_{1} diagram. This means that this pulse pseudometric works better at separating different targets.

From the dendrograms (Figs. 17 and 18), it is apparent that different targets in groups A and B can be confused at particular look angles. The signatures associated to group D are similar, though they can be confused with TARGETA1 and TARGETA2. TARGETC1 and TARGETC2 are somewhat similar to each other and to those in group B. Additionally, in Fig. 17(b), TARGETA2 at 60° is close to other looks of this target type and also many looks of TARGETA3 and TARGETB3.

### A. Future work

Because the persistence diagrams result in well-separated target clusters, the next natural step is to feed these features into a classifier algorithm. Since persistence diagrams lie in a metric space, some kind of nearest-neighbors scheme with voting, such as Refs. 91 and 92, would be a good choice. Since spectral features can be classified effectively using a support vector machine,^{93–95} it would be interesting to see whether persistence diagrams could be used in a support vector machine as well. Because persistence diagrams do not lie in a vector space, this cannot be done directly. However, recent work has yielded a number of possible persistence diagram vectorization approaches^{96–99} that may enable persistence diagrams to be used in a support vector machine if the metric is preserved.
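As an illustration of the nearest-neighbors idea, a classifier can consume a target-to-target distance matrix directly. The sketch below is hypothetical (it uses scikit-learn's precomputed-metric option on a toy distance matrix standing in for persistence-diagram distances) and assumes the matrix rows and columns are ordered consistently with the label vector.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy distance matrix: two tight clusters (targets 0-2 vs targets 3-5),
# standing in for a persistence-diagram target-to-target distance.
rng = np.random.default_rng(1)
coords = np.concatenate([rng.normal(0.0, 0.1, 3), rng.normal(5.0, 0.1, 3)])
dist = np.abs(coords[:, None] - coords[None, :])
labels = np.array([0, 0, 0, 1, 1, 1])

# Train on four targets, classify the held-out two by neighbor voting.
train, test = [0, 1, 3, 4], [2, 5]
knn = KNeighborsClassifier(n_neighbors=1, metric="precomputed")
knn.fit(dist[np.ix_(train, train)], labels[train])
pred = knn.predict(dist[np.ix_(test, train)])   # rows: test, cols: train
print(pred)   # [0 1]
```

Nothing here requires the underlying points to live in a vector space, only that a distance matrix exists, which is exactly the setting persistence diagrams provide.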

Our distance for persistence diagrams (algorithm 1) is not the usual one and is not robust to small perturbations *of the distribution of received sonar echos*, as would be guaranteed by the robustness theorems^{49,77} that are well known to topologists. Nevertheless, our distance is stable to perturbations of the pulses themselves, and it performs better at separating different types of targets. We therefore hypothesize that the features of a sonar collection that govern classification are robust to noise in the collection system.

Finally, we speculate that persistence diagrams do a good job of separating clusters because they highlight symmetries present in the data that are driven by physical processes. For instance, collating multiple pulses of a round target uncovers rotational symmetries in the pseudometric space of pulses. The use of topological invariants allows one to infer that the target is round by reasoning about its space of pulses.

## ACKNOWLEDGMENTS

The authors would like to thank the anonymous referees for their suggestions that have improved the paper considerably. This report is based upon research that was supported by, or in part by, the U.S. Office of Naval Research under Award No. N000141512090.