This Letter provides an advanced time-frequency analysis method for quickly and precisely estimating dispersion curves of low-frequency impulsive sounds propagated in shallow water. First, the modal chirplet transform is derived based on an approximated variation law of the modal instantaneous frequency, which can omit the modal filtering step of warping-based methods. Then an extension modal chirplet transform (EMCT) is developed to further improve its potential for automation, together with a trick for fast computation. EMCT is applied to a recorded signal to demonstrate that the method can achieve high time-frequency concentration for modal components in a significantly decreased run time.

## 1. Introduction

Dispersion is a very common physical phenomenon in underwater acoustics. In dispersive environments, a signal received at a hydrophone can be decomposed to several modal components. Since these components are closely related to the source/receiver configuration and the propagation medium, one can use them to infer details of the source location and the ocean environment through suitable inversion schemes, where the dispersion curve is probably the most commonly used characteristic of modal components.

Considering that dispersion curves are theoretically separated on the time-frequency (TF) plane, time-frequency analysis (TFA) is naturally suitable for dispersion curve estimation. However, traditional TFA methods, such as short-time Fourier transform (STFT) and wavelet transform (WT) (Chen *et al.*, 2003; Potty *et al.*, 2000), usually perform with thick instantaneous frequency (IF) trajectories and thus have insufficient resolution for modal components in short range measurements. There are two ways to distinguish each modal component in such case. The first way is to reduce the thickness of modal IF trajectories by parameterized TFA methods, such as matching pursuit algorithm (Chen *et al.*, 2003), Pekeris-STFT (Le Touzé *et al.*, 2009), and *β*-powergram (Bonnel *et al.*, 2013). However, these advanced TFA methods have found very limited applications since they were proposed. As pointed out in Bonnel *et al.* (2020), this may be caused by their complexity of understanding for the end-users. The second way is to conduct modal filtering through warping transform before applying traditional TFA methods, such as STFT (Bonnel *et al.*, 2020) or Wigner–Ville distribution (WVD) (Niu *et al.*, 2014). Unfortunately, manual operations are inevitable in modal filtering. In addition, since traditional TFA methods are still utilized, this scheme actually provides no improvement in resolution, i.e., the thickness of modal IF trajectories, which indicates the signal-to-noise ratio to some extent. Thus, this scheme may not be robust in estimating the IF trajectories of weak energy modes. In short, at present a relatively simple method with fewer manual operations (i.e., the potential for automation) is still desired in dispersion curve estimation.

The remainder of this Letter explores a new advanced TFA method of dispersion curve estimation. Since the proposed modal chirplet transform (MCT) has a similar framework as STFT, its programming can be easily realized based on the existing STFT code. This also makes MCT very competitive in computing efficiency. In addition, the modal filtering step can be omitted when applying MCT, which avoids a lot of manual operations. To reduce the number of manual parameters and further improve the potential for automation, an extension modal chirplet transform (EMCT) is developed, together with a trick for fast computation. The analysis of an experimental signal reveals that the proposed EMCT takes full account of both the TF concentration for modal components and the computational efficiency.

## 2. Definition of MCT and EMCT

### 2.1 The variation law of the modal IF

Details of the modal propagation and the warping transform can be found in Bonnel *et al.* (2020) and its references. The following descriptions assume that the reader has basic knowledge of them.

The variation law of the modal IF provides prior knowledge for MCT and EMCT to redistribute the signal energy and then to produce thin IF trajectories for modal components. Here, the ideal waveguide is utilized to derive an approximation of the variation law with a closed form. Since the variation law is also followed by the warping transform, the law's ability to describe the dispersion curves of experimental signals is clearly implied by the robustness of the warping transform in practical applications.

For an impulse signal that is transmitted in the ideal waveguide at time *t* = 0, the signal received at a hydrophone is given by

where $am(t)$ and $\Phi m(t)$ are the amplitude and phase of mode *m*, respectively.

Based on the range *r* and the speed of sound *c*, the modal phase $\Phi m(t)$ can be described as

where *f _{cm}* denotes the cutoff frequency of mode

*m*and $tr=r/c$ denotes the earliest arrival time of the impulse signal. Thus, the corresponding modal IF is

For passive applications such as marine mammal localization (Bonnel *et al.*, 2020), since the transmitting time is unknown, only the relative arrival time $tr0$ can be estimated from the received signal. As interpreted in Fig. 1, *t*_{0} represents the time difference between the absolute arrival time *t _{r}* and the relative arrival time $tr0$. In such case, the corresponding modal phase and modal IF are given by

### 2.2 The MCT method

Based on the nonlinear kernel function $fm(t)$ [Eq. (5)] or equivalently $\Phi m(t)$ [Eq. (4)], the definition of MCT is given by

with

In Eq. (6), $y(\tau )$ represents the received signal; $\Phi R(\tau )$ and $\Phi S(\tau ,t)$ are the rotation operator and the shift operator (Yang *et al.*, 2014), respectively; $w(\tau \u2212t)$ represents a window centered at time *t*; and $exp\u2009(\u2212j2\pi f\tau )$ is the basis function of the Fourier transform. Note that $\Phi R(\tau )\Phi S(\tau ,t)w(\tau \u2212t)$ can be regarded as a special window function that varies with the short overlapping windows. Apart from the variation of window function, the framework of MCT is the same as that of STFT.

As explained in Figs. 2(a) and 2(b), $\Phi R(\tau )$ and $\Phi S(\tau ,t)$ are responsible for the TF concentration around (*t*, *f*). Assume that the black curve in Fig. 2(a) is the target mode $y(\tau )=am(\tau )\u2009exp\u2009(j\Phi m(\tau ))$, i.e., mode *m*. The red block around (*t*, *f*) is the target zone whose TF concentration is waiting for improvement. Since $\Phi R(\tau )$ is equivalent to the conjugate of the phase of $y(\tau )$ in theory, taking $y\xd7\Phi R$ obtains the amplitude of $y(\tau )$, i.e., $am(\tau )$, whose $IF\u22610$. As a consequence, by the rotation operator $\Phi R$, the red line in Fig. 2(b) moves to the position of the blue line. Furthermore, taking $y\xd7\Phi R\xd7\Phi S$ obtains $am(\tau )\u2009exp\u2009(j2\pi fm(t)\tau )$, whose $IF\u2261fm(\tau )\u2261f$. Thus, by the shift operator $\Phi R$, the blue line in Fig. 2(b) moves to the position of the black line. As indicated by the red line and the black line, the energy distributed around *f* is all concentrated to a single frequency *f*, which means TF concentration at (*t*, *f*) is improved. In practical applications, since the kernel function $fm(t)$ is only an approximation of the real modal IF, the rotation explained above will not be performed perfectly. In such case, the TF concentration of modal components is still able to be improved (at least partially). For interference and noise, due to the strong mismatch to the kernel function $fm(t)$, the rotation will spread their TF concentration, i.e., to a certain extent, the rotation acts with an inhibitory effect.

The implementation of MCT needs the advance provision of three parameters for the kernel function $fm(t)$. The estimation of these parameters follows the strategies listed below:

$tr0$: The relative arrival time is a critical parameter for MCT. For the time-domain waveform, the amplitude increases quickly after $tr0$, while for the spectrogram (e.g., STFT), $tr0$ is closely followed by the time of arrival of high frequencies. According to these two characteristics, one can fix $tr0$.

*t*_{0}: The unknown time delay is weakly sensitive in applications. Thus, any reasonable and possible value in experiments can be taken as an estimate [e.g., as shown in Figs. 3(b) and 3(e), $t0=20$ s performs a good result, though the real value is about 3.2 s].*f*: When the first two parameters, $tr0$ and_{cm}*t*_{0}, are provided, the cutoff frequency can be estimated through the warping transform. In addition, it also can be determined by the equation $fm(t)=f$ [i.e., Eq. (5)], where (*t*,*f*) can be any point of the target mode in theory. However, points dotted around the middle position of the dispersion curve are more proper in practice. Since Eq. (5) can be regarded as the base of warping transform in essence, these two ways will give the same estimate for*f*._{cm}

### 2.3 The EMCT method

In MCT, since each *f _{cm}* is applied for only one modal component, a repetitive procedure is inevitable to extract all the dispersion curves. This inconvenience will evidently take a lot of time, especially when coping with large amounts of data. Therefore, EMCT is proposed to optimize the computation of MCT.

Apparently, when $tr0$ and *t*_{0} are provided in advance, the corresponding cutoff frequency *f _{cm}* at a certain point (

*t*,

*f*) on the TF plane can be uniquely determined from the equation $fm(t)=f$. Against that backdrop, EMCT is given by

Although Eq. (7) achieves a more automatic computation than MCT, it takes too much computational time as a cost. Note that adjacent points on the TF plane have almost the same *f _{cm}*. Thus, in practical applications,

*f*of the nearby points can be approximated by that of the center point to decrease the run time, i.e.,

_{cm}where the strategy of frequency-band partition is utilized to determine the range of the approximation. As illustrated in Fig. 2(c), *f _{c}* and Δ

_{f}in Eq. (8) represent the center frequency and the half width of each frequency band, respectively. It is apparent that frequencies in the same band at a certain time

*t*will share a common

*f*, which is determined by the equation $fm(t)=fc$. If the bandwidth is equal to the frequency range of Fourier transform, EMCT degenerates into MCT.

_{cm}In essence, EMCT is a splice of a certain number of MCT results that are computed with different *f _{cm}*. Through the splice, EMCT avoids the laborious step of manually estimating the

*f*for each modal component. By applying the strategy of frequency-band partition, EMCT achieves a remarkable acceleration in computation with almost no sacrifice in the TF concentration [as exhibited in Figs. 3(c) and 3(f)]. In addition, an interesting fact is that EMCT could be equivalent to the method of Chen

_{cm}*et al.*(2003) at $\Delta f=0$ [i.e., Eq. (7)], though they are proposed based on different schemes.

## 3. Experimental example

The aim of this section is to validate the performance of the two proposed methods. In this section, a detailed processing for estimating dispersion curves of a combustive sound source (CSS) signal has been documented. For convenience, the data as well as the computing programs used below (including MCT, EMCT, and other existing methods that are mentioned in Sec. 1) are all offered in the supplementary material.^{1}

A CSS signal that is recorded at a range of about 4.8 km is employed in the performance validation of the two proposed methods. Nominally, this CSS signal contains as many as 18 modes (modes 10–13 are missing due to their weak modal energy) and thus is ideally suitable for a performance test of dispersion curve estimation. This CSS signal is available online in the supplementary material of Bonnel *et al.* (2020). More experimental details about this CSS signal are described in Bonnel *et al.* (2018).

The CSS source is generally considered as impulsive, but in fact it is preceded by a weak precursor and followed by several bubble pulses. To avoid the influence of bubble pulses and show the spectrogram more clearly, here the source deconvolution step performed in Bonnel *et al.* (2020) is reproduced with code provided in the same reference. In addition, the sampling frequency is decreased to $fs=1953.2$ Hz to reduce unnecessary computational burdens. Figure 3(a) is the time series of this CSS signal after source deconvolution. In Fig. 3(a), the relative arrival time $tr0=0.055$ s is identified by a vertical dashed line where the amplitude increases suddenly. The corresponding STFT spectrogram is shown in Fig. 3(d). The spectrogram provides an overview of TF structures for the CSS signal and, in particular, an intuitive indication for estimating $tr0$. However, due to the insufficient resolution for modal components, dispersion curves are highly contaminated by unknown interference components. Therefore, high-order modes are impossible to identify without further effort.

Figure 3(b) shows the MCT spectrogram of mode 4 with $t0=20$ s (intentionally larger than the real value of about 3.2 s to illustrate the insensitivity of *t*_{0}) and the corresponding *f _{cm}* = 15 Hz. In Fig. 3(b), the shape of mode 4 is shown clearly. However, as an inherent deficiency of MCT, the shapes of other modes suffer great damage due to the mismatch between the kernel function $fm(t)$ and the modal IF. The EMCT spectrogram ($\Delta f=0$) of the CSS signal is depicted in Fig. 3(e), where all nominal 18 modes have been successfully extracted. [Modes 19 and 20 may exist pursuant to Fig. 3(e); however, more efforts are needed to confirm this. Proving their existence is beyond the scope of this Letter and hence will not be discussed here.] For validation purposes, dispersion curves that are predicted after an inversion (Bonnel

*et al.*, 2018) are superimposed as black dotted curves in Fig. 3(e). It is apparent that the EMCT results are almost identical to the predicted curves.

The influence of the half bandwidth Δ_{f} upon the run time of EMCT is shown in Fig. 3(c), where each value is an average of 20 repetitions. In Fig. 3(c), the *y* axis is normalized by the longest run time ($\Delta f=0$) of 11.88 s. In addition, the *x* axis is scaled by the frequency resolution of the discrete Fourier transform ($fs/2048\u22480.95$ Hz) for a convenient description in the discrete frequency domain. Obviously, by increasing the half bandwidth, the computational efficiency is remarkably improved. For example, when $\Delta f=15$, the run time is reduced to only 0.44 s (about 1/27 of that of unaccelerated EMCT). The corresponding spectrogram is exhibited in Fig. 3(f), where the dispersion curves are distinguished as clearly as those in Fig. 3(e). Figures 3(c) and 3(f) demonstrate that the frequency-band partition is an available and efficient strategy to improve the computational efficiency without sacrificing too much TF concentration.

Figures 3(g) and 3(h) are spectrograms of Pekeris-STFT and *β*-powergram, respectively. Clearly, Pekeris-STFT makes a huge improvement to TF concentration, and modal components are separated completely in Fig. 3(g). The *β*-powergram also reveals almost all nominal modal components; however, it gives a lower resolution compared with Pekeris-STFT or EMCT. It should be noted that, due to the difference in computational framework, the strategy of frequency-band partition is unsuitable for Pekeris-STFT or *β*-powergram. This makes EMCT the first approach that considers both TF concentration and computational efficiency.

Figure 3(i) is a WVD spectrogram of mode 4. Although some inner-interference terms exist, the shape of mode 4 is clearly revealed. Nevertheless, the good performance of WVD depends on a well-done modal separation, which is a manual procedure. As shown in Fig. 3(b), the application of MCT can avoid the modal filtering step. Therefore, the proposed MCT is more convenient than WVD in dispersion curve estimation.

## 4. Conclusion

This Letter focuses on the estimation of dispersion curves of low-frequency impulsive sounds propagated in shallow water. Based on an approximated variation law of the modal IF derived from the ideal waveguide, an advanced TFA method named MCT and its extended version EMCT are developed to quickly and accurately extract dispersion curves without requiring prior information about the source location and the propagation environments. Processing of a special multi-mode signal reveals that the proposed EMCT is very effective in distinguishing modal components and can be made very efficient in computation by applying the strategy of frequency-band partition. Therefore, the proposed EMCT becomes the first approach that considers both TF concentration and computational efficiency. Abundant supplementary material that contain all the methods mentioned in the Introduction has been packaged to facilitate the estimation of dispersion curves for related research and applications.

## Acknowledgments

This research was supported by National Natural Science Foundation of China Grant Nos. 11974286 and 11904290, Fundamental Research Funds for the Central Universities Grant No. 3102019HHZY030019, and the State Key Laboratory of Acoustics, Chinese Academy of Sciences (No. SKLA201906).

^{1}

See supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0003367 for (a) the CSS signal obtained after source deconvolution and its dispersion curves predicted after an inversion in Bonnel *et al.* (2018), (b) the attached TFA programs (including MCT, EMCT, *β*-powergram, Pekeris-STFT, warping transform, inverse warping transform, WVD, WT, STFT, and inverse STFT), and (c) the example about the use of TFA programs.