An acoustic method for simultaneous condition detection, localization, and classification in air-filled pipes is proposed. The contribution of this work is threefold: (1) a microphone array is used to extend the usable acoustic frequency range to estimate the reflection coefficient from blockages and lateral connections; (2) a robust regularization method of sparse representation based on a wavelet basis function is adapted to reduce the background noise in acoustical data; and (3) the wavelet components are used to localize and classify the condition of the pipe. The microphone array and sparse representation method enhance the acoustical signal reflected from blockages and lateral connections and suppress unwanted higher-order modes. Based on the sparse representation results, higher-level wavelet functions representing the impulse response are used to localize the position of the sensor corresponding to a blockage or lateral connection with higher spatial resolution. It is shown that the wavelet components can be used to train and to test a support vector machine (SVM) classifier for the condition identification more accurately than with a time domain SVM classifier. This work paves the way for the development of simultaneous condition classification and localization methods to be deployed on autonomous robots working in buried pipes.
I. INTRODUCTION
Buried pipe infrastructure is important to urban life and forms a vital part of many engineering structures for transporting fluids and gases. In the United Kingdom alone, there are over 600 000 km of sewer pipes.^{1} The US Environmental Protection Agency estimates that water collection systems in the United States have a total replacement value between $1 and $2 trillion. In Europe, buried water pipe networks are much longer and have a much higher replacement value. These networks are aging rapidly and becoming more heavily used due to population growth, increasing demand for water, and climate change, which leads to an increased rate and severity of faults in these pipes. Therefore, reliable techniques for condition monitoring and fault detection are required for the inspection and targeted maintenance of pipe infrastructure.
Autonomous robotic sensing systems working in buried pipes for condition monitoring and fault detection offer the opportunity to capitalise on recent advances in acoustic and ultrasonic sensing techniques.^{1} Acoustic methods have been investigated for blockage detection and condition assessment in sewage pipes in recent decades.^{2} These methods are a very attractive alternative to traditional visual closed-circuit television (CCTV) inspection methods because they are rapid and highly efficient computationally. Acoustically reflective artefacts, including blockages, can be localized remotely with respect to the robot position using the time delay of acoustic echoes measured with a microphone.^{3} In sewer pipes, the power reflection ratio and signal phase measured with the microphone can be used to discriminate between various in-pipe conditions, e.g., blockage, lateral connection, or pipe end.^{4}
Although acoustic methods are well suited for use on an autonomous robotic platform, they are complicated by the multimodal sound wave propagation in a partially filled sewer pipe.^{5} As a result, in this class of applications, it is common to limit the frequency range to the so-called plane wave regime only, i.e., to the range below the first eigenfrequency of the round sewer pipe, $f_{10} = 0.59c/2R$,^{6} where $c$ is the sound speed in air and $R$ is the radius of the pipe. In the case of a typical 300 mm sewer pipe, this frequency is 669 Hz for $c$ = 340 m/s. To radiate sufficient acoustic power in this frequency range, a large powerful speaker is generally required, which is difficult to deploy on a small robot that would operate in a typical sewer. Furthermore, such a low-frequency range limits the condition localization and classification accuracy due to a relatively long wavelength and restricted frequency band. The main contribution of this paper is to overcome this limitation by proposing a new microphone array processing and machine learning method that extends the frequency range well above the first eigenfrequency to achieve much higher spatial resolution for condition detection and classification in sewer pipes.
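As a quick check of the plane wave limit quoted above, the following minimal sketch evaluates the first eigenfrequency from the expression 0.59c/2R. The function name `first_cutoff_hz` and its arguments are illustrative and are not taken from the original work; the 150 mm case simply applies the same expression to the pipe diameter used later in the experiments.

```python
def first_cutoff_hz(diameter_m, sound_speed=340.0, coeff=0.59):
    """First eigenfrequency f_10 = coeff * c / (2R) of a round air-filled pipe.

    diameter_m is the pipe diameter 2R in metres (hypothetical helper).
    """
    if diameter_m <= 0:
        raise ValueError("diameter must be positive")
    return coeff * sound_speed / diameter_m


f_300 = first_cutoff_hz(0.300)                      # typical 300 mm sewer pipe
f_150 = first_cutoff_hz(0.150, sound_speed=343.0)   # 150 mm pipe, c = 343 m/s
print(f"300 mm pipe: {f_300:.0f} Hz")
print(f"150 mm pipe: {f_150:.0f} Hz")
```

Halving the diameter doubles the plane wave limit, which is why the smaller pipe has its first eigenfrequency above 1 kHz.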
A microphone array consists of a set of microphones positioned in a specific way to capture the spatial information about the sound field that can be used for various purposes, e.g., spatial filtering, noise reduction, and dereverberation problems for audio processing.^{7} This paper uses the microphone array located at the same cross section to capture the acoustic signal containing the spatial information about the first four modes. This information is then processed to extract the fundamental mode (plane wave) to enable defect localization and classification. This method makes use of wavelets, which are well suited to reconstruct a transient signal in the presence of background noise. The idea of using wavelets to deal with transient signals is motivated by prior work, such as that of Ferrante et al.,^{8} who used the wavelet transform to analyze the transient pressure signal for leakage detection, and Owowo and Oyadiji,^{9} who used wavelets and the soft threshold method to cancel background noise from acoustic signals for leakage detection in an air-filled pipe. However, little or no work has been performed on the use of wavelets to identify and localize conditions in sewer pipes. This paper proposes a sparse representation method that uses a wavelet basis to cancel the background noise, improve the resolution for condition localization, and increase the accuracy of classification between blockages and lateral connections through post-processing of enhanced acoustical data with a support vector machine (SVM).
A health monitoring system should seek to answer a number of key questions, including the presence of a fault (i.e., a blockage or lateral connection) and the location of a fault,^{10} that are needed to target repair or cleanup operations. An advantage of the microphone sensing array we propose here is that it has a dual use for both condition detection and localization. The method we propose for localization uses the Kalman filter, which operates on acoustic features extracted from the signals measured by the microphone array. This makes the approach highly efficient as a single sensing method is used for both tasks.
This paper is organised as follows. Section II discusses the theory of acoustic wave propagation in a cylindrical pipe and signal processing theory, including wavelets, sparse representation, and SVM. Section III presents the simulation results of the microphone array processing and acoustic reflection from blockages and lateral connections in the pipe. The experimental setup is discussed in Sec. IV. Experimental results for blockage localization and identification are discussed in Sec. V.
II. THEORY
A. Acoustic waves in a cylindrical pipe
Equation (4) predicts the wavenumber for different modes at different frequencies, which means that the sound velocity in each mode (except in the case of plane wave when $k_{00}=0$) is frequency- and mode-dependent. When the free field wavenumber k_{0} is larger than the eigennumber k_{mn}, or the frequency is above the corresponding eigenfrequency f_{mn}, the particular acoustic mode can propagate along the pipe with relatively little attenuation. Figure 2 shows schematically the angular and radial dependence of the first four mode shapes in the cylindrical pipe. In this figure, the plus or minus corresponds to the sign taken by the modal shape ($\Psi_{mn}$) in Eq. (1).
B. Sparse representation using wavelets
Acoustic wave propagation in the pipe can be complicated by multiple modes that travel at different velocities. Using the plane wave mode reconstruction method introduced in Sec. II A [i.e., Eq. (9)], a single mode can be extracted across a broad frequency range and used for in-pipe condition detection, localization, and classification. This paper proposes a sparse representation with wavelets for the simplified impulse response to clean up the higher mode residue and to cancel background noise. Different from the wavelet decomposition and soft shrinkage for noise cancelation proposed in Ref. 6, this paper uses sparse wavelet representation to cancel background noise and to clear up some higher mode residue after the plane wave reconstruction with Eq. (9).
There are two main reasons for using the sparse representation. First, it is possible to assume that the acoustic echoes from the pipe artefacts (e.g., blockages/junctions) have relatively short duration time. This means that the impulse response actively measured on the robot is considered to contain a large number of zero components apart from the initial pulse and reflected echo wave packs, which leads to the sufficient sparsity relative to its dimension in time domain. The sparse nature of the impulse responses has been illustrated in Ref. 3 as an example and is explained further in our paper. Second, the impulse response can be written in terms of appropriate basis vectors where only a few vectors are active, hence, reducing the number of the time domain signals required to store for an accurate signal representation in accordance with the Nyquist sampling theorem.^{12} The basis vectors used in this paper are wavelet functions.
Different levels of wavelets have different frequency components. Wavelets with higher frequency components can provide a higher spatial resolution to the problem of condition and/or robot localization. For example, the fourth-order Symlets function, sym4, has five different levels, where s_{1} corresponds to the lower frequency and s_{5} to the higher frequency components. Therefore, it is convenient to use the higher levels of the wavelet domain vector $\hat{s}$ to predict the location of the robot with higher precision. Furthermore, the wavelet domain vector $\hat{s}$ can also be used as input to the SVM trainer to distinguish blockages from junctions.
C. Robot localization
After the sparse representation of the impulse response, the higher-level wavelet components can be used to localize the robot position along the pipe. Robot localization is the means by which a robot estimates its position with respect to the surrounding environment. Localization is required for robot control and autonomous navigation, reporting the location of conditions detected in a pipe network, and mapping unknown parts of the pipe network. Normally, information from sensing of the robot's motion and surroundings is input into a localization function. In typical robotic applications, vision and range-finding sensors such as scanning lidar are popular means of making perceptions, as they are able to acquire a large amount of information from the arbitrary environment. In the pipe environment, however, these sensors are limited in scope and only able to observe nearby artefacts due to the confined space within the cross section that is very limited compared to a relatively long length of the pipe and scale of the overall pipe network. This limit in scope means that a localization estimate will accumulate uncertainty over time and that the estimate will likely drift from the true robot position. Acoustic echo sensing is able to perceive more distant artefacts in the pipe environment and offers a means of perception that will not cause an accumulation of uncertainty. In a previous study,^{3} the robot localization has been validated with a speaker and single microphone sensing system using the plane wave below the cutoff frequency. This paper uses the microphone array to extend the frequency range of the signal to localize the robot and artefacts more precisely.
Robot localization typically takes a probabilistic approach,^{15} where the uncertainty in each measurement is acknowledged, and a localization estimate is the most likely value of the robot's state in the probability distribution computed over all possible states. Many robot localization approaches are derived from a Bayes filter, a mathematical tool that facilitates the incorporation of prior knowledge and measurements to produce a posterior estimate. A practical implementation of this is the Kalman filter described below.
This estimation process can be improved through improved acoustic sensing and processing described in this paper. When high level wavelet components are used, as described in Sec. II B, the measurement uncertainty, Σ_{ξ,t}, and subsequent estimate uncertainty, Σ_{t}, will be reduced, and the likelihood of correct data association will be higher, improving robustness. If classification of each artefact in the environment is possible, as described in Sec. II D, it can be incorporated into this estimation of data association. Again, this can improve the robustness of the estimation process.
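To illustrate why a smaller measurement uncertainty leads to a smaller estimate uncertainty, the following is a minimal scalar Kalman filter sketch. It assumes a one-dimensional state (robot position along the pipe), an odometry increment as the motion input, and an acoustic measurement that has already been converted to a position; the function name, arguments, and numerical values are hypothetical and are not the filter configuration used in this work.

```python
def kalman_step(mean, var, odometry, odometry_var, measurement, measurement_var):
    """One predict/update cycle of a scalar Kalman filter (illustrative only)."""
    # Predict: shift the estimate by the odometry reading; uncertainty grows.
    pred_mean = mean + odometry
    pred_var = var + odometry_var
    # Update: blend the prediction with the acoustic position measurement.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


# Same prior and odometry, two assumed measurement standard deviations (1 m, 0.1 m).
coarse = kalman_step(0.0, 0.04, 2.0, 0.25, 2.3, 1.0 ** 2)
fine = kalman_step(0.0, 0.04, 2.0, 0.25, 2.3, 0.1 ** 2)
print("coarse measurement (mean, variance):", coarse)
print("fine measurement   (mean, variance):", fine)
```

With the more precise measurement the gain approaches one, so the posterior follows the acoustic measurement closely and its variance shrinks accordingly.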
D. SVM classifier

Acquisition of impulse response: The speaker emitted a chirp signal, and the response was recorded simultaneously using six microphones. After deconvolution and bandpass filtering (200–3000 Hz), the six-channel impulse response was obtained, x^{m}, where m = 1:6.

Plane wave construction: Averaging the six-channel impulse response provided the preprocessed plane wave impulse response x.

Denoising and feature extraction using wavelets and sparse representation: After generating the wavelet matrix W using sym4 level-5 wavelets, the plane wave impulse response x was represented using the sparse representation algorithm (SpaRSA) to obtain the denoised signal $\hat{x}=W\hat{s}$ and wavelet components $\hat{s}=[\hat{s}_1\ \hat{s}_2\ \hat{s}_3\ \hat{s}_4\ \hat{s}_5]^T$.

Localization: The higher-level wavelet components $[\hat{s}_3\ \hat{s}_4\ \hat{s}_5]^T$ were used to represent the higher frequency signal $\hat{x}_h=W[\hat{s}_3\ \hat{s}_4\ \hat{s}_5]^T$, and the Hilbert transform was applied to $\hat{x}_h$, where the coordinates of the peaks of the envelope were associated with the location of pipe artefacts relative to the robot. For sequential robotic localization, the measured coordinates were then passed as $\xi_t$ to the Kalman filter to predict the locations.

Classification: The wavelet components ($\hat{s}$) associated with the artefacts as the input (X_{m}) and the label Y_{m} (blockage uses 1 and non-blockage uses −1) were used for SVM training and testing.
Figure 4 shows the processing steps for denoising, localization, and classification.
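The averaging and denoising steps listed above can be sketched as follows. This is a simplified stand-in and not the implementation used in this work: an orthonormal Haar transform replaces the sym4 wavelet matrix, and because the basis is orthonormal, the l1-regularized least-squares problem has a closed-form solution (soft thresholding of the coefficients), which replaces the iterative SpaRSA solver. The function names, the threshold `lam`, and the synthetic six-channel data are assumptions made for illustration.

```python
import math
import random


def haar_forward(x):
    """Orthonormal multilevel Haar transform (length must be a power of two)."""
    coeffs = list(x)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = approx + detail
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    x = list(coeffs)
    n = 1
    while n < len(x):
        merged = []
        for a, d in zip(x[:n], x[n:2 * n]):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        x[:2 * n] = merged
        n *= 2
    return x


def soft_threshold(values, lam):
    """Shrink each coefficient towards zero by lam."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in values]


def denoise_plane_wave(channels, lam):
    """Average the channels, then sparsify the wavelet coefficients."""
    n = len(channels[0])
    x = [sum(ch[i] for ch in channels) / len(channels) for i in range(n)]
    s_hat = soft_threshold(haar_forward(x), lam)
    x_hat = haar_inverse(s_hat)
    return x, s_hat, x_hat


def rms_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


random.seed(0)
clean = [1.0 if 64 <= i < 72 else 0.0 for i in range(256)]   # one short synthetic echo
channels = [[v + random.gauss(0.0, 0.1) for v in clean] for _ in range(6)]
x, s_hat, x_hat = denoise_plane_wave(channels, lam=0.15)
active = sum(1 for v in s_hat if v != 0.0)
print(f"active coefficients: {active} of {len(s_hat)}")
print(f"RMS error before: {rms_error(x, clean):.4f}, after: {rms_error(x_hat, clean):.4f}")
```

Only a handful of coefficients remain active after shrinkage, which is the sparsity property that the localization and classification steps rely on.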
III. SIMULATIONS
This section discusses the analytical and numerical simulations of the microphone array processing used to extract the plane wave from the overall acoustic pressure [Eq. (1)] measured on the microphone array and to estimate the reflection coefficient for an artefact in the pipe. The sensor placement and position uncertainties due to the robot movement in the pipe are discussed in Sec. III A. This can provide evidence in support of the adopted sensor placement strategy for the plane wave reconstruction. The reflection coefficient from blockage and lateral connections will be obtained from numerical simulations to validate the plane wave reconstruction method proposed in the paper via comparison with the experimental results in Sec. V.
A. Microphone array for plane wave reconstruction
A numerical simulation was implemented based on the transfer function described by Eq. (5). The excitation point source was located close to the pipe wall so that all the acoustic modes were excited. The setup used in this numerical simulation is shown in Fig. 5(a). The six virtual microphones were positioned circumferentially and equidistantly spaced at 0.628 R. At this radial position, $J_0(k_{01} r)=0$, so that the amplitude of the first axisymmetric mode is equal to zero as illustrated in Fig. 2. However, the microphones may not be at the ideal positions in a practical situation, e.g., when the robot platform cannot be perfectly located. Therefore, a simulation using slightly shifted microphones (the centres of the six microphones were shifted at a distance of 0.02 R) was also implemented [see Fig. 5(b)].
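The radial position 0.628 R can be checked numerically. The sketch below assumes a rigid pipe wall, so that the eigennumber of the first axisymmetric mode satisfies J1(k R) = 0, and then finds the radius at which J0 vanishes for that eigennumber. The Bessel functions are evaluated from their power series to keep the example free of external dependencies; all function names are hypothetical.

```python
def bessel_j0(x, terms=40):
    """J0(x) from its power series; adequate for the small arguments used here."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(x * x / 4.0) / ((k + 1) ** 2)
    return total


def bessel_j1(x, terms=40):
    """J1(x) from its power series."""
    total, term = 0.0, x / 2.0
    for k in range(terms):
        total += term
        term *= -(x * x / 4.0) / ((k + 1) * (k + 2))
    return total


def bisect_root(func, lo, hi, tol=1e-12):
    """Root of func in [lo, hi], assuming a single sign change in the bracket."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Rigid wall: dJ0/dr = 0 at r = R, i.e. J1(k R) = 0 (first non-zero root).
k01_R = bisect_root(bessel_j1, 3.0, 4.5)
# Nodal circle of that mode: J0(k r) = 0 (first root of J0).
k01_r = bisect_root(bessel_j0, 2.0, 3.0)
ratio = k01_r / k01_R
print(f"nodal radius / pipe radius = {ratio:.3f}")
```

Placing the microphones on this nodal circle removes the first axisymmetric mode, which cannot be cancelled by circumferential averaging alone.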
As shown in Fig. 5(c), averaging the acoustic sound pressures predicted for the six-microphone array removes the higher modes over the frequency range of 0–5 kHz (within 1 dB fluctuation for plane wave mode) if the microphones are ideally positioned circumferentially at 0.628 R. When the sensors were shifted slightly by a distance of 0.02 R, the first four modes were cancelled significantly over the frequency range 0–3.7 kHz (within 1.5 dB fluctuation) apart from the first eigenfrequency around 1.3 kHz. At higher frequencies, the spatial information collected by six microphones tends to be more sensitive to the shifted distance, resulting in a more significant error in the plane wave reconstruction. In this case, the fluctuation in the mode (4, 0) and mode (5, 0) is larger than 3 dB as shown in Fig. 5. Furthermore, it is also observed from additional simulations that the plane wave reconstruction error tends to increase with the microphone shift distance, although the dependence of the error on the microphone shift distance is not discussed in detail in this paper.
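The cancellation obtained by averaging can be illustrated with an idealized sketch. It assumes that the circumferential dependence of a mode of order m is cos(m θ) and that the six microphones are identical and perfectly equidistant; position shifts and radial dependence are not modelled. The function name and the arbitrary angular offset are hypothetical.

```python
import math


def array_average(m, n_mics=6, offset=0.0):
    """Mean of cos(m * theta) over n_mics equally spaced angles."""
    angles = [offset + 2.0 * math.pi * i / n_mics for i in range(n_mics)]
    return sum(math.cos(m * a) for a in angles) / n_mics


averages = {m: array_average(m, offset=0.3) for m in range(7)}
for m, value in averages.items():
    print(f"circumferential order m = {m}: mean = {value:+.3e}")
```

Under these assumptions the plane wave (m = 0) is preserved, orders m = 1 to 5 average to zero, and order m = 6 aliases back into the average, which indicates where a six-element ring stops being effective.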
B. Reflection coefficient
In this paper, the acoustic wave reflection from a blockage or lateral junction was studied using the finite element method (FEM) available in the commercial software COMSOL. Figure 6 shows the simulation setup for sound propagation in the presence of a blockage and lateral connection. The pipe diameter was 0.15 m, which is consistent with that used in the experiments. The height of the blockage was set as 0.6 times the pipe diameter, i.e., h/R = 1.2, and it was one diameter long [see Fig. 5(a)]. The maximum mesh size in this numerical study was below 9.5 mm, which corresponded to 1/12 of the acoustic wavelength at 3 kHz. Plane wave excitation was used in the simulations. A perfectly matched layer (PML) was set up at the ends of the pipe to absorb sound to simulate an infinite pipe length. The surface of the blockage in this study was assumed solid, i.e., its acoustic characteristic impedance was much larger than that of air ($Z_{\mathrm{blockage}} \gg Z_{\mathrm{air}}$). The pipe wall was also assumed to be rigid.
As discussed in Sec. II A, the sound pressure in the plane wave mode was predicted by Eq. (9) using two-dimensional (2D) integration over the cross section. The incident and reflected plane waves interfere with each other, resulting in the fluctuation of sound amplitude along the z axis. Using the peak and trough values of the fluctuating acoustic pressure at different axial coordinates, the reflection coefficient can be estimated from Eq. (8). The 2D integration over the cross section was implemented repeatedly with 0.005 m intervals and over 1 m range.
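The peak-and-trough estimate can be sketched with a synthetic standing wave. Because Eq. (8) is not reproduced here, the sketch assumes the standard standing wave ratio relation |R| = (p_max − p_min)/(p_max + p_min) for a lossless plane wave; the function names and the chosen reflection coefficient are hypothetical. The axial sampling (0.005 m steps over 1 m) follows the text.

```python
import cmath
import math


def plane_wave_amplitude(z, freq_hz, reflection, c=343.0):
    """|p(z)| for a unit incident plane wave plus a reflected wave."""
    k = 2.0 * math.pi * freq_hz / c
    return abs(cmath.exp(-1j * k * z) + reflection * cmath.exp(1j * k * z))


def estimate_reflection_magnitude(amplitudes):
    """Standing wave ratio estimate of |R| from sampled pressure amplitudes."""
    p_max, p_min = max(amplitudes), min(amplitudes)
    return (p_max - p_min) / (p_max + p_min)


z_points = [0.005 * i for i in range(201)]        # 0 to 1 m in 0.005 m steps
true_reflection = 0.5 * cmath.exp(1j * 0.7)       # assumed complex coefficient
amplitudes = [plane_wave_amplitude(z, 1000.0, true_reflection) for z in z_points]
estimate = estimate_reflection_magnitude(amplitudes)
print(f"true |R| = {abs(true_reflection):.3f}, estimated |R| = {estimate:.3f}")
```

The estimate is slightly biased low because the sampled points rarely coincide exactly with the pressure maxima and minima; a finer axial step reduces this bias.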
The sound pressure in the plane wave mode, P_{00}, for three frequencies obtained through the simulation and integration in Eq. (9) is shown in Fig. 7. These frequencies were chosen to be between the eigenfrequencies and to illustrate the dependence of the sound pressure as a function of the axial direction. Note that the curves shown in Fig. 7 can be used to determine the amplitude and phase of the complex reflection coefficient for the plane wave mode at these particular frequencies, although only the amplitude of the reflection coefficient is discussed in this paper [see Eqs. (6)–(9)].
It is also worth noting that, although the incident wave was a plane wave, the reflection contains higher modes due to the wave scattering at the artefacts (see Fig. 6). There is also a complex relation between the mode number and modal excitation coefficients, depending on the nature of an artefact. Using the integral from Eq. (9), the higher-order modes can then be cancelled as discussed in Sec. III A. Therefore, the amplitude of P_{00} is a combination of the direct and reflected plane waves. It is frequency-dependent because of the complexity of Eq. (5) and integral (9). The modal pressure oscillates as a function of z with the period determined by the wavelength. As shown in Fig. 7, the amplitude of the plane wave mode at 1 kHz (below the first eigenfrequency for the 0.15 m diameter pipe, f < f_{10}) over the axial direction is significantly higher than the amplitude of the 2 kHz wave (between the first and second eigenfrequency: f_{10} < f < f_{20}). This can be understood intuitively as a part of the plane wave excitation energy being converted into the higher modes after scattering from the blockage.
IV. EXPERIMENTAL SETUP
A. Robotic platform
The acoustic sensing system used in this work consisted of a loudspeaker, six-microphone array, and processor [including power amplifier for loudspeaker, analog-to-digital converter (ADC), digital-to-analog converter (DAC), and Raspberry Pi 4 (Raspberry Pi, Cambridge, UK) for data acquisition] as shown in Fig. 8. This system was installed on a remotely controlled robot [iRobot Looj 330 by iRobot (Bedford, MA)]. The sampling rate was 16 kHz. A bandpass filter with the frequency response of 200–3000 Hz was used to reduce noise. A 100–4000 Hz sweep sine with 10 s duration was used as the excitation signal. The speaker and microphone array were located at the centre of the pipe within 5 mm positional error initially, although this could change due to the robot movement inside the pipe. The radial coordinates of the microphone array were around 60 mm from the pipe centre [see also Fig. 5(a)]. The microphone type used in this test was MSM321A3729H9CP by MEMSensing Microsystems Co., Ltd. (Suzhou, China), and the speaker (Visaton 2242) size was 32 mm diameter driven with a 3 W power supply.
B. Pipe network
In this work, different sizes of blockages were used in a 150 mm diameter polyvinyl chloride (PVC) pipe laid in the iCAIR laboratory at the University of Sheffield. These blockages are described by the ratio of the height of the blockage to the pipe radius $h/R$ = 0, 0.4, 0.8, 1.2, 1.6, and 2, as shown in Fig. 9(a). Figure 9(b) shows an impression of a sandbag blockage in the 150 mm pipe. Other kinds of blockages were also used in the experiment, e.g., acoustic absorbent foam and plastic block, as shown in Figs. 9(c) and 9(d), respectively. To simulate a full 100% blockage, a heavy wooden board was put at the end of the pipe. Efforts were made to seal the circumferential gap between the pipe and board. The straight pipe was constructed from several pipe sections connected with joints at different angles as illustrated in Figs. 9(e) and 9(f). The pipes were not perfectly joined, and joints were not perfectly sealed, so some energy in the acoustic wave was able to reflect and leak out due to the discontinuity at a joint.
V. RESULTS
A. Reconstruction of plane wave and denoising using sparse representation
The impulse response measured using the acoustic system in the pipe with a blockage and lateral connection is shown in Figs. 10 and 11, respectively. Note that the time domain impulse response was converted into the distance domain response by multiplying the time by sound velocity (343 m/s). For a single microphone, the wave dispersion into the higher-order modes significantly complicates the impulse response, causing high frequency noise in the data. This noise can cause difficulties in identifying the condition, particularly when the condition is a small blockage, e.g., $h/R$ = 0.2, as shown in Fig. 10. Averaging the six-microphone data removes the higher-order modes and provides a cleaner signal than that obtained on a single microphone. This is consistent with the theoretical study presented in Sec. II A. Sparse representation cleans up the data further, making it more convenient to apply the localization and classification algorithms detailed in Sec. II. This is even more evident in the case of the data obtained for the pipe with a lateral connection as illustrated in Fig. 11.
B. Reflection coefficient from blockages and lateral connections
As discussed in Sec. II A, the amplitude of the reflection coefficient from a blockage and lateral connection for the plane wave mode can be estimated from Eq. (8). The simulation results for the reflection coefficient were obtained based on the FE modeling in COMSOL (see Sec. III). For experimental measurements, the impulse responses from the blockage and lateral connection (e.g., see the first echo pulse in Figs. 10 and 11, respectively) were extracted using time windowing, zero-padded, and transferred into the frequency domain. A comparison between the simulation results and measurements of the reflection coefficient spectra from blockage and lateral connection with different setups is shown in Figs. 12–14. The measured data were obtained from the denoised plane wave echo using the sparse representation algorithm described in Sec. V A.
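The windowing, zero-padding, and normalization of an echo can be sketched as below. The example assumes a rectangular time window, a direct discrete Fourier transform, and normalization by the spectrum of a reference echo from a full blockage (as described for Fig. 12); the synthetic pulse, window limits, transform length, and function names are all hypothetical.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of the one-sided discrete Fourier transform of x."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def echo_spectrum(impulse_response, start, stop, n_fft):
    """Window one echo out of the impulse response, zero-pad, and transform."""
    segment = list(impulse_response[start:stop])
    padded = segment + [0.0] * (n_fft - len(segment))
    return dft_magnitude(padded)


def reflection_spectrum(echo_mag, reference_mag, floor=1e-9):
    """Ratio of echo to reference magnitude; NaN where the reference is empty."""
    return [e / r if r > floor else float("nan") for e, r in zip(echo_mag, reference_mag)]


fs = 16000                                          # sampling rate in Hz
pulse = [math.exp(-((t - 8) / 2.0) ** 2) * math.cos(0.8 * (t - 8)) for t in range(16)]
reference_ir = [0.0] * 20 + pulse + [0.0] * 28      # synthetic full-blockage echo
blockage_ir = [0.0] * 20 + [0.4 * v for v in pulse] + [0.0] * 28
n_fft = 128
ratio = reflection_spectrum(echo_spectrum(blockage_ir, 16, 40, n_fft),
                            echo_spectrum(reference_ir, 16, 40, n_fft))
freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
in_band = [(f, r) for f, r in zip(freqs, ratio) if 200.0 <= f <= 3000.0]
for f, r in in_band[::6]:
    print(f"{f:6.0f} Hz: normalized reflection = {r:.3f}")
```

In this synthetic case the blockage echo is a scaled copy of the reference, so the normalized spectrum is flat; measured echoes would show the frequency dependence discussed for Figs. 12–14.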
Figure 12 presents the reflection coefficient from blockages with different sizes [$0.2 \le h/2R \le 0.8$; see Fig. 9(a)] obtained from the simulation and experiments. The full blockage was used as a reference, and all the other reflection coefficients were normalized by the full blockage echo obtained experimentally. The simulated results had an average discrepancy of less than 0.06 with the measurement over the frequency range 200–3000 Hz. As shown in Fig. 12, the acoustic reflection becomes stronger when the blockage size increases. This information can be used to estimate the size of the blockage. Furthermore, the acoustic reflection coefficient becomes larger when the frequency approaches the first cutoff frequency. These properties of the frequency domain spectra can be used to distinguish blockages from other artefacts such as junctions.
Figure 13 shows the predicted and measured acoustic reflection coefficient spectra from the lateral connection of different diameters attached to the main pipe perpendicularly. The 150 mm diameter branch lateral connection results in a higher reflection coefficient than the 100 mm diameter branch. The lateral connection works like a high-pass filter that allows for the propagation of higher frequency sound waves. The reflection coefficient from a lateral connection drops significantly as the frequency of sound approaches the first cutoff frequency. This highlights the importance of extending the frequency range in the proposed analysis to enable the localization and classification of conditions beyond a lateral connection.
Figure 14 shows the predicted and measured acoustic reflection coefficient from the lateral connection installed at different angles. An increase in the angle of the lateral connection reduces the sound pressure in the reflected plane wave mode over the frequency range 200–3000 Hz. For a lateral connection, the reflection coefficient reduces significantly as the frequency of sound approaches the first cutoff frequency. This slope in the frequency-dependent reflection coefficient becomes steeper when the lateral angle gets smaller. At higher frequencies beyond the first cutoff frequency, the reflection coefficient for the lateral connection with an increased angle tends to develop a local peak, beyond which it decreases gradually. For example, in the case of a 100 mm lateral connected at 90°, the measured reflection coefficient increases until reaching a peak around 2 kHz and then decreases gradually to almost zero at 2.7 kHz (see Fig. 13).
This section provides the knowledge base for further identification of blockages and lateral connections according to their acoustic reflection properties. The close agreement (less than 0.06 error on average) between the simulated and measured reflection coefficient for blockages and lateral connections in the frequency domain (300–3000 Hz) also validates the signal processing methods of sparse representation. The denoised blockage/lateral signal is useful for the localization and classification algorithms.
C. Robotic localization for blockage/lateral connections
After the reconstruction of the plane wave mode using multiple microphone processing and the sparse representation method, a robot can process the denoised signal to localize its position with respect to a blockage or lateral connection. In the previous work,^{3} the Hilbert transform was used to obtain the envelope of the time domain impulse response, where the coordinates of the peaks of the envelope correspond to the relative distance between the pipe artefacts and robot. In this paper, high frequency components of the plane wave impulse response were used for a more precise acoustic localization achieved with microphone array processing. This was accomplished by using the envelope of the higher-level wavelet representation of the impulse response.
As shown in Fig. 15(a), the original plane wave impulse response x after the averaging from six-microphone data can be sparsely represented and denoised using wavelets to obtain a clearer signal $\hat{x}$ for post-processing. Wavelet components ($\hat{s}=[\hat{s}_1\ \hat{s}_2\ \hat{s}_3\ \hat{s}_4\ \hat{s}_5]^T$) and their representation ($\hat{x}=W\hat{s}$) for the impulse response are shown in Fig. 15(b). Highest-level wavelet components $\hat{s}_5$ are zero in this signal after the shrinkage from the sparse representation algorithm. The third and fourth levels of wavelet representation show higher resolution in the time domain data than the lower levels (see Fig. 15). This is because the wavelet components $\hat{s}_3,\ \hat{s}_4$ correspond to the amplitude of the higher frequency signal. After the representation, $W\hat{s}_3$ and $W\hat{s}_4$ contain the acoustic feature wave packs with shorter duration in the time domain, whereas the lower frequency representations, i.e., $W\hat{s}_1$ and $W\hat{s}_2$, provide wider wave packs. Therefore, in this work, the higher frequency representations ($W\hat{s}_3+W\hat{s}_4$) were used to estimate the location of artefacts in the pipe. Specifically, the locations of artefacts were determined using the coordinates of the peaks of the envelope of the signal at high frequencies $W\hat{s}_3+W\hat{s}_4$. The envelope of the signal $W\hat{s}_3+W\hat{s}_4$ was calculated using the magnitude of its analytic signal, which was computed by filtering $W\hat{s}_3+W\hat{s}_4$ with a Hilbert FIR filter of five points length^{16} (implemented using the function @envelope in MATLAB). The envelope results are shown in Fig. 16.
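The envelope-peak step can be sketched as follows. Note that this example computes the analytic signal in the frequency domain with a direct discrete Fourier transform, which is a swapped-in alternative to the five-point Hilbert FIR filter mentioned above, and it operates on a synthetic high frequency wave pack rather than on measured wavelet representations. The function name and signal parameters are hypothetical.

```python
import cmath
import math


def analytic_envelope(x):
    """Envelope |x + j*H{x}| via the frequency-domain analytic signal (even length)."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Keep DC and Nyquist, double positive frequencies, remove negative ones.
    weights = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    analytic = [sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                    for k in range(n)) / n
                for t in range(n)]
    return [abs(v) for v in analytic]


fs = 16000                      # sampling rate in Hz
centre = 120                    # sample index of the synthetic echo
tone = 2.0 * math.pi * 2000.0 / fs
signal = [math.exp(-0.5 * ((t - centre) / 6.0) ** 2) * math.cos(tone * (t - centre))
          for t in range(256)]

envelope = analytic_envelope(signal)
peak_index = max(range(len(envelope)), key=envelope.__getitem__)
print(f"envelope peak at sample {peak_index} ({1000.0 * peak_index / fs:.2f} ms)")
```

The peak of the envelope, rather than of the oscillating signal itself, gives a single well-defined arrival time per echo, which is then converted to distance as described in Sec. V A.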
The blockage was located 4 m away from the robot with 0.5% prediction error using the wavelet representation, whereas the localization error was 4.2% when the impulse response was used. More experiments were carried out using different sizes of blockages and lateral connections. The prediction error of the higher-level wavelet representation was below 0.7%. This demonstrates an advantage of using wavelets for the robotic localization with higher accuracy and precision.
Although the pipe artefacts can be localized with respect to the position of the robot, the directions of echo pulses are unknown. Sound intensity measurement can be a solution to determine whether the echo comes from the front or back of the robot.^{17} This paper takes the robot position uncertainty into consideration to use the sequential measurement for the localization of artefacts and pipe mapping. The localization measurement using wavelets can be applied to this sequential robotic localization using the Kalman filter as discussed in Sec. II C, where the robot moves in the pipe and takes the acoustic measurement sequentially every several meters (i.e., every 2 m in this paper).
D. Sequential robot localization
Photographs of the robotic localization test rig are shown in Fig. 9, and Fig. 17(a) illustrates this rig schematically. In this experiment, a heavy wooden board was installed at the far end of the pipe to represent a full blockage. The robot was moved toward the lateral connection and stopped every 2 m to measure the impulse response. The measured impulse responses are shown in Figs. 17(b) and 17(c) with and without the sparse representation method, respectively. The sparse representation method removes a significant amount of background noise, including dispersive higher modes and some unwanted reflections from the pipe joints.
Using the acoustic echoes reflecting from different features in the environment, the robot can estimate its position while it moves along the pipe. Using the process described in Sec. II D, an estimate of the robot's position can be made by combining the prior estimate of the robot's position with traditional odometry and with new acoustic information obtained from the reported measurements. The uncertainty in new information is incorporated into the estimate, so that the process is designed to have robustness to noise in measurements. However, the measurement noise does have an effect on the precision of the estimate, which is investigated here.
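The fusion of odometry with an acoustic range measurement can be sketched as a scalar Kalman filter. This is a generic textbook form given as an illustration under stated assumptions, not the formulation of Sec. II D: the robot position is treated as one-dimensional, the echo is assumed to come from a landmark at a known position, and all names and numerical values below are hypothetical.

```python
def kalman_predict(x, p, u, q):
    """Propagate the position estimate x (variance p) by an odometry
    displacement u whose noise variance is q."""
    return x + u, p + q

def kalman_update(x, p, z, r):
    """Fuse a position measurement z (variance r) into the estimate."""
    gain = p / (p + r)                  # Kalman gain: weight given to the measurement
    return x + gain * (z - x), (1.0 - gain) * p

# Hypothetical single step: the robot starts at 0 m with a known position.
x, p = 0.0, 0.0
# Odometry reports a 2 m move with 0.2 m standard deviation (variance 0.04).
x, p = kalman_predict(x, p, u=2.0, q=0.04)
# An echo from an assumed landmark at 10 m is measured at a range of 7.9 m
# with 0.1 m standard deviation, which implies a robot position of 2.1 m.
landmark, echo_range = 10.0, 7.9
x, p = kalman_update(x, p, z=landmark - echo_range, r=0.01)
print(round(x, 3), round(p, 4))  # 2.08 0.008
```

The update moves the estimate toward the acoustic measurement in proportion to the gain, so a more precise echo measurement (smaller r) both pulls the estimate harder and shrinks the posterior variance more.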
Figure 18 shows the results from the simulation of the variation in the mean estimate error for the robot's trajectory (over 100 trajectories) along a pipe obtained for a range of measurement noise levels. This measurement noise was the standard deviation of the Gaussian noise added to each continuous value of distance measurement found from an acoustic echo. For comparison, the estimate made with traditional odometry without using acoustics is also shown. This result illustrates the impact of the uncertainty in robot motion along the pipe. As the acoustic measurement precision increases, the measurement uncertainty decreases, and the median estimate error is seen to decrease from close to the benchmark estimate error of 0.6 m at 1 m of measurement uncertainty to 0.25 m at 0.1 m of measurement uncertainty. This illustrates the strong impact of improved acoustic echo measurement precision on robot localization that can be achieved using the approaches described in this paper.
E. Classification of blockages and junctions
In this work, two classes of pipe artefacts were identified: (1) a blockage and (2) a lateral connection. The blockages (35 different types in total) shown in Fig. 9 were used for the training and testing of the SVM model. Junctions (25 different types in total) included pipe joints, lateral connections, T-junctions, and corner junctions. As discussed in Sec. II D, the time domain wave packs $\hat{x}$ were used directly as input (X_{i}) for the training and testing. Wavelet components ($\hat{s}$) associated with the artefacts were also used as the input (X_{i}), in which case the classifier is referred to as a wavelet-SVM classifier. In this study, cross-validation with five folds was used, i.e., the data samples were split into five groups for the evaluation of the SVM model, to protect against overfitting via data partitioning.
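The five-fold partitioning can be sketched as follows. This is a minimal illustration of how 60 samples (35 blockages and 25 junctions) could be split into five folds. The function name, the shuffling seed, and the unstratified split are assumptions, since the exact partitioning procedure is not stated here.

```python
import random

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    # Deal the shuffled indices into n_folds groups of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits

splits = k_fold_indices(60, n_folds=5)
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))  # 48 12 for each of the five folds
```

Each sample is used for testing exactly once, so the reported accuracy is based on predictions for all 60 cases.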
For the wavelet-based SVM classifier, the accuracy of the blockage detection in front of the robot at the first echo was 88% (53/60) based on the provided cases, as shown by the confusion matrix in Table I. The time domain SVM classifier achieved 78% (47/60) accuracy on the same cases. It is worth noting that a linear SVM was also implemented in this study, with 65% (39/60) and 53% (32/60) accuracy using the wavelet components and time domain data, respectively. The detailed accuracy, precision, recall, and F1 score^{12} for these four classifiers are shown in Table II. These results provide evidence that using wavelet components and a nonlinear (RBF) kernel improves the classification accuracy relative to using a linear kernel and raw time domain data.
TABLE I. Confusion matrix for the wavelet-based SVM classifier.

| | Predicted blockage | Predicted junction | Total |
| --- | --- | --- | --- |
| Actual blockage | TP = 31 | FP = 4 | 35 |
| Actual junction | FN = 3 | TN = 22 | 25 |
| Total | 34 | 26 | |
TABLE II. Accuracy, precision, recall, and F1 score for the four classifiers.

| Metric | Time domain linear SVM | Wavelet linear SVM | Time domain RBF SVM | Wavelet RBF SVM |
| --- | --- | --- | --- | --- |
| Accuracy | 53% | 65% | 78% | 88% |
| Precision | 0.571 | 0.686 | 0.829 | 0.886 |
| Recall | 0.625 | 0.615 | 0.806 | 0.912 |
| F1 score | 0.597 | 0.649 | 0.817 | 0.899 |
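As a check on how the wavelet RBF SVM column of Table II follows from Table I, the short sketch below applies the standard metric definitions to the four counts, using the cell labels exactly as they are printed in Table I. The function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts as labeled in Table I (wavelet-based SVM classifier).
metrics = classification_metrics(tp=31, fp=4, fn=3, tn=22)
print([round(m, 3) for m in metrics])  # [0.883, 0.886, 0.912, 0.899]
```

The accuracy of 53/60 rounds to the 88% reported in the text.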
Eight different pipe network settings were used as demonstration examples, which are shown in Fig. 19. In Fig. 19, the time domain SVM classifier accurately estimates the first artefacts close to the robot, apart from a small blockage (blockage 1), which reflects less energy. The SVM model trained and tested on the wavelet components gives a more accurate classification result than the time domain SVM classifier (10% accuracy improvement, particularly for small blockages). Furthermore, the reflection from joints/lateral connections 7 m behind the robot can also be predicted by the wavelet-SVM classifier with around 92% accuracy, whereas the time domain SVM achieved only 50% prediction accuracy. This provides evidence that the wavelet method, which takes advantage of the sparsity of the impulse response, can be used to improve the prediction accuracy in comparison to the time domain SVM classification method.
Although the prediction using the wavelet-SVM classifier tends to be accurate in the testing examples (Fig. 19), some cases in the experiment resulted in misclassification:

1. Small blockages or sound absorbent blockage materials [e.g., acoustic absorption foam in this study; see Fig. 9(c)]. These kinds of blockages do not reflect enough acoustic energy, which leads to a negligibly small amplitude of their impulse response and distortion in the assumed reflection coefficient spectra. The smallest blockage successfully identified in this work using the wavelet-SVM classifier was a 20% blockage (blockage 1 in Fig. 19).

2. Blockages located close to a junction (<1 m). The acoustic echo from the blockage overlaps with the reflection from the junction or another artefact, making it difficult to separate multiple reflections and to classify each of them.

3. A robot located too close to the blockage or junction (<1 m). This is similar to case 2, where multiple reflections occur and overlap.

4. Blockages located behind any artefacts. For example, if the blockage is behind a lateral connection, then the reflected signal can be colored by the presence of this lateral connection. As shown in Fig. 19, the lateral connection after blockage 3 is mistakenly classified.
The classification method provided in this paper uses a limited number of blockage and lateral connection cases simulated in the laboratory. More experimental data and realistic environmental testing will be needed to extend this method for multiple classification of different types of blockages and junctions.
VI. CONCLUSIONS
This paper proposed a new acoustic method to simultaneously detect, localize, and identify the conditions in an air-filled pipe. Compared with previous studies, the main novel contributions of this paper are (1) the use of a microphone array to extend the usable acoustic frequency range to estimate the reflection coefficient from blockages and lateral connections, (2) a robust regularization method of sparse representation based on a wavelet basis function adapted to reduce the background noise in the acoustical data, and (3) the use of wavelet components to localize and classify the blockages.
In particular, multiple microphones have been used to reconstruct the plane wave mode beyond the first three eigenfrequencies to support more accurate condition detection, localization, and classification. Numerical and experimental results for the modal reflection coefficient from a blockage and lateral connection have been predicted and compared with measurements. This information has been used to support condition detection and classification.
Wavelet basis functions have been used to sparsely represent the plane wave mode impulse response for the condition detection and classification using the $\ell_1$-norm regularization method. The higher-level wavelet functions referring to the higher frequency components of the impulse response have been used to localize the robot and blockage/lateral connection with a higher resolution and accuracy. It has been shown that the wavelet components can also be used to train and to test the SVM classifier for the blockage identification with higher accuracy than using the time domain SVM classifier.
ACKNOWLEDGMENT
This work is supported by the UK Engineering and Physical Sciences Research Council (EPSRC) Programme Grant No. EP/S016813/1. The authors would like to gratefully thank Gavin Sailor for kindly helping with the design of the robotic platform to support the acoustic sensing system. The authors would also like to gratefully thank Dr. Will Shepherd and Paul Osbourne for kindly helping with the design of the blockages and providing other experimental facilities.
APPENDIX
The detailed key steps of the SpaRSA algorithm are presented in Table III.
TABLE III. Key steps of the SpaRSA algorithm.

Task: To solve the problem $\hat{s} = \arg\min_{s} \frac{1}{2}\|Ws - x\|_2^2 + \lambda\|s\|_1$

Input: Response signal $x$, wavelet dictionary $W$, parameter $\lambda = 0.001$

Initialization: $k = 1$, $A = W$, $x_1 = x$, $\tau_1 I = A^T A$, tolerance $\epsilon = 10^{-5}$

Iteration:

1. $\lambda_k = \max\{0.2\,\|A^T x_k\|_\infty,\ \lambda\}$.

2. Exploit soft shrinkage: $s_{k+1} = \mathrm{shrink}\big(s_k - A^T(A s_k - x)/\tau_k,\ \lambda_k/\tau_k\big)$ (where $\mathrm{shrink}(s_i, \lambda) = \mathrm{sign}(s_i)\max\{|s_i| - \lambda,\ 0\}$).

3. Update the step size: $\tau_k = \dfrac{(s_{k+1} - s_k)^T\big(\nabla\vartheta(s_{k+1}) - \nabla\vartheta(s_k)\big)}{(s_{k+1} - s_k)^T(s_{k+1} - s_k)}$.

4. If $\|s_{k+1} - s_k\| / \|s_k\| \le \epsilon$, go to step 5. Otherwise, return to step 2.

5. $x_{k+1} = x - A s_{k+1}$.

6. If $\lambda_k = \lambda$, stop; otherwise, $k = k + 1$ and return to step 1.

Output: $\hat{s} = s_k$, $\hat{x} = W\hat{s}$
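The steps of Table III can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the authors' code: the coefficient vector is initialized to zero, the initial step size is taken as the largest eigenvalue of $A^T A$, step 5 is read as an update of the residual used in step 1, the inner and outer loops are capped by hypothetical iteration limits, and a small orthonormal Haar matrix stands in for the wavelet dictionary $W$.

```python
import numpy as np

def shrink(v, threshold):
    """Soft shrinkage: sign(v) * max(|v| - threshold, 0), applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)

def sparsa(W, x, lam=1e-3, eps=1e-5, max_outer=50, max_inner=500):
    """Sketch of SpaRSA with continuation for 0.5*||W s - x||^2 + lam*||s||_1."""
    A = np.asarray(W, dtype=float)
    x = np.asarray(x, dtype=float)
    s = np.zeros(A.shape[1])               # assumed starting point
    residual = x.copy()                    # x_1 = x
    tau = np.linalg.norm(A, 2) ** 2        # assumed scalar version of tau_1 I = A^T A
    for _ in range(max_outer):
        # Step 1: continuation value of the regularization parameter.
        lam_k = max(0.2 * np.max(np.abs(A.T @ residual)), lam)
        for _ in range(max_inner):
            # Step 2: gradient step followed by soft shrinkage.
            grad = A.T @ (A @ s - x)
            s_new = shrink(s - grad / tau, lam_k / tau)
            ds = s_new - s
            # Step 3: Barzilai-Borwein step size; the gradient difference is A^T A ds.
            denom = ds @ ds
            if denom > 0.0:
                tau = max((ds @ (A.T @ (A @ ds))) / denom, 1e-12)
            # Step 4: relative change test, written without a division.
            converged = np.linalg.norm(ds) <= eps * np.linalg.norm(s)
            s = s_new
            if converged:
                break
        # Step 5: update the residual used by step 1.
        residual = x - A @ s
        # Step 6: stop once the target regularization parameter has been reached.
        if lam_k == lam:
            break
    return s, A @ s

# Hypothetical 4-point orthonormal Haar dictionary (columns are basis functions).
H = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, -1.0, -1.0],
              [1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
W = (H / np.linalg.norm(H, axis=1, keepdims=True)).T
s_true = np.array([2.0, 0.0, -1.0, 0.0])
s_hat, x_hat = sparsa(W, W @ s_true, lam=1e-3)
print(np.round(s_hat, 4))
```

For an orthonormal dictionary the minimizer has the closed form $\mathrm{shrink}(W^T x, \lambda)$, which gives a simple way to check the sketch on this small example.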