Robust detection of acoustically quiet, slow-moving, small unmanned aerial vehicles is challenging. A biologically inspired vision approach applied to the acoustic detection of unmanned aerial vehicles is proposed and demonstrated. The early vision system of insects significantly enhances signal-to-noise ratios in complex, cluttered, and low-light (noisy) scenes. Traditional time-frequency analysis allows acoustic signals to be visualized as images using spectrograms and correlograms. The signals of interest in these representations, such as linearly related harmonics or broadband correlation peaks, are essentially equivalent to meaningful image patterns immersed in noise. By applying a model of the photoreceptor stage of the hoverfly vision system, it is shown that these acoustic patterns can be enhanced and the noise greatly suppressed. Compared with traditional narrowband and broadband techniques, the bio-inspired processing extends the maximum detectable distance of small and medium-sized unmanned aerial vehicles by between 30% and 50%, while simultaneously increasing the accuracy of flight parameter and trajectory estimates.
I. INTRODUCTION
Passive distributed acoustic sensor arrays have been used for detecting and tracking moving aircraft,1–8 ground vehicles,9–15 and unmanned aerial vehicles (UAVs).16–25 Several array configurations have been explored, including small-aperture circular arrays,4 L-shaped planar arrays,7 tetrahedral arrays,6 and widely distributed small arrays.10–12 By analysing the acoustic signals received by the individual microphones, a variety of applications can be realized, including object classification, target tracking, and simultaneous localization and mapping (SLAM).26 With the rapid development of UAV platforms, these systems have become very useful tools for a wide variety of applications,27 including structure inspection,28 surveillance,29 3D mapping,30 and acoustic tomography.31 Nevertheless, an unauthorized UAV may pose a threat to an airport, to individuals, or to military bases. Long-range detection and precise localization of UAVs are therefore important for safety and security purposes.
The acoustic signature emitted by UAVs makes their passive detection and tracking possible. Depending on the spectral components of the acoustic signal, two traditional processing techniques for flight parameter estimation are readily available. For propeller-driven aircraft and helicopters that emit strong harmonic tones, a narrowband processing technique20,32 may be used to estimate the flight parameters. The approach is based on identification of the instantaneous frequency of the motion-induced, Doppler-shifted acoustic signature of the aircraft. A broadband processing technique,20 which measures the temporal variation of the time delays between multiple microphone pairs, may also be used. Compared with the narrowband technique, the broadband approach is more flexible because it can handle UAVs that do not emit strong narrowband tones or fixed harmonic frequencies. Accurate estimation of flight parameters is best achieved by applying both methods when a UAV overflies the array. Indeed, using such approaches several researchers have reported detection ranges for aircraft and UAVs in excess of 2 km.2,17,21,33 When a UAV is far from the microphones, however, the signal is weak compared to the noise and both broadband and narrowband approaches struggle to achieve reliable results. This poses a challenge for UAV detection, localization, and tracking, as observation of the acoustic signal at long range is usually highly desirable.
Similar signal conditions exist in the natural world. For instance, the spread of luminance in naturally lit scenes typically covers a very large dynamic range, and the details in dark regions are immersed in noise.34 In this regard, insect visual systems, such as that of the hoverfly, have proven to be powerful information-capture systems.35 Like visual scenes, acoustic signals can be converted into "images" using spectrograms and correlograms. These traditional techniques permit us to visualize the (one-dimensional) acoustic signal as (two-dimensional) images via a corresponding time-frequency analysis. With the narrowband method, detected acoustic signals are usually visualized as spectrograms and the frequencies of the harmonics extracted; for the broadband technique, correlograms are used, from which the time delays are obtained. In this sense, the acoustic signal of interest, in the form of harmonics or correlation peaks, can be represented as patterns in two-dimensional arrays or matrices and analysed using vision processing techniques.
The particular model used in this study is described in more detail in Sec. III and the references therein. It derives from a fully elaborated biologically inspired vision (BIV) model that, in addition to enabling scene-agnostic, sub-pixel motion detection via electro-optic and infrared sensor modalities, can apply signal conditioning to sensor output prior to downstream processing. The model is based on multiple layers of non-linear, dynamic, adaptive components measured from, or suspected in, the responses of neurons in pathways of the hoverfly brain. Uniquely, however, we have transitioned the techniques from biological findings and theoretical simulation to an embedded hardware implementation running in real time.36 The approach enhances related signal elements and suppresses unrelated noise, providing crisp sub-pixel/low-amplitude signal detection and classification for these difficult target sets.37
This vision-inspired method differs substantially from techniques that draw upon the biology of auditory systems.38,39 The technique also differs from the well-known biologically inspired convolutional neural network (CNN) method.40,41 A CNN is an adaptive algorithm based on interconnected nodes (neurons) arranged in a layered structure that resembles the human brain. The CNN learns from pre-supplied data and can thus be trained to classify the data by breaking down the input layers into layers of abstraction. Its behaviour is determined by the way in which the individual neurons are connected and by the strength (weight) of those connections, and it is trained using many examples. The weights are automatically adjusted during training according to specified learning rules until the algorithm performs appropriately. The last layer of the CNN indicates whether the target class (in our case a UAV signature) was present within the data.
This means that while the hidden layers of a CNN may extract and enhance features, the core task is typically classification, and any decision making is based on patterns in the training data. This contrasts with the algorithm proposed in this paper, which is focused on enhancement only: classification of the data is still required post facto. The two approaches are therefore not mutually exclusive. In fact, there is reason to believe that a CNN trained on outputs from the BIV model could be smaller and more accurate than one trained on raw data, owing to the enhancement of the signals relative to the noise.
This paper is organized into six sections including the present one. Section II briefly describes the traditional methods of UAV acoustic signal visualization, including the narrowband and broadband processing. Section III describes the principles of the bio-vision technique. In Sec. IV the field trial used to gather the experimental data is described. The results of the comparisons between the traditional and bio-inspired methods are given in Sec. V. Finally, in Sec. VI this work is summarized and possible future investigations suggested.
II. UAV ACOUSTIC SIGNAL VISUALIZATION
A. Notations and assumptions
(Color online) Flight model and the geometrical configuration of acoustic array. The UAV flies past the array in a straight line at constant speed and altitude. Red triangles: passive acoustic array; grey dashed line with arrow: trajectory projection on the x-y plane.
The acoustic array considered here takes a general form: a wide-aperture array comprising $N_{RS}$ small-aperture arrays, each with $N_{CH}$ microphones (each microphone representing a different channel). The $j$-th channel of the $i$-th small acoustic array is located at position $\mathbf{p}_{i,j}$. The central position of the $i$-th small microphone array is denoted $\bar{\mathbf{p}}_i$ and is obtained as the mean of its microphone positions, $\bar{\mathbf{p}}_i = \frac{1}{N_{CH}} \sum_{j=1}^{N_{CH}} \mathbf{p}_{i,j}$. In this paper, one small microphone array is located at the origin of the coordinate system (see Fig. 6 later).
B. Narrowband technique
The narrowband technique suits acoustic signals that contain strong narrowband tones. The acoustic signals are visualized as spectrograms using time-frequency analysis. Figure 2 shows the power spectral density (in dB) of a UAV observed by a single ground microphone. As well as the harmonic tones emitted by the UAV, wind noise and the acoustical Lloyd's mirror effect are also visible. The wind noise is present throughout the observation period, its spectral width varying with gust intensity. Although wind noise contamination can extend up to 500 Hz, simply high-pass filtering the signal to remove it would also discard considerable harmonic information, which is not desirable.
(Color online) Power spectral density vs time (spectrogram) for a UAV transit observed on a ground microphone 2 m above ground.
The Lloyd's mirror effect, generated by reflection of the acoustic waves from the ground, is clearly visible when the UAV is near the array. It introduces a slowly varying interference pattern in the spectrogram, making some harmonic components hard to detect. For instance, in Fig. 2 the amplitude of the harmonic signal around 600 Hz at time 10 s is almost as low as the noise floor, which results in a breakpoint in the harmonics of this order. It is noted that the Lloyd's mirror effect can be used to estimate some flight parameters.42,43 However, the effect is distinct only within the short period when the airborne source flies over the acoustic sensor, and it is thus not suitable for long-distance detection.
Figure 3 shows the cepstrogram for the same UAV considered in Fig. 2. Unlike the spectrogram, in the cepstrogram the power of the wind noise is concentrated within a fixed quefrency range. The Lloyd's mirror component occupies a separate, bounded quefrency band, corresponding to the UAV being within a range of about 300 m from the array. The harmonic components are concentrated near a single quefrency that is clearly distant from both the Lloyd's mirror and the wind noise. Therefore, by applying a band-pass filter (lifter) on the cepstrogram that retains only the harmonic components, we can effectively distill the essential harmonics. Figure 4(a), which contains only the harmonics, shows the result of cepstrum filtering; Fig. 4(b) shows the background spectrogram, which contains only the Lloyd's mirror and the wind noise. Using this approach, we can eliminate the influence of the unwanted signals.
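The liftering step lends itself to a compact implementation. The sketch below is our illustration rather than the authors' code; the quefrency band limits q_lo and q_hi are hypothetical placeholders for the values read off the cepstrogram in Fig. 3. It separates one spectral frame into harmonic and background components:

```python
import numpy as np
from scipy.signal import get_window

def cepstral_lifter(frame, fs, q_lo, q_hi):
    """Split one analysis frame's log-spectrum into harmonic and
    background parts by band-passing (liftering) its real cepstrum.

    frame      : time-domain samples of one analysis window
    q_lo, q_hi : quefrency band (s) assumed to contain the harmonics
    """
    n = len(frame)
    log_spec = np.log(np.abs(np.fft.rfft(frame * get_window("hann", n))) + 1e-12)
    cep = np.fft.irfft(log_spec, n=n)             # real cepstrum of the frame
    k = np.arange(n)
    q = np.minimum(k, n - k) / fs                 # symmetric quefrency axis (s)
    keep = (q >= q_lo) & (q <= q_hi)              # band-pass lifter
    harm = np.fft.rfft(np.where(keep, cep, 0.0)).real   # harmonic log-spectrum
    background = log_spec - harm                  # Lloyd's mirror + wind residue
    return harm, background
```

Applying this frame by frame yields the two spectrograms of Fig. 4: the liftered part gives the harmonic spectrogram, and the residual gives the background.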
(Color online) The cepstrogram for the same UAV considered in Fig. 2.
(Color online) Result of cepstrum filtering applied on the spectrogram in Fig. 2. (a) Harmonic spectrogram which contains acoustic harmonics only; (b) Background spectrogram, which contains the wind noise and Lloyd mirror.
After cepstrum filtering, the harmonic spectrogram is processed for pitch detection. A number of traditional methods are available for high-accuracy pitch detection: in the time domain, the zero-crossing rate (ZCR),45 the robust algorithm for pitch tracking (RAPT),46 and the YIN estimator47 may be used, while in the frequency domain there are component frequency ratios,48 cepstrum analysis,44 optimum comb filters,49 and the harmonic product spectrum (HPS).50 In this paper, since the acoustic signal has already been visualized as a spectrogram, the HPS method is adopted for pitch estimation.
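As an illustration of the HPS principle (a generic sketch, not the implementation used in the paper), the following function estimates the fundamental from a single magnitude-spectrum column of the harmonic spectrogram:

```python
import numpy as np

def hps_pitch(spectrum, freqs, n_harmonics=5):
    """Harmonic product spectrum: multiply the magnitude spectrum by
    integer-downsampled copies of itself so that energy at f0, 2*f0, ...
    accumulates at f0, then return the frequency of the maximum.
    """
    hps = spectrum.astype(float).copy()
    for h in range(2, n_harmonics + 1):
        decimated = spectrum[::h]            # spectrum sampled at h*f
        hps[:len(decimated)] *= decimated
    n_valid = len(spectrum) // n_harmonics   # bins where all harmonics exist
    return freqs[np.argmax(hps[:n_valid])]
```

Applied column by column to the harmonic spectrogram of Fig. 4(a), this yields the fundamental-frequency track processed in Sec. V.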
It is worth noting that not all UAVs exhibit strong acoustic harmonics. Consequently, the narrowband processing is not suitable for all UAV signatures.
C. Broadband technique
III. BIO-VISION PROCESSING
A complete biologically inspired vision (BIV) model is a multi-stage non-linear system with adaptive feedback both within and between stages. This model has previously been used only on electromagnetic data but has shown great promise in estimating optic flow54 and detecting targets in clutter34 in both visual36 and infrared37 portions of the spectrum. Even using the first stage of the model in isolation has yielded improved clarity in poor lighting conditions55,56 and better target detection.57,58
The photoreceptor cell (PRC), which is responsible for dynamic range reduction of the input signal, provides the first stage of the biological visual system. Photoreceptors dynamically adjust the dark and bright regions of input images through temporal pixel-wise operations.59
Figure 5 shows the elaborated mathematical model of the bio-vision photoreceptor.34 It includes four stages: (1) adaptive filtering; (2) low-pass divisive feedback, also known as the DeVries-Rose stage; (3) exponential low-pass divisive feedback, known as the Weber stage; and (4) the non-linear Naka-Rushton transform. This four-stage model is functionally equivalent to the processing conducted in a primate cone.60 The detailed implementation is described as follows.
(Color online) Mathematical model of the photoreceptor cells of a hoverfly's early vision system.
A. Adaptive filter
This stage comprises low-pass filtering with a dynamic cut-off frequency, the value of which depends on the adaptation state (the long-term average value of the element). This is followed by a variable gain control that acts to reduce the differences over a wide range of acoustic intensities by using a larger gain when the signal is low than when it is high. In visual images the power of the signal component is approximately inversely proportional to its spatial frequency, whereas the noise component is essentially white, i.e., constant over all frequencies. There is therefore a frequency above which the signal-to-noise ratio (SNR) falls below an acceptable level, and a low-pass filter should be employed. This cut-off varies with the intensity (brightness) of the input signal because the majority of the noise arises in the sensor itself and is therefore independent of the scene being observed. Similarly, in acoustic applications the elements that receive low-amplitude acoustic signals must be more heavily filtered than those that receive high-amplitude inputs, since their SNR will be low and can be increased by reducing the high-frequency signal components.
The initialization of this stage is derived from the steady-state response, i.e., the filter states are set equal to $x_0$, the initial input to the bio-vision model (at steady state, the output of a unity-gain low-pass filter equals its input).
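To make the stage concrete, the following sketch implements one plausible form of the adaptive filter under stated assumptions: a first-order low-pass whose cut-off frequency rises with the element's adaptation state, followed by a divisive gain. The cut-off range (fc_lo, fc_hi) and adaptation time constant tau_adapt are illustrative values of our own, not the parameters of the elaborated model:

```python
import numpy as np

def adaptive_filter(x, fs, fc_lo=2.0, fc_hi=20.0, tau_adapt=1.0):
    """Sketch of the adaptive-filter stage for one element (pixel) of the
    image sequence: a first-order low-pass whose cut-off depends on the
    long-term average (adaptation state), followed by a variable gain
    that boosts weak elements more than strong ones.
    """
    x = np.asarray(x, dtype=float)
    a_adapt = 1.0 - np.exp(-1.0 / (tau_adapt * fs))
    state = x[0]                 # steady-state initialization: y(0) = x(0)
    adapt = abs(x[0])
    y = np.empty_like(x)
    for n, xn in enumerate(x):
        adapt += a_adapt * (abs(xn) - adapt)       # long-term average
        # Stronger (brighter) elements get a higher cut-off frequency
        fc = fc_lo + (fc_hi - fc_lo) * adapt / (adapt + 1.0)
        a = 1.0 - np.exp(-2.0 * np.pi * fc / fs)
        state += a * (xn - state)                  # variable low-pass
        y[n] = state / (adapt + 1e-6)              # variable gain
    return y
```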
B. Low-pass divisive feedback (DeVries-Rose)
C. Exponential low-pass divisive feedback (Weber)
The initialization of this stage comes from the steady-state relation $y\,e^{y} = x$, which has the solution $y = W(x)$, where $W$ is the Lambert W function. When $x \gg 1$, the solution can be approximated as $y \approx \ln(x)$. This logarithmic relationship is referred to as the Weber law, from which the stage takes its name. The exponential scaling enables this stage to perform significant non-linear rescaling of the signal, which can drastically reduce the intensity of the largest elements.
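A minimal sketch of this stage, assuming a first-order low-pass in the feedback path (the cut-off frequency fc is illustrative, not the paper's value), shows how the steady state reduces to the Lambert W relation above:

```python
import numpy as np
from scipy.special import lambertw

def weber_stage(x, fs, fc=2.0):
    """Exponential low-pass divisive feedback: the output is divided by
    the exponential of its own low-passed history, y[n] = x[n] / exp(s).
    At steady state s = y, so y * exp(y) = x and hence y = W(x).
    """
    x = np.asarray(x, dtype=float)
    a = 1.0 - np.exp(-2.0 * np.pi * fc / fs)
    s = float(np.real(lambertw(max(x[0], 0.0))))   # steady-state start
    y = np.empty_like(x)
    for n, xn in enumerate(x):
        y[n] = xn / np.exp(s)      # divisive feedback
        s += a * (y[n] - s)        # low-pass of the output
    return y
```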
D. Non-linearity (Naka-Rushton)
IV. FIELD TRIAL EXPERIMENT
A. Field trial equipment
Field trials were conducted at a site known as Evett's Field at the Woomera Test Range, South Australia. The terrain is flat and open, with sandy/rocky ground, no grass, and sparse vegetation. Aside from an equipment hut 300 m west of the microphone array, there are no substantial scattering objects obstructing line-of-sight propagation. As shown in Fig. 6(a), Evett's Field has two runways (Runway #1 and Runway #2), each around 2 km long. The acoustic array was located at the south-eastern end of the two runways. An array of 49 microphones was deployed in a fractal pattern of 7 groups of 7 microphones (Fig. 6). Each small array comprised a microphone at its centre (height 2 m) and two sets of three microphones (height 0.15 m) at radii of 1 m and 5 m, each set separated in angle by 120°. The 7 smaller arrays were themselves arranged in a similar pattern of equilateral triangles, the inner triangle having 50 m sides and the outer 100 m sides. The position of each microphone was surveyed using real-time kinematic carrier-phase differential GPS, which has a 1σ accuracy of ±0.03 m.
(Color online) (a) Field trial deployment. (b) The distribution of large acoustic array and (c) small acoustic array (north is up for all images).
The sound fields at each array were measured using ECM800 10 mV/Pa condenser microphones sampled at 44.1 kHz using an 8-channel, 24-bit Data Acquisition (DAQ) recorder with 107 dB spurious-free dynamic range. Accurate time stamping of the data was obtained from a GPS-derived one-pulse-per-second (1PPS) signal, sampled on channel one of the DAQ.
B. Flight scenarios
Different classes of UAV were used during the trials, including a DJI Matrice 600 (15 kg, 1.7 m diameter, hexa-rotor), a Skywalker X-8 (3.5 kg, 2.1 m wingspan), and a DJI Mavic Air (0.5 kg, 0.2 m, quad-rotor). The flight scenarios reported in this paper are shown in Table I. The Matrice 600 was equipped with an acoustic payload that continuously generated a fixed-frequency set of strong narrowband tones superimposed on a broadband random component, with time-invariant narrowband energy extending from 50 Hz to at least 5 kHz in 50 Hz steps. This payload simulated a UAV propelled by a petrol-driven engine with constant tonal output regardless of flight dynamics, i.e., an idealized version of such a signature. The Skywalker X-8 (petrol-driven) and the Mavic Air (electrically powered) were flown without acoustic payloads. The flight scenarios are referred to throughout the paper as FS1 (Matrice 600), FS2 (Skywalker X-8), and FS3 (Mavic Air), all of which flew along Runway #2 in a northerly direction. All the UAVs listed in Table I carried an iMet XQ UAV sensor, which recorded the UAV's GPS location and local meteorological parameters (temperature, pressure, relative humidity) at a rate of 1 Hz. The GPS data from the iMet sensor were used as ground truth for the UAV flight trajectories.
Flight scenarios. (N: North, S: South.)
Label | UAV | Payload | Height (m) | Profile
---|---|---|---|---
FS1 | Matrice 600 | Acoustic | 100 | Runway #2, S → N
FS2 | Skywalker X-8 | N/A | 210 | Runway #2, S → N
FS3 | Mavic Air | N/A | 100 | Runway #2, S → N
V. RESULTS AND DISCUSSION
Once the data acquisition was completed, the signals were processed using both the narrowband and broadband techniques. Note that as the narrowband processing is only suitable for UAVs with strong harmonic signals, the technique was applied only to FS1. For the other two flight scenarios, there were no obvious fixed harmonic tones. This was because, despite FS2 using a petrol engine, the changing demands on the engine during flight produced an acoustic signature whose dominant frequencies varied considerably. Consequently, FS2 and FS3 were not suitable for narrowband processing. The broadband technique, however, was applied to all three flight scenarios. The processing details and results, including the improvements obtained by applying the proposed bio-vision technique, are described below.
A. Narrowband technique
The data from each microphone were pre-filtered through a low-pass finite impulse response (FIR) anti-aliasing filter (AAF) with a passband cut-off of 5 kHz, passband ripple of 0.1 dB, and stopband attenuation of 100 dB, prior to downsampling by a factor of 5. The spectrograms were obtained through time-frequency analysis, with a digital Fourier transform block size of 4096 samples, 75% overlap between consecutive data blocks, Hann windowing, and 2 times zero-padding. When combined with the bio-processing model, the BIV parameters (the stage cut-off frequencies and non-linearity constant) were selected using an empirical process and were not optimised against any quantifiable criteria. Figure 7(a) shows the normalized (with respect to maximum power spectral density) spectrogram of FS1 (Matrice 600 with acoustic payload) obtained from the first channel of microphone array RS1, while Fig. 7(b) shows the same spectrogram after bio-processing. With the help of adaptive filtering and non-linear transforms, the bio-processing has enhanced the related acoustic harmonics and suppressed the unrelated noise. Two particular regions, corresponding to ranges of <300 m and around 1000 m from the array and marked as Z1 and Z2 in Fig. 7, are expanded to show the improvement of the bio-processing approach more clearly. Z1 represents the high-frequency region when the UAV was near the acoustic array; here the narrowband harmonics were low in power but varied quickly due to the high harmonic order. Z2 denotes the low-frequency region when the UAV was far from the array. Figures 7(c) and 7(e) show enlargements of the Z1 and Z2 regions of the normalized spectrogram, respectively. Some harmonics are barely visible, as their amplitudes are insufficient for them to be clearly distinguished from the background. Figures 7(d) and 7(f) show the same two regions after BIV processing. The former illustrates a more distinct, clearer set of harmonics up to t = 20 s, and the latter demonstrates an obvious harmonic signal around t = 90 s.
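For reference, the pre-processing and spectrogram settings quoted above can be reproduced with standard tools, as in the sketch below (the numeric parameters come from the text; the variable names are ours, and SciPy's default decimation FIR stands in for the specified 5 kHz anti-aliasing filter):

```python
import numpy as np
from scipy.signal import decimate, spectrogram

fs = 44100
x = np.random.randn(fs * 10)         # placeholder for one channel of data

x_ds = decimate(x, 5, ftype="fir")   # FIR anti-alias filter + downsample by 5
fs_ds = fs // 5                      # 8820 Hz

nblock = 4096                        # DFT block size (samples)
f, t, Sxx = spectrogram(
    x_ds,
    fs=fs_ds,
    window="hann",                   # Hann windowing
    nperseg=nblock,
    noverlap=3 * nblock // 4,        # 75% overlap
    nfft=2 * nblock,                 # 2x zero-padding
)
S_db = 10 * np.log10(Sxx + 1e-12)
S_db -= S_db.max()                   # normalize to the maximum PSD
```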
(Color online) Narrowband spectrograms (a) without and (b) with biologically inspired vision (BIV) processing. Images on the right show enlarged regions Z1 and Z2. BIV processing led to improved contrast between the signal harmonics and the background.
(Color online) Fundamental frequency estimation for the acoustic signal (a) without BIV processing and (b) with BIV processing for the centre microphone array (RS1) of flight scenario 1. Red solid line: NLS fit. BIV processing resulted in a higher PSNR overall, and the ability to track the signal for longer.
The flight parameters estimated from the NLS regression are given in Table II. The error values are 1σ for the traditional and bio-vision methods, and are derived from the iMet GPS sensor performance envelope for the iMet (ground truth) data. The estimates of the flight parameters appear slightly biased, mainly because of the wind noise. Compared with the iMet sensor data, however, the estimates from both the traditional and bio-vision processing are acceptable.
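The NLS regression referred to above fits the extracted fundamental-frequency track to a Doppler model of straight, level, constant-speed flight. The sketch below uses a simplified parameterization of our own (fundamental f0, speed v, CPA time tc, and a single CPA slant distance dcpa); the paper's model additionally resolves the height and heading reported in Table II, so this is illustrative only:

```python
import numpy as np
from scipy.optimize import least_squares

C = 343.0  # nominal speed of sound (m/s)

def doppler_model(t, f0, v, tc, dcpa):
    """Received fundamental for a source in straight, level flight at
    constant speed: f(t) = f0 / (1 + vr/c), where vr is the range rate
    (acoustic retardation is ignored for brevity).
    """
    r = np.sqrt((v * (t - tc)) ** 2 + dcpa ** 2)   # slant range
    vr = v ** 2 * (t - tc) / r                     # range rate
    return f0 / (1.0 + vr / C)

def fit_flight_params(t_obs, f_obs, guess=(60.0, 15.0, 0.0, 150.0)):
    resid = lambda p: doppler_model(t_obs, *p) - f_obs
    return least_squares(resid, guess).x           # f0, v, tc, dcpa
```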
Flight parameter estimation with narrowband technique.
Method | Speed (m/s) | CPA time (s) | CPA distance (m) | Height (m) | Heading (deg.)
---|---|---|---|---|---
iMet data | 15.0 | −4.2 | 136.3 | 101.2 | 182.0
Traditional | 15.0 | −4.6 | 142.9 | 97.9 | 182.2
Bio-vision | 15.0 | −4.2 | 144.1 | 99.5 | 182.1
The tracks of the UAV trajectory estimated by the acoustic array are superimposed onto a satellite photograph (Fig. 9). The red triangles are the locations of the small microphone arrays (RS1–RS7). In Fig. 9(a), the grey circles are the trajectory measured by the iMet XQ sensor GPS records, while the blue circles represent the UAV position measured acoustically as it travelled about 1134.5 m along Runway #2, corresponding to a maximum slant range of 1147.5 m. For comparison, the green circles in Fig. 9(b) show the acoustically measured UAV trajectory out to 1509.1 m, corresponding to a slant range of 1519.2 m. This indicates that, for flight scenario 1, the maximum detection range was improved by approximately 33%.
(Color online) Estimated flight trajectory by narrowband technique of FS1 (a) without BIV processing and (b) with BIV processing. While the flight profile was almost equally accurate for both methods during the period the drone was tracked, BIV processing increased the tracking duration and hence the detection range.
B. Broadband technique
For the broadband technique, the acoustic data were visualized as correlograms. For each small microphone array, the 7 channels (Ch1–Ch7) formed Np = 21 sensor pairs. The correlograms were computed using the GCC-PHAT algorithm in the frequency domain, with an FFT window size of 8192 points, 2 times zero-padding, 50% overlap, and Kaiser windowing. Note that there was no downsampling at this stage. The time step of the correlograms was 0.093 s, giving a frame rate fr of 10.77 Hz. When processing with the bio-vision model, the input images were the correlograms, and the BIV parameters were again selected empirically rather than optimised against any objective criteria. Figures 10(a) and 10(c) show the correlograms of the first microphone pair (Ch1-Ch2) and the last microphone pair (Ch6-Ch7), respectively; the former represents the minimum distance between two microphones, the latter the maximum. The two corresponding correlograms processed using bio-vision are shown in Figs. 10(b) and 10(d). It is worth noting that the correlation peak has a higher amplitude and is temporally prolonged, due to the amplification and filtering in the first stage of the bio-inspired processing chain. There is also a "shadow" after the fast-varying correlation peaks [as in Fig. 10(d)], caused mainly by the low-pass filters in the divisive feedback stages of the bio-vision model. However, the influence of this effect is negligible, since the delays were estimated by peak searching of the correlograms, and the effect, if anything, increased the local contrast of those peaks.
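The correlogram frames were produced with GCC-PHAT, which is a standard construction; a minimal sketch for one frame and one microphone pair is given below (the Kaiser window is omitted for brevity, and nfft = 16384 reflects the 8192-point window with 2 times zero-padding):

```python
import numpy as np

def gcc_phat(sig, ref, fs, nfft=16384):
    """GCC-PHAT between two channels: whiten the cross-spectrum so every
    frequency contributes equally, leaving a sharp peak at the true delay.
    """
    X = np.fft.rfft(sig, n=nfft)
    Y = np.fft.rfft(ref, n=nfft)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12               # PHAT weighting
    cc = np.fft.irfft(R, n=nfft)
    cc = np.concatenate((cc[-nfft // 2:], cc[:nfft // 2]))  # center zero lag
    lags = np.arange(-nfft // 2, nfft // 2) / fs
    return lags, cc

# The time delay for the pair is the lag of the correlogram's global peak:
# lags, cc = gcc_phat(x1_frame, x2_frame, fs=44100)
# tau = lags[np.argmax(cc)]
```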
(Color online) Correlograms for RS1 and FS1: (a) pair Ch1-Ch2 without BIV processing, (b) pair Ch1-Ch2 with BIV processing, (c) pair Ch6-Ch7 without BIV processing, and (d) pair Ch6-Ch7 with BIV processing.
Figures 11(a) and 11(b) show the estimated time delays of the 21 sensor pairs without and with bio-vision processing, respectively. The PSNRs were also calculated via Eq. (25). With the traditional broadband method, the maximum PSNR is about 30 dB, while the bio-vision processing achieved a PSNR improvement of around 10 dB. Figure 11(a) illustrates that without bio-processing the correlogram peaks oscillate violently after 54.7 s, while with bio-processing they remain stable up to 85.2 s. This represents a 56% improvement in tracking duration, even greater than the improvement achieved with narrowband processing.
(Color online) Estimated time delays of 21 sensor pairs (a) without BIV processing, and (b) with BIV processing colour-coded according to peak signal-to-noise ratio (PSNR). Not only did the bio-vision processing improve the PSNR of the correlograms, it also made the tracking of the peak more coherent, even at low PSNRs.
The GCF was also computed from the correlograms according to Eq. (13). Figure 12 shows the GCF of FS1 with and without bio-vision at times of 6 s and 80 s. The red crosses show the estimated bearing and elevation angles, and the white lines the trajectory traces obtained from the iMet data. The figure demonstrates that BIV processing produced higher contrast and more accurate peaks in the GCF, making the estimation of bearing and elevation more accurate at longer ranges. When the UAV was near the acoustic array, i.e., T = 6 s, the crosses for the traditional (a) and bio-processed (b) results both lie on the iMet trace, indicating that both provide accurate estimates of bearing and elevation. Note that the highlighted area in Fig. 12(b) is larger than that in (a), mainly due to the prolonged low-pass filtering effect of the BIV (as in Fig. 10). When the UAV was far from the acoustic array, i.e., T = 80 s, the traditional GCF lost track and provided an incorrect estimate, as in Fig. 12(c). With bio-vision processing, however, the UAV was still visible in Fig. 12(d), and the estimate matches the iMet trace well.
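Equation (13) itself is not reproduced here, but the generic construction of a coherence field from the pairwise correlograms can be sketched as follows. This is a far-field, steered-response formulation written under our own assumptions; the grid resolution, variable names, and plane-wave delay model are illustrative rather than the paper's exact method:

```python
import numpy as np

def gcf_map(cc_funcs, pair_baselines, fs, c=343.0, n_az=180, n_el=45):
    """For each candidate bearing/elevation, sum every pair's correlogram
    value at the delay a far-field plane wave from that direction would
    produce; the global maximum gives the direction estimate.

    cc_funcs       : list of 1-D correlograms, one per microphone pair,
                     indexed so that cc[k] is the value at lag (k - len/2)/fs
    pair_baselines : list of (p_i - p_j) baseline vectors, shape (3,)
    """
    az = np.radians(np.linspace(0.0, 360.0, n_az, endpoint=False))
    el = np.radians(np.linspace(0.0, 90.0, n_el))
    gcf = np.zeros((n_el, n_az))
    for ie, e in enumerate(el):
        for ia, a in enumerate(az):
            # Unit vector pointing toward the candidate direction
            u = np.array([np.cos(e) * np.cos(a), np.cos(e) * np.sin(a), np.sin(e)])
            for cc, baseline in zip(cc_funcs, pair_baselines):
                tau = np.dot(u, baseline) / c            # predicted delay (s)
                k = int(round(tau * fs)) + len(cc) // 2  # nearest lag bin
                if 0 <= k < len(cc):
                    gcf[ie, ia] += cc[k]
    return gcf
```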
(Color online) Global coherence field (GCF) of Matrice 600 with acoustic payload (FS1) (a) without BIV at t = 6 s, (b) with BIV at t = 6 s, (c) without BIV at t = 80 s, and (d) with BIV at t = 80 s. The red cross in each image represents the global maximum and the white line the historical trajectory of this maximum over time.
The flight parameters obtained from the NLS regression are shown in Table III. For all flight scenarios, accurate estimates of the flight parameters could be obtained using both the traditional and bio-vision methods, although in a few cases the accuracy of the bio-vision method is worse than that of the traditional approach. These differences in accuracy were minor, however, and are traded against large extensions in detection range. The discrepancies between the BIV and traditional results are likely due to the use of low-pass filters in the BIV model, which induce a small phase delay in the data.
Flight parameter estimation with broadband technique.
FS# | Method | Speed (m/s) | CPA time (s) | CPA distance (m) | Height (m) | Heading (deg.)
---|---|---|---|---|---|---
FS1 | iMet data | 15.0 | −4.1 | 136.3 | 101.2 | 182.0
FS1 | Traditional | 15.1 | −4.3 | 131.8 | 97.8 | 181.8
FS1 | Bio-vision | 15.2 | −4.4 | 134.9 | 99.9 | 181.5
FS2 | iMet data | 20.4 | −7.5 | 147.8 | 210.1 | 179.9
FS2 | Traditional | 20.9 | −7.1 | 148.5 | 211.9 | 179.9
FS2 | Bio-vision | 21.0 | −7.3 | 155.0 | 214.8 | 180.3
FS3 | iMet data | 15.3 | −5.7 | 130.2 | 98.3 | 181.2
FS3 | Traditional | 15.0 | −5.3 | 122.1 | 100.3 | 181.3
FS3 | Bio-vision | 15.0 | −5.8 | 123.6 | 102.0 | 181.5
As with the narrowband technique, the UAV trajectories estimated by the broadband technique are superimposed onto the satellite photograph in Fig. 13. With the traditional method, the maximum detectable slant ranges of the Matrice 600 (FS1), Skywalker X-8 (FS2), and Mavic Air (FS3) were 915.7 m, 901.1 m, and 258.7 m, respectively. After bio-processing, the maximum detectable ranges were 1360.5 m, 1204.7 m, and 336.7 m, representing improvements of 48.6%, 33.7%, and 30.2%, respectively (see Table IV). It is worth noting that the Mavic Air has a much shorter detection range because of its smaller size, lower source level, and higher-frequency spectral signature compared with the other, medium-sized UAVs.
(Color online) Estimated flight trajectory derived using broadband technique for FS1, FS2, and FS3, respectively.
The acoustic detection range for different flight scenarios.
FS# | UAV type | Range w/o BIV (m) | Range w/ BIV (m) | Improvement
---|---|---|---|---
FS1 | Matrice 600 | 915.7 | 1360.5 | 49%
FS2 | Skywalker X-8 | 901.1 | 1204.7 | 34%
FS3 | Mavic Air | 258.7 | 336.7 | 30%
Although all figures shown in this paper relate to flights in which the UAV started close to the microphones (i.e., the SNR declines from that point on, and towards the end of the run any angular change is small), the overall analysis is drawn from both the outward and return legs of the flights; that is, it includes trajectories where the SNR starts low and increases as the target approaches the observer. This was done to eliminate any potential influence of "track extrapolation."
In contrast to many traditional imaging systems, which operate with a single global or regional gain and attempt to capture the world as faithfully as possible, the BIV operates at multiple local time scales, uses pixel-wise integration and manipulation, and employs self-adapting non-linear feedback between its stages. This enables it to process all parts of the data in parallel while allowing scene-independent adaptation between its components, as there is no concept of spatial (and thus spectral) structure in the initial processing stages. Data elements considered dynamic are accentuated, whilst static ones are condensed. This allows the huge dynamic range of the real world to be compressed into a manageable bandwidth for optimal information transmission and downstream processing across a diverse range of environments. Consequently, although extraneous coherent noise sources (e.g., interferers such as petrol generators) will be accentuated relative to the background noise, consistent noise sources (such as a constantly running generator) will be suppressed relative to variable sources (such as a moving UAV), in the same way that moving objects are enhanced relative to stationary ones within the visual system. Thus, unless the temporal-spectral properties of extraneous signals completely overlay those of the UAV targets, in which case they are indistinguishable, the BIV processing will make it easier to discriminate interferers from the signals of interest. A detailed examination of the influence of interference and of track extrapolation is beyond the scope of this paper and will be published elsewhere.
VI. CONCLUSION
This paper presents the use of a bio-inspired signal processing technique for detecting the acoustic signature of UAVs. Two standard time-frequency processing methods, based on narrowband and broadband techniques, were considered. Such approaches are commonly used by other researchers in this field, and the ranges reported here (prior to the addition of any BIV signal conditioning) are similar to other publicly reported findings for such experiments.16–18,23–25 The photoreceptor model of the insect vision system was applied in conjunction with both of these traditional methods. Field trials using three different types of UAV (fixed and rotary wing) and various flight scenarios show that for narrowband processing the bio-vision technique improved the maximum detection range by 33%, while for broadband processing the bio-inspired method achieved range extensions of between 30% and 49%, depending on the UAV model/type and flight scenario.
Recently BIV processing has been shown to greatly increase the detection range of UAVs in both visual62 and infrared37 data. However, this is the first time such a finding has been translated to acoustic detection.
Compared with the traditional methods, the bio-vision method also achieves comparable accuracy in flight parameter estimation, indicating that the proposed method is accurate and reliable. Since BIV is a pre-processing (signal conditioning) technique, it augments rather than replaces existing detection and tracking methods, and it can therefore be integrated with other, more complex UAV detection algorithms. Furthermore, since it is causal and comprises only relatively simple mathematical operations, BIV is also suitable for real-time applications. Optimisation of the BIV parameters against a defined goal would likely lead to a further increase in performance, as has been observed in a different context.63 However, such improvements are beyond the scope of this paper. Future work includes verification using more UAVs and flight scenarios, fusion of the narrowband and broadband techniques, inclusion of further components of the BIV processing pathway,64 and application of the BIV to the real and imaginary components of the analytic signal, which would allow more accurate determination of the mainlobe.
ACKNOWLEDGMENTS
This research was sponsored by the Australian Defence Science and Technology (DST) Group. We are very grateful to Michael Driscoll, Adrian Coulter, Martin Sniedze (DST), Steven Andriolo (EyeSky), and to Joshua Meade and Jarrod Skinner (UniSA) for trials support.