Coprime microphone arrays use sparse sensing to achieve greater degrees of freedom, while the coprimality of the microphone subarrays helps resolve grating lobe ambiguities. The result is a narrow beam at frequencies higher than the spatial Nyquist limit allows, with residual side lobes arising from aliasing. These side lobes can be mitigated when observing broadband sources, as shown by Bush and Xiang [J. Acoust. Soc. Am. 138, 447–456 (2015)]. Peak positions may indicate directions of arrival in this case; however, one must first ask how many sources are present. In answering this question, this work employs a model describing scenes with potentially multiple concurrent sound sources. Bayesian inference is used to first select which model the data prefer from competing models before estimating model parameters, including the particular source locations. The model is a linear combination of Laplace distribution functions (one per sound source). The likelihood function is explored by a Markov Chain Monte Carlo method called nested sampling in order to evaluate the Bayesian evidence for each model. These values increase monotonically with model complexity; however, diminished returns are penalized via an implementation of Occam's razor.
I. INTRODUCTION
The purpose of this paper is to apply model-based Bayesian inference to a broadband coprime array in order not only to accurately estimate the direction of arrival (DOA) of sound sources, but also to predict the number of sound sources to locate. A sixteen-channel coprime microphone array observes broadband sound sources in a low-noise free-field environment, and the resulting data are analyzed using a Bayesian method.
Model-based Bayesian inference is a methodology capable of incorporating information regarding the system being studied, including a sufficiently accurate model originating from the observed phenomena or underlying physics. These methods have been used increasingly in recent years in applications ranging from acoustic materials1 and room acoustic energy decay analysis2,3 to heat transfer4 and underwater geophysics.5 Applications of the Bayesian framework to problems in architectural acoustics are still developing. In coupled volume research, Bayesian methods (using slice sampling and nested sampling) help identify the distinct decay times present and their individual contributions to observed reverberation.6,7
Array signal processing research, including radar, sonar, and other acoustic arrays, has employed the Bayesian framework for some time, and increasingly so in recent years. Antoni8 demonstrated that an optimal interpolation basis for acoustical source reconstruction exists and that such a basis is spanned by the eigenfunctions of a specific continuous-discrete propagation operator, flanked by metrics which account for prior spatial information and noise. The dimension of this basis was considered minimal, and information is necessarily lost if fewer basis functions are used. Near-field acoustic reconstruction problems were tackled with a Bayesian approach by Bai et al.;9 although the process was computationally demanding, the particle filtering technique, which incorporates prior knowledge, proved more suitable than Tikhonov regularization and minimum-variance unbiased estimators in cases where source locations are unknown, as it could estimate not only source velocities but also source locations at the same time.
In 2014, an algorithm was formulated from a Bayesian perspective for off-grid DOA estimation, intended to improve upon a proposed sparse total least squares method. This so-called off-grid block-sparse Bayesian learning algorithm was more accurate than others on coarse grids, without requiring the number of signals a priori.10 Even a simple 2-microphone array may benefit from Bayesian inference, not only in its application to sound source DOA estimation, but also first to determine the number of sources present.11
These examples represent a brief overview of acoustics research using Bayesian analysis. For a more recent, comprehensive overview of Bayesian signal processing methods, see Ref. 12, and for more information about acoustic arrays in general, see Ref. 13. No coprime sensing methods have yet been included in these references.
Coprime sampling, as proposed by Vaidyanathan and Pal,14 employs a specific array element configuration in the form of two superposed uniform linear arrays (ULAs), each with sparse spacings which are coprime multiples of the conventional non-sparse limit. They also showed that existing DOA estimation algorithms such as MUSIC can be implemented in this novel configuration.15 However, this kind of subspace approach relies on prior knowledge of the number of concurrent sound sources. To further reduce computational complexity, Weng and Djurić16 took advantage of so-called “search-free” algorithms by transforming arbitrary arrays (coprime in this case) to equivalent ULAs for which these algorithms are well understood and effective. Efforts have been made to quantify the effect of extending coprime arrays with additional elements and weightings.17 Sparsity-enforced approaches to reconstruction and DOA estimation for off-grid targets have been investigated in coprime arrays.18,19 Interest has also been taken in applying coprime array theory to underwater acoustics20 as well as radar.21–24 In 2015, the authors presented experimental acoustic data confirming the frequency-dependent beampatterns of coprime arrays, and showed their effects over various bands of frequencies.25 This demonstrated that DOA estimation can become more effective as the observed bandwidth broadens (see Fig. 1). Shortly thereafter, Shen et al.26 mathematically derived wideband DOA estimation methods based on group sparsity for coprime arrays and showed simulation results comparing these methods to single-frequency methods.
FIG. 1. Far-field fixed spatial beamforming magnitude results using a coprime array with subarrays of nine and eight elements, each with 1 meter aperture. Impulse response measurements are taken in a free field environment at 1° increments. (a) Narrowband (semitone filter) coprime beampattern (solid line) and corresponding single-frequency (12.348 kHz) simulated beampattern via Eq. (3) (dashed line). (b) Third-octave band filtered coprime beampattern. (c) Octave band filtered coprime beampattern. (d) Broadband coprime beampattern. The center frequency for each of the filtered plots is the coprime design frequency, 12.348 kHz.
This work proposes a parametric model for broadband responses of coprime sensor arrays. To the best knowledge of the authors, the utilization of an analytical model in coprime sensor processing has not yet been reflected in published literature pertaining to DOA estimation by coprime array beamforming. Once the model's fidelity is validated by experimental data, the predictive model can be incorporated as prior information into two levels of Bayesian inference. The Bayesian framework27 encompasses model selection and parameter estimation. Bayesian model selection is first applied to estimate the number of sound sources at hand, without any prior assumption. This model-based target enumeration and subsequent parameter estimation represents a different approach from existing ones, which assume the number of targets is already known prior to the DOA estimation.
Model selection searches for a concise model that fits the data well. This is articulated in previous classification and model selection literature by, among others, Wallace and Boulton28 (see also Rissanen29), and represents the principle of minimum description length. Early efforts along this line also include information criteria such as the Akaike information criterion (AIC)30 and a stronger criterion to penalize overparameterization proposed by Schwarz,31 the Bayesian information criterion (BIC, see also Wax and Kailath32). A more recent effort using a sample-eigenvalue-based approach33 likewise simplifies the penalty for overparameterization. These efforts are similar in nature to AIC and BIC in that they quantitatively implement Occam's razor in simplified ways, yet they do not explore the likelihood mass over the parameter space to estimate the key Bayesian quantity, termed Bayesian evidence, and instead make strong simplifications for the sake of computational efficiency. As discussed in a recent review paper by Hansen and Yu,34 minimum description length, AIC, and BIC have found applications in the past as approximation approaches, relying heavily on estimates of maximum likelihood values, yet they do not explore the Bayesian evidence, also termed marginal likelihood (see Sec. II).
Two levels of inference, model selection and parameter estimation, can be logically formulated within a unified Bayesian framework. Within this framework, the single most important quantity, crucial to both levels, is the Bayesian evidence or marginal likelihood; as MacKay35 asserted, “every effort should be devoted to calculating it.” More recently, Bayesian model selection efforts have focused on estimating the marginal likelihood using advanced Markov Chain Monte Carlo (MCMC) methods,35 such as nested sampling,36,37 which explore the likelihood mass more thoroughly to estimate the Bayesian evidence. This work briefly reintroduces this advanced sampling technique for more accurate estimation of the marginal likelihood or Bayesian evidence. Note that a recent work by Han and Nehorai38 has also applied two levels of inference (model selection and DOA estimation) using nested linear sensor arrays, but the work documented there was not articulated within a unified Bayesian framework and was not validated using experimental data. In contrast, this work formulates the model selection and DOA parameter estimation within a unified Bayesian inferential framework. The model fidelity, target model selection, and DOA estimation are all demonstrated using experimental data. This work builds upon the numerical simulations and experimental validations of broadband coprime sensing of acoustic signals documented in the authors' most recent publication;25 no model-based analysis was attempted in that previous work. To the best knowledge of the authors, the model-based Bayesian inferential approach formulated here has not yet been sufficiently applied in acoustics applications, particularly in the context of coprime sensing.
The remaining sections of this paper are structured as follows. Section I A includes a brief background of single-frequency and broadband coprime sensing, then Sec. I B introduces a phenomenological model for broadband coprime beamforming for DOA estimation. Section II reintroduces Bayes's theorem and derives the basic probability density functions (Pdfs) of the Bayesian framework. Section II A describes the “nested sampling” method used for estimating the necessary quantities involved. Section II B derives the final marginalized likelihood function and lists the concrete steps taken to implement the nested sampling process. Experimental methods are explained in Sec. III and results are presented in Sec. IV. Section V includes a discussion, and this work concludes with Sec. VI.
A. Coprime array sensing
As put forward in Ref. 14, a coprime array consists of two nested, spatially undersampled ULAs (subarrays) operating at the same frequency, whose inter-element separations differ by a factor of M/N, where M and N are coprime integers. If the two subarrays have equal aperture length, L, then M and N correspond to the respective numbers of elements in each subarray (with spacings L/M and L/N, undersampled by the coprime factors N and M, respectively). In this case, the coprime array design frequency which produces the narrowest beampattern without further undesired aliasing is25

f_0 = \frac{cMN}{2L},   (1)

where c is the speed of sound.
Vaidyanathan and Pal14 proved that there is always only one look direction for far-field sources in which each subarray shares a full lobe with the other. This direction is determined by using the relative phase shifts of each subarray in conjunction with the Chinese remainder theorem for coprime numbers. With no phasing of the array elements, the main beam is normal to the axis of the array; of course, this can be shifted using beamforming methods such as delay-and-sum.
By cross-correlating the linear-spatial domains of the subarrays (complex conjugate-multiplying their beampatterns), one is left with a single narrow main beam at the intersection of the subarrays' patterns and lower-level, deterministically oriented, overlapping sidebands. The beam width at broadside (null-to-null) is

\Delta\theta = \frac{2c}{fL},   (2)

where c is the sound speed, L is the aperture length, and f is the frequency.25 At the design frequency from Eq. (1), this is 4/(MN) rad (3.18° for the M = 9, N = 8 array used in this work).
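As a numerical sanity check, the following minimal Python sketch evaluates the reconstructed Eqs. (1) and (2) for the array used in this work (M = 9, N = 8, L = 1 m); the function names are illustrative, not taken from the paper.

```python
import numpy as np

def design_frequency(M, N, L, c=343.0):
    """Eq. (1): coprime design frequency f0 = c*M*N / (2*L)."""
    return c * M * N / (2.0 * L)

def null_to_null_beamwidth(f, L, c=343.0):
    """Eq. (2): broadside null-to-null beam width, in radians."""
    return 2.0 * c / (L * f)

f0 = design_frequency(M=9, N=8, L=1.0)      # 12 348 Hz for this array
bw = null_to_null_beamwidth(f0, L=1.0)      # equals 4/(M*N) rad at f0
print(f"f0 = {f0:.0f} Hz, beam width = {np.degrees(bw):.2f} deg")  # ~3.18 deg
```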
If the frequency is lowered below that of Eq. (1), the uniqueness of the main beam persists, as shown by the generalized proof in the authors' prior paper.25 The positioning of the unphased, unweighted beampattern's side lobes spreads with decreasing frequency in a deterministic way. Namely, the beampattern with unity gain is

B(\theta, f) = \frac{\sin^2\left(\frac{\pi}{2} MN \gamma \sin\theta\right)}{MN \sin\left(\frac{\pi}{2} M \gamma \sin\theta\right) \sin\left(\frac{\pi}{2} N \gamma \sin\theta\right)}   (3)

at frequency f, where γ = f/f_0, and θ is the angle from normal.
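The frequency dependence of this pattern can be illustrated with a short sketch, assuming the reconstructed form of Eq. (3) above; the Dirichlet-kernel helper handles the removable singularities at the pattern peaks.

```python
import numpy as np

def dirichlet(x, n):
    """Periodic-sinc kernel sin(n*x) / (n*sin(x)); points where sin(x) -> 0
    are replaced by the L'Hopital limit cos(n*x)/cos(x)."""
    den = n * np.sin(x)
    safe = np.where(np.abs(den) < 1e-9, 1.0, den)
    return np.where(np.abs(den) < 1e-9, np.cos(n * x) / np.cos(x), np.sin(n * x) / safe)

def coprime_beampattern(theta, f, M, N, f0):
    """Eq. (3) as reconstructed: conjugate product of the two undersampled
    subarray array factors, with unity gain at theta = 0."""
    psi = 0.5 * np.pi * (f / f0) * np.sin(theta)   # gamma = f/f0 folded in
    return dirichlet(N * psi, M) * dirichlet(M * psi, N)

theta = np.radians(np.linspace(-90, 90, 181))
b = coprime_beampattern(theta, f=12348.0, M=9, N=8, f0=12348.0)
```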
B. Broadband coprime sensing model
The gain at broadside (θ = 0) is the same irrespective of frequency, whereas the rest of the beampattern, including the side lobes, varies with frequency. Beampatterns at different frequencies superpose, resulting in what is essentially an average normalized beampattern of Eq. (3) over frequency, weighted by the observed magnitude spectrum. For this reason, it is possible to widen the bandwidth of the array to reduce sideband noise, much the same way as proposed by Rigelsford and Tennant39 for sparse random arrays. As long as the sound source observed by the array is sufficiently broadband, differing components of the directional response at adjacent frequencies can have an averaging-out effect with respect to the consistent main beam.
The proposed phenomenological model to represent the coprime beampattern data is a generalized Laplace distribution function

F(\theta; \Theta) = A_0 + \sum_{i=1}^{s} A_i \exp\left(-\frac{|\theta - \theta_i|}{\delta_i}\right),   (4)

where A_0 is a constant parameter for the noise floor and the three parameters per sound source are: amplitude, A_i, angle of arrival, θ_i, and width, δ_i, of each. In the following, the set of parameters is summarized by the vector Θ. The total number of parameters (the length of Θ) depends on the number of sources in the sound field via the relation P = 3s + 1, where s is the number of sound sources.
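A direct implementation of Eq. (4) as reconstructed above might look as follows; the packing order of the parameter vector (A_0 first, followed by amplitude/angle/width triples) is an assumption made for illustration.

```python
import numpy as np

def laplace_model(theta, params):
    """Eq. (4): F(theta) = A0 + sum_i A_i * exp(-|theta - theta_i| / delta_i).
    params packs the 3s+1 values as [A0, A1, theta1, delta1, A2, theta2, delta2, ...]."""
    out = np.full_like(theta, params[0], dtype=float)     # A0: noise-floor term
    for A_i, th_i, d_i in np.reshape(params[1:], (-1, 3)):
        out += A_i * np.exp(-np.abs(theta - th_i) / d_i)
    return out

# Two-source example, loosely resembling Fig. 2 (parameter values illustrative only):
theta = np.radians(np.linspace(-90, 90, 181))
F = laplace_model(theta, [0.05, 1.0, 0.0, 0.02, 0.9, 0.209, 0.02])
```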
Figure 2 shows the modeled broadband coprime array response using Eq. (4) to two simultaneous incoherent broadband noise sources. After convolving two broadband coprime array impulse responses (source directions of 0° and 12°) with white noise, their sum's directional response based on the experimental data is plotted against the model (experimental methods are explained in Sec. III). The errors between the model and the data are shown on the same plot. Nothing is known, or assumed to be known, about the statistics of these errors other than that they have a mean of zero and a finite variance. This specific “background information” (notated as I in what follows) is important for incorporating the model-based Bayesian framework.
FIG. 2. (Color online) (a) Broadband directional pattern in response to two noise sources is shown in Cartesian coordinates (solid line). The two-sound-source Laplace distribution function model with reasonable values of the model parameters is shown superimposed onto the experimental data (dashed line). (b) Errors between model and data are finite with mean zero.
II. BAYESIAN INFERENCE
Parameters of the aforementioned models are summarized as a vector Θ. The Bayesian framework formulated in this specific problem begins with Bayes's theorem

p(\Theta | D, I) = \frac{p(D | \Theta, I) \, p(\Theta | I)}{p(D | I)},   (5)

where p(Θ|I) represents the prior probability of the parameters given the background information, which, when taken into account along with the observed data, D, produces a posterior probability, p(Θ|D, I), which is normalized by the marginal likelihood or evidence, p(D|I).
The quantity p(D|Θ, I) is termed the “likelihood,” which is the probability of the measured data given the parameters and the background information I. Bayes's theorem in Eq. (5) represents how the prior knowledge about the parameters is modified in the light of the data, D.
In the model-based approach, the background information must include a model, M, or a set of competing models, M_s with s = 1, 2, …, which describe the phenomenon reasonably well, i.e., the errors between the model and data are finite. For example, given the model in Eq. (4), these errors are illustrated in Fig. 2. For brevity, the I is dropped from Eq. (5); however, since there may be a set of competing models, it is important to specify the model in each term, making the model-based Bayes's theorem

p(\Theta | D, M_s) = \frac{p(D | \Theta, M_s) \, p(\Theta | M_s)}{p(D | M_s)}   (6)

for a specific model, M_s.
To formulate the problem, Pdfs must be assigned. There are multiple ways of accomplishing this; this work relies on the principle of maximum entropy.40 This method requires that no assumptions are made beyond the known information that is available. In the case of the likelihood function, we can say that the variance of the errors between the observed data and the model predictions is finite (since we know the physical system well and can model it sufficiently), which leads to an assignment of the normal distribution41,42

p(D | \Theta, \sigma, M_s) = (2\pi\sigma^2)^{-K/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{k=1}^{K} [d_k - F(\theta_k; \Theta)]^2\right),   (7)

where d_k are the measured data points, F(θ_k; Θ) is the model prediction of Eq. (4), and K is the number of data points. Note that assigning a normal distribution based on the principle of maximum entropy is different from assuming a normal error function between the model and the experimental data. It is a result of having no information other than that the errors have zero mean and finite variance.
In practice, the standard deviation term, σ, in the Gaussian likelihood function from Eq. (7) is not of any interest and is therefore a nuisance parameter. The dependence on this parameter can be removed by integrating the joint probability distribution of the likelihood, namely,

p(D | \Theta, M_s) = \int_0^\infty p(D | \sigma, \Theta, M_s) \, p(\sigma) \, d\sigma,   (8)

over all possible values of σ. p(σ) can be thought of as the prior distribution of σ, whose Pdf is proportional to 1/σ due to maximum entropy, since σ is a scale parameter (this is known as Jeffreys's prior).36 Thus, the marginalized Pdf for the likelihood takes the form of a student t-distribution6

p(D | \Theta, M_s) \propto \Gamma(K/2) \left[\sum_{k=1}^{K} (d_k - F(\theta_k; \Theta))^2\right]^{-K/2},   (9)

where Γ(K/2) is the gamma function evaluated at K/2 (again, K is the number of data points).
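The sketch below, assuming the reconstructed Eqs. (7)–(9), implements the resulting log-likelihood and numerically spot-checks the σ marginalization; the function name and test values are illustrative only.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def log_likelihood(data, theta_grid, params, model):
    """Marginalized (student-t form) log likelihood of Eq. (9), up to an
    additive constant: ln L = ln Gamma(K/2) - (K/2) * ln sum_k e_k^2."""
    resid = data - model(theta_grid, params)
    K = data.size
    return gammaln(K / 2.0) - (K / 2.0) * np.log(np.sum(resid ** 2))

# Spot-check of the sigma integral in Eq. (8): marginalizing the Gaussian of
# Eq. (7) against the Jeffreys prior 1/sigma gives the closed form
# 0.5 * Gamma(K/2) * (2*pi)^(-K/2) * (Q/2)^(-K/2), with Q the squared-error sum.
K, Q = 8, 3.7   # arbitrary test values
numeric, _ = quad(lambda s: (2 * np.pi * s**2)**(-K / 2)
                  * np.exp(-Q / (2 * s**2)) / s, 0.0, np.inf)
closed = 0.5 * np.exp(gammaln(K / 2)) * (2 * np.pi)**(-K / 2) * (Q / 2)**(-K / 2)
print(numeric, closed)   # the two agree to numerical precision
```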
When nothing is known that may inform the assignment of the prior, unsurprisingly, maximum entropy leads to a bounded uniform prior probability42

p(\Theta | M_s) = \prod_{j=1}^{P} \frac{1}{\Theta_j^{\max} - \Theta_j^{\min}}   (10)

for Θ within the prior bounds, and zero otherwise. The remaining term needed to complete Eq. (6) is the marginal likelihood, p(D|M_s). This Pdf is not assigned; rather, since the data have been collected, this term acts as a normalization constant, ensuring the posterior indeed integrates to 1. In other words, it is calculated by evaluating the integral of the numerator

p(D | M_s) = \int p(D | \Theta, M_s) \, p(\Theta | M_s) \, d\Theta,   (11)
which, in practice, is approximated numerically.
Model-based Bayesian inference has recently been considered a powerful methodology for estimating the underlying parameters which produced the observed data, along with their associated uncertainties. This is the lower level of inference, parameter estimation. A question that must be answered first, however, is “which model best predicts the data?” or, put another way, “which model do the data prefer?” This higher level of inference, model selection, must be completed any time there is a set of competing models.
In the case of DOA estimation, as long as sound propagation pathways in the acoustic environment are well-understood, there usually exists a model which will accurately predict the microphone array response to any possible configuration of sound sources. However, it is not always known how many concurrent sound sources are in the environment, and since in many cases this determines the formulation of the model (not merely the parameter values), there is a set of competing models from which to choose prior to estimating sound source parameters.
A. Nested sampling
At the heart of model selection is calculating the evidence of each model for comparison. There are many methods for numerically approximating this area under the likelihood curve. In many cases, there may be a large number of dimensions to cover (3s + 1 in the current problem) and the dimensionality may differ among models, making direct integration intractable. The chosen approach for approximating the evidence integral shown in Eq. (11) is nested sampling, proposed by Skilling.37 This is an MCMC method of sampling which partitions the range of the Pdf similarly to Lebesgue integration. This allows the integral to be approximated via a two-dimensional mapping: the likelihood as a function of “prior mass,” μ. The prior mass, defined as

\mu(\lambda) = \int_{p(D|\Theta, M_s) > \lambda} p(\Theta | M_s) \, d\Theta,   (12)

is the multidimensional piece (mass) of the prior whose associated likelihood, p(D|Θ, M_s), is found to be above a value, λ, which is a variable that changes monotonically during sampling.
Thus the multiple integral for evidence is reduced to a single integral

Z = \int_0^1 L(\mu) \, d\mu,   (13)

where the limit μ = 1 represents the entire prior mass, which requires the least stringent (lowest) limit, λ, and provides the lowest likelihood value, and the limit μ = 0 represents the infinitesimal region of the prior mass with parameters that maximize the likelihood [for the most stringent (highest) λ]. In practice, the proposed implementation of nested sampling backward-integrates Eq. (13), accumulating likelihood values starting with the lowest (μ = 1), then continually increasing λ and ever narrowing the range of parameters in the sampling population (toward μ = 0). In other words,37

Z \approx \sum_{n=1}^{N} L_n \, \Delta\mu_n,   (14)

which corresponds to a sequence of likelihood values

0 < L_1 < L_2 < \cdots < L_N,   (15)
where N is the total number of samples taken. Figure 3 illustrates one sequence sampled from the experimental data discussed in Sec. III. The likelihood from each sample is recorded as the parameters are varied under the constraint that each accepted parameter perturbation increases the likelihood value of the sample. As such, the curve grows monotonically and may be considered complete when subsequent increases are negligible and all parameter values have converged about the global maximum likelihood.
FIG. 3. An example of sampling results for sources at 0° and 12° (as in Fig. 2). Natural log of the likelihood is plotted against the sample number. Likelihood values increase monotonically due to the constraint on acceptable parameter perturbations. Increases in likelihood diminish as the sample population converges on the most likely parameter values, i.e., Eq. (15). The area under the curve is estimated using Eqs. (17) and (16), which gives the evidence, Z, for this run-through.
It can be difficult to visualize the sample population's convergence on a point in a multidimensional space with more than three dimensions, so it can be helpful to plot just a couple of dimensions at a time, as in Fig. 4. Sample likelihood values increase monotonically with respect to sampling iteration; however, in just two dimensions they do not always increase monotonically with geometrical proximity to the maxima, due to possible mismatches between the model and data in dimensions that are not shown. Figure 4 not only shows the central, converged population, but also previous samples which had lower likelihood values. By taking a weighted average of the final sample population over the space, the θ1 and θ2 parameters are estimated to be 0.007 radians (0.4°) and 0.208 radians (11.9°), which are correct to within the precision of the experimental setup. It is important to note that results for the 2-sound-source model are shown here (which is known to be the correct model based on “ground truth”). Still in the model selection stage, this sampling step produces likelihood contour information, which must be combined with an estimate of the step size in μ in order to evaluate the integral in Eq. (13).
FIG. 4. (Color online) Two dimensions of the multi-dimensional parameter space [incident angles 1 and 2 of the 2-source model, i.e., θ1 and θ2 in Eq. (4)] for the last 3000 samples in the case where sources are at 0 and 0.209 radians (0° and 12°). The natural logarithm of the likelihood for each sample is indicated by its shading. The area of highest likelihood indicates that the true incident angles are likely to be very close to 0.4° and 11.9°. Samples closest to this point generally have higher likelihood; however, for samples whose parameters along dimensions other than those shown have not properly converged, the likelihood is diminished.
As the likelihood distribution is explored with a random walk, the prior mass tends to shrink exponentially depending on the size of the sampling population as37

\mu_n \approx e^{-n/Q},   (16)

where n is the sampling iteration and Q is the total population size. Using this relation, one can estimate the amount of evidence encapsulated by each step of “nested” prior mass along the likelihood curve. The aggregate of these is the total evidence for the model in question,

Z \approx \sum_{n=1}^{N} L_n \, (\mu_{n-1} - \mu_n).   (17)
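In log space, the accumulation of Eqs. (16) and (17) from a recorded sequence of log-likelihood values can be sketched as follows; this is a minimal illustration rather than the authors' implementation.

```python
import numpy as np

def log_evidence(log_L, Q):
    """Estimate ln Z from a recorded likelihood sequence using the
    deterministic shrinkage mu_n = exp(-n/Q) of Eq. (16) and the sum of
    Eq. (17), evaluated stably with a log-sum-exp."""
    log_L = np.asarray(log_L, dtype=float)
    n = np.arange(1, log_L.size + 1)
    mu = np.exp(-n / Q)                              # Eq. (16)
    widths = np.concatenate(([1.0], mu[:-1])) - mu   # mu_{n-1} - mu_n
    terms = log_L + np.log(widths)
    m = terms.max()
    return m + np.log(np.sum(np.exp(terms - m)))     # ln of Eq. (17)
```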
Since this is a stochastic process which depends heavily on the initial placement of the samples throughout the space, it is wise to repeat it to ensure a proper result. The initial population size and importance thereof is of great interest in qualifying nested sampling and the results it yields. This is further discussed in Sec. V.
As the population converges on maxima in the likelihood curve, the total amount of prior mass they collectively span becomes smaller and smaller, which diminishes the overall contribution to accumulated evidence. A central discussion regarding nested sampling is how to decide when sampling is complete. Popular convergence criteria include: changes in rate of increase in likelihood or evidence, and deviation of parameters from their estimated mean values. In this work, sampling is said to be complete when each parameter of every sample deviates from the samples' mean by no more than 1%.
B. Implementation
Nested sampling is implemented in this work as follows:
(1) Choose a model and generate a population of samples with random parameter values drawn according to the prior distribution.
(2) Evaluate the population's individual likelihood values with Eq. (9) and sort them.
(3) Record the lowest likelihood value.
(4) Randomly perturb a parameter of the lowest-likelihood sample until its likelihood exceeds the next-lowest value in the population.
(5) Repeat steps (3) and (4) until all parameter values are within a pre-defined range indicating convergence on the global likelihood maximum (i.e., each sample's parameter values are within 1% of that parameter's sample mean).
(6) Using the table of recorded likelihood values, calculate the estimated evidence according to Eqs. (16) and (17).
(7) Repeat steps (1)–(6) multiple times for each model from the set of competing models.
(8) Plot the evidence results against model complexity and choose the best model using criteria employing Occam's razor (described in Sec. V).
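A compact sketch of this loop is given below, reusing log_likelihood(), log_evidence(), and a model such as laplace_model() from the earlier sketches; the perturbation step size and the retry cap are illustrative choices not specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def nested_sampling(data, grid, model, lo, hi, Q=100, tol=0.01, max_iter=100000):
    """Steps (1)-(6) above. lo/hi bound the uniform priors of Eq. (10)."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    P = lo.size
    pop = rng.uniform(lo, hi, size=(Q, P))                                # step (1)
    logL = np.array([log_likelihood(data, grid, p, model) for p in pop])  # step (2)
    recorded = []
    for _ in range(max_iter):
        order = np.argsort(logL)
        worst = order[0]
        recorded.append(logL[worst])                                      # step (3)
        threshold = logL[order[1]]       # next-lowest likelihood in the population
        for _ in range(1000):                                             # step (4)
            j = rng.integers(P)
            trial = pop[worst].copy()
            trial[j] = np.clip(trial[j] + 0.01 * (hi[j] - lo[j])
                               * rng.standard_normal(), lo[j], hi[j])
            trial_logL = log_likelihood(data, grid, trial, model)
            if trial_logL > threshold:
                pop[worst], logL[worst] = trial, trial_logL
                break
        # step (5): every sample within tol (1%) of each parameter's mean
        mean = pop.mean(axis=0)
        if np.all(np.abs(pop - mean) <= tol * (np.abs(mean) + 1e-12)):
            break
    return log_evidence(recorded, Q), pop                                 # step (6)
```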
III. EXPERIMENTAL METHODS
The coprime array consists of 16 omnidirectional electret microphones flush-mounted in a thin acrylic sheet. It can be subdivided into two collinear uniform arrays (“subarrays”) of M = 9 and N = 8 elements, each with an aperture length of 1 m. This makes their inter-element spacings L/M = 1/9 m and L/N = 1/8 m, respectively (their undersampling factors, N and M, are coprime). The coprime array design frequency (upper limit frequency) is 12.348 kHz according to Eq. (1), which provides a broadside null-to-null beam width of 3.18° according to Eq. (2). The signals travel through ribbon cable to the data acquisition card and are processed using proprietary software developed at Rensselaer Polytechnic Institute. The sample rate for data collection is 62.5 kHz.
Impulse responses are acquired according to Ref. 25 with the method of Ref. 43, using a single bookshelf speaker and the logarithmic sine sweep method, which produces a relatively flat power spectral density. Both the array and speaker rest on the hard floor to mitigate the effect of floor reflections. The sound studio dimensions are sufficiently large that, when the system is set up in the center of the room, the first room reflections arrive about 12 ms after the direct sound. Once the broadband impulse response is calculated from the data, room reflections can be suppressed by simple time-domain windowing to produce a simulated free-field response. The process is repeated for each different angle of incidence, manipulated by turning the array in 1° increments.
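The time-windowing step might be sketched as follows; the half-Hann fade length and the hypothetical function name are illustrative, and the 12 ms cutoff corresponds to the reflection delay quoted above.

```python
import numpy as np

def truncate_to_free_field(ir, fs, cutoff_ms=12.0, fade_len=32):
    """Zero out everything after the direct-sound window of a measured
    impulse response, with a short half-Hann fade to soften the edge."""
    n_keep = int(fs * cutoff_ms / 1000.0)
    out = np.zeros_like(ir)
    out[:n_keep] = ir[:n_keep]
    out[n_keep - fade_len:n_keep] *= np.hanning(2 * fade_len)[fade_len:]
    return out

# e.g., ir_free = truncate_to_free_field(ir, fs=62500)   # 62.5 kHz sample rate
```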
Having impulse responses (which are inherently broadband in nature due to this experimental setup) allows analysis of various isolated frequencies or frequency bands, or convolution with signals or noise. They may also be added together after convolution with incoherent signals to simulate multiple simultaneous sources. The multiple-sound-source experimental data for this work are all synthesized using this method.
IV. RESULTS
The DOA estimation and source enumeration algorithm uses the coprime array's 16-channel free-field impulse responses outlined in Sec. III, with the source at a specific, known incidence angle. These experimental impulse responses are convolved with white noise to provide longer streams of broadband data. For the 2-source trials, the data simulation employs two such measurements added together point-by-point; for 3-source trials there are three, and so on.
Once the data are compiled in such a way as to engage multiple simultaneous incoherent sources, the result is an n × 16 matrix, where n is the number of data points (i.e., the sample rate in Hz multiplied by the overall time in s). This is then converted to a beampattern in Cartesian coordinates via simple delay-and-sum processing by first separating the uniform subarrays (creating an n × M matrix and an n × N matrix, where M and N are the numbers of coprime subarray elements), then producing a single stream from each subarray at each of the K = 181 delays (one for each degree increment from −90° to 90°).25 These final two n × K matrices are multiplied together element-wise and the root-mean-square of the result is taken along the first dimension to provide beampattern values at each angle, which can be normalized and plotted as in Fig. 2.
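A simplified sketch of this processing chain is given below; nearest-sample delays and circular shifts stand in for whatever delay implementation was actually used, so this is an approximation of the described pipeline, not the authors' code.

```python
import numpy as np

def subarray_delay_and_sum(x, positions, angles, fs, c=343.0):
    """Steer one subarray (an n x m matrix x, element positions in meters)
    toward each angle with delay-and-sum, using nearest-sample delays and
    circular shifts for brevity. Returns an n x K matrix of steered streams."""
    n, m = x.shape
    out = np.zeros((n, len(angles)))
    for k, th in enumerate(angles):
        delays = positions * np.sin(th) / c                  # seconds per element
        shifts = np.round((delays - delays.min()) * fs).astype(int)
        for i in range(m):
            out[:, k] += np.roll(x[:, i], shifts[i])
        out[:, k] /= m
    return out

def broadband_coprime_beampattern(xM, xN, posM, posN, fs):
    """Sec. IV product processing: multiply the two steered matrices
    element-wise and take the RMS along the time dimension."""
    angles = np.radians(np.arange(-90, 91))                  # K = 181 angles
    yM = subarray_delay_and_sum(xM, posM, angles, fs)
    yN = subarray_delay_and_sum(xN, posN, angles, fs)
    b = np.sqrt(np.mean((yM * yN) ** 2, axis=0))             # RMS over time
    return b / b.max()                                       # normalized as in Fig. 2
```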
At this point, the phenomenological broadband coprime beampattern model from Eq. (4) can be evaluated at each corresponding θ value (−90° to 90°) using any values of the parameters, Θ, within the corresponding limits of their prior Pdfs, p(Θ|M_s). This provides a K-point comparison which is used to find a likelihood value via Eq. (9) for that specific parameter set, Θ. Performing this comparison with Q values of the parameter vector Θ, where Q = 100 is the size of the sampling population, completes steps (1) and (2) of the nested sampling algorithm as described in Sec. II B.
The evidence results are shown as a bar chart in order to compare models with differing numbers of presumptive sound sources side-by-side. The model selection results for a 2-source case with sources at 0° and 4° are shown in Fig. 5. These models are compared directly with the data via Eq. (4) in Fig. 6, using parameter values that are each dimension's weighted average over the corresponding final sampling population. In Fig. 6(b), the 2-source model parameters indicate correct DOA estimates, successfully separating two concurrent sources which are 4° apart. When over-parameterized models are applied to the same data [as in Figs. 6(c) and 6(d)], the parameter estimation forces additional targets into wrong places with diminutive amplitudes, which negatively impacts correctly identified sources, reducing their amplitudes as well. The 16-channel coprime array with 1-m aperture provides a 3.2° resolution at best (at its design frequency) per Eq. (2). The close proximity of estimated sources in over-parameterized models is an indication that the parameter estimation provides incorrect results even though the likelihood increases. The model match in these cases gains slight improvements even though the estimated parameters are inappropriate; this is because at these close proximities the model ceases to accurately represent the coprime array response, as it yields higher resolution than is possible according to Eq. (2).
FIG. 5. Average log evidence among competing models for the 2-source case (0° and 4° incidence). Twenty trials are run per model, each with a sampling population of Q = 100. In this case, the 2-source model correctly shows improvement over the 1-source model. The 3- and 4-source models show increases in evidence, though not enough to justify their higher complexity.
FIG. 6. (Color online) Comparison between the model using Bayesian-estimated parameters and the data. Each plot has a different model, corresponding to an increasing number of presumed sound sources (thus an increasing number of parameters): (a) 1-source model, (b) 2-source model, (c) 3-source model, (d) 4-source model. Estimated A_n and θ_n parameter values are shown for each.
Bayesian evidence results for a 3-source case are shown in Fig. 7. The estimated parameters for each model are plotted against the data in Fig. 8. Here it is perhaps less clear that the 4-source model (d) is inappropriate, since the spurious source incidence angle is 5° from two correctly identified angles. This case exemplifies the need for an appropriate criterion for higher-order model acceptance based on sufficient gains in evidence, considering that it is not easy to infer the correct number of sound sources by simple inspection. In Fig. 8(e), the fifth source angle is less than 1° away from a previously identified and estimated source, which again lies within the resolution of the array allowed by Eq. (2).
FIG. 7. The average of the log of the evidence is compared among competing models for the 3-source case. Twenty trials were run per model, each with a sampling population of Q = 100. In this case, the 3-source model correctly shows improvement over the 1- and 2-source models. The 4- and 5-source models show increases in evidence, though not enough to justify their higher complexity.
FIG. 8. (Color online) Comparison between the model using Bayesian-estimated parameters and the data. Each plot has a different model, corresponding to an increasing number of presumed sound sources (thus an increasing number of parameters). Estimated A_n and θ_n parameter values are shown.
V. DISCUSSION
One can be quantitatively confident in the results of the parameter estimation via the posterior distribution, as long as it has been sufficiently sampled. A common convergence criterion would be a negligible increase of the likelihood with significant continued sampling, as shown in the zoomed-in portion of Fig. 3. Another possible criterion, employed in this work [step (5) in Sec. II B], is that all parameter values are within a certain predefined range of one another, which may be chosen according to the desired precision. In this case, when all samples have parameter values within 1% of each other within their respective dimensions, the criterion is met. The danger here is the possibility of settling on a local maximum rather than the global maximum. One way to mitigate this risk is to increase the sample population in the hope of having at least one sample from the initial population close enough to the global maximum to draw the other samples toward it. A detailed discussion on the effectiveness of deploying these initial populations can be found in Ref. 36. The starting population of Q = 100 proved to be sufficient for this work, since there were no especially problematic local maxima found in the likelihood space.
Besides parameter estimation, sample population size has the potential to also be important for sufficiently estimating the integral of the marginal likelihood/evidence for Bayesian model selection. Nested sampling works by finding contours in the multidimensional likelihood space, starting with samples spanning the entire space (having a prior mass of 1) and then requiring that each successive iteration have higher likelihood, homing in on the areas of higher likelihood (shrinking the prior mass gradually). Because of the constant decrease of the prior mass according to Eq. (16), the most significant contributions to the evidence estimate are made toward the start of the sampling, which suggests that the initial sample population may be much more important for model selection than for parameter estimation. When larger population sizes were tried, as in Fig. 9 with Q = 1000, similar results were found in the parameter estimates as well as the evidence estimates; therefore, it may not be appropriate to blame population size for ambiguities in model selection in this case.
FIG. 9. The log evidences for each model evaluating 3-source data using a larger population size of Q = 1000 are relatively the same as the smaller-population results (see Fig. 7). Two trials were run for each model. The parameters are estimated to be the same values as in the Q = 100 case.
Models are ranked using estimates of their Bayesian evidence; however, it is often observed that evidence values increase with increasing model complexity (i.e., number of parameters) even when the “ground truth” is known and the principle of parsimony (Occam's razor) requires fewer parameters. Occam's razor prefers the simpler model when more than one model competes to describe the same data (phenomena). At first glance, the evidence results illustrated in Figs. 5 and 7 convey that the “true model” seems to be identified at the “knee” or “knick-point” where the evidence value increases significantly, after which it increases much more slowly as the model complexity increases. These figures suggest that nested sampling effectively explores the parameter space to provide a monotonically increasing sequence of likelihood values in a stochastic manner, as expressed in Eq. (15). The corresponding shrinkage of the prior mass [as in Eq. (16)], however, was not obtained from the Markov chain Monte Carlo exploration; rather, it was approximated deterministically under the assumption of exponential decrease. This assumption/approximation for the evidence estimation could be responsible for the evidence values increasing with an increasing number of concurrent sound sources. Future research effort should go into potentially more powerful sampling methods in order to explore the likelihood landscape more accurately.
Another possible culprit for increased evidence of over-parameterization may be limitations of the phenomenological model. The constant term in Eq. (4) does not capture coarse manifestations of the noise, and there may be residual spectrum-dependent side lobes of the broadband coprime beampattern which a Laplace distribution function does not properly model for each sound source. Especially in cases where the sound source magnitude spectrum is not sufficiently constant (white) across the observed frequency range, one would expect spurious side lobes to arise from such inhomogeneity. Further effort should characterize coprime beampatterns, including any residual side lobes, accurately for sound sources presenting arbitrary frequency content.
The ground truth in the current investigation/implementation provides a guideline for realistically evaluating the increase of evidence values as the model order increases one-by-one. The increase of the evidence value from second order to third order as shown in Fig. 5 is 22 Np on average, while the increase from third order to fourth is 14 Np. In comparison with an evidence increase of 181 Np from first order to second order, these relatively small increases can be considered insufficient. This investigation using known ground truth can be taken to derive a specific scale with which to penalize higher-order models. In similar fashion, Fig. 7 illustrates the evidence estimation for three concurrent sources; the knick-point is the evidence at model order 3. Here, the increase of the evidence value from second order to third order amounts to 223 Np, while the increases of 39 Np and 13 Np from third order to fourth and from fourth to fifth, respectively, are considered insufficient. A similar strategy of ranking models was also applied by Jeffreys in the 1960s.44 Multiple (20) runs of nested sampling provide consistent evidence estimates with small variations. The present investigations suggest that about 50 Np can be considered a sufficient scale for ranking the models under test. If an increase in the evidence estimate larger than this scale value is observed, the higher-order model can be considered justified.
Adjusting these model selection criteria on a problem-by-problem basis follows logically from the fact that the number of data points available may vary significantly between problems. This number of data points, K, in Eqs. (7) and (9) directly scales the log evidence values. This means that if a 50 Np increase criterion justifies increased model order in one problem, another larger criterion may be needed for a similar problem which uses ten times as many data points for example. Another consideration for this array problem is the resolution of the array. If no limits are placed on the source width of the model, one sound source may become two lower-amplitude, yet closer-together sources when sampling in the higher-dimension model likelihood space, even if the physical limitations of the array preclude such a response. That would be an example of nominally higher evidence for an incorrect, higher-order model. Due to many factors, model selection criteria may require renewed individual attention for each application.
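A knee-style decision rule of the kind discussed above can be sketched as follows, using the roughly 50 Np scale suggested in this section; the evidence values in the usage example are relative and purely illustrative of the reported increments.

```python
def select_model_order(log_Z, threshold_np=50.0):
    """Pick the lowest model order whose successors fail to add at least
    `threshold_np` nepers of log evidence (the ~50 Np scale from Sec. V).
    log_Z[i] is the average log evidence of the (i+1)-source model."""
    order = 1
    for s in range(1, len(log_Z)):
        if log_Z[s] - log_Z[s - 1] >= threshold_np:
            order = s + 1        # gain justifies the more complex model
        else:
            break                # diminishing returns: stop at the knee
    return order

# Relative log evidences built from the increments reported for the 2-source
# case: ~181 Np (1 -> 2), ~22 Np (2 -> 3), ~14 Np (3 -> 4).
log_Z = [0.0, 181.0, 203.0, 217.0]
print(select_model_order(log_Z))     # -> 2
```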
VI. CONCLUDING REMARKS
Two levels of Bayesian inference were applied to coprime sensor array processing, representing the first such effort. This paper has proposed a phenomenological model for broadband coprime microphone array responses to broadband noise sources. This model has proved sufficient in approximating the experimental data when the model order has been identified and the correct parameters have been estimated. Bayesian inference using nested sampling provides a useful framework for model selection, though problem-specific ambiguities may still arise. Since there are cases where unfounded increases in model order produce nominal increases in evidence, these evidence values alone are insufficient; additional criteria are still needed to penalize model complexity. Jeffreys put forth in his application that an increase of 10 decibans may be required to justify the next model order;44,45 however, problem-specific factors (such as the number of data points, K, or physical limitations in resolution) may have the effect of scaling this criterion. For this reason, the “ground truth” in one's experiment can be used to determine the appropriate decision-making criteria for the specific model selection problem at hand. For the 181-point broadband 9 × 8 coprime array output problem with equal-amplitude, uncorrelated, broadband noise sources, the criterion chosen is 200 decibans for Bayesian model selection in this DOA application with finite angular resolution. Ground truth was used as the standard with which to validate the above methods. Future study might include systematic comparisons between this source enumeration and DOA estimation method and other established algorithms; this is beyond the scope of the current work.
An advantage of using nested sampling for model selection is that, as long as the sampling data are saved, the parameter estimation is completed (or at least well pursued) as part of the same process. The accuracy of nested sampling's estimation remains a concern due to assumptions about the shrinkage of the prior mass in each sampling iteration; these could have varying effects on the evidence estimate. Model fidelity is undoubtedly of high importance in any case; neither model selection nor parameter estimation will work properly if the proposed models fail to sufficiently approximate the real-world data. Future work should explore analytical models for spectrum-dependent broadband coprime array responses, which can be evaluated by the Bayesian framework with the hope that the aforementioned ambiguities are better resolved.