Bayesian selection of plane-wave decomposition models

Plane-wave decompositions, whereby a measured sound field is described as a superposition of plane waves, are central to many applications in acoustics and audio engineering. This letter applies a Bayesian probabilistic inference framework to the plane-wave decomposition problem and examines the Deviance Information Criterion (DIC) for selecting the optimum number of waves in the decomposition. The framework learns the model directly from the data and, as such, adapts to the wavefield under study. The DIC is applied to data measured in two reverberant sound fields (highly-reverberant and lightly-damped) to determine the simplest models providing the preferred fit to the data.


Introduction
Elementary wave expansions are essential to many applications in sound field analysis, reconstruction, and reproduction. The idea is to expand sound pressure data measured at a number of observation points (typically, with an array of microphones) into a sum of elementary waves, and to use this elementary wave basis to analyze, predict, or reproduce the sound field within a specific domain. The choice of a particular basis (plane or spherical waves, spherical or cylindrical harmonics) depends on multiple factors, including the measurement apparatus, the intended application, and the nature and physics of the problem being solved.
In far-field measurements, where the wavefronts can be regarded as locally planar, the sound field can simply be described as a superposition of plane propagating waves. In this case, the task is to infer the complex amplitudes of the plane waves from measurements of the pressure field at a limited number of observation points. This results in a linear inverse problem, which is typically ill-posed (in this case, underdetermined and/or rank deficient). Although plane wave decomposition models have been used extensively in the acoustical literature, the question of the optimum number of plane waves that best describes the measured pressure has rarely been addressed, and the standard approach thus far has been to choose the set of plane waves on a fairly ad hoc basis. Vekua's theory (Moiola et al., 2011; Vekua, 1967) gives the minimum number of plane waves required to properly approximate a measured pressure field, provided that the directions of propagation of these plane waves are uniformly distributed across all propagation angles. This measure, however, depends exclusively on the wavenumber and the radius of the measurement aperture, and ignores the structure of the measured sound field (which may be due to multiple interfering waves or just a few).
The present work uses a probabilistic Bayesian framework that inherently accounts for the sound field's structure by allowing the data themselves to determine which combination of plane waves gives the preferred fit to the measured pressures. This is the problem of Bayesian model selection, which arises across many branches of science and acoustics (Xiang, 2020) and ensures the elimination of parameters that play an insufficient role in improving the fit to the data, while capturing the important features evident in the data. Probabilistic frameworks are not commonly applied to the problem of plane wave expansion. This is due, at least in part, to the high dimensionality of the problem (typically on the order of hundreds of plane waves), which renders full posterior sampling computationally intractable even for moderate dimensions. The only study that reports a fully parameterized Bayesian estimation of the amplitudes of a plane wave expansion model is that of Schmid et al. (2021), for the purpose of sound field reconstruction in enclosures. That study, however, is limited to a preselected set of 50 plane waves in two dimensions.
This study applies the Deviance Information Criterion (DIC) (Spiegelhalter et al., 2002; Steininger et al., 2014) to the problem of plane wave expansion. The DIC is an approximate model selection method that balances model accuracy against complexity. It is particularly well suited to high-dimensional problems, for which the implementation of a full Bayesian approach to model selection may be too computationally demanding. The key aim is to make an objective comparison of different models featuring an increasing number of plane waves, in order to estimate the number of plane waves providing the preferred fit to the data.
The article is organized as follows: Sec. 2 presents the theoretical background, and Sec. 3 evaluates the suitability of the DIC for selecting plane wave decomposition models, based on experimental data measured in two reverberant sound fields: highly-reverberant and lightly-damped.

Theory
This section describes the plane wave expansion problem, to which two levels of Bayesian inference are applicable: model selection and parameter estimation. We further introduce the DIC as a plane-wave model selection method (higher level of inference) and derive the maximum a posteriori (MAP) estimate, required for the calculation of the DIC and for the parameter estimation (lower level of inference).

Plane-wave decomposition model
The sound pressure p measured at each of the M positions r_m of an array of microphones can be described as a sum of n plane waves whose directions of propagation are uniformly distributed over a spherical grid,

p(r_m) = Σ_{w=1}^{n} P(k_w) e^{−j k_w · r_m},   (1)

where P(k_w) is the complex amplitude of the plane wave propagating in the direction given by the wavenumber vector k_w. The harmonic time dependence e^{jωt} is suppressed for simplicity. The pressure field, sampled at the M positions of the sensor array, can be expressed with the linear model

p = W_n x ≡ h_n(x),   (2)

where p ∈ C^M is the complex-valued data vector from the measurements at the M sensors, W_n ∈ C^{M×n} is the transfer matrix containing the n plane-wave functions e^{−j k_w · r_m}, and x ∈ C^n is the unknown vector of the complex plane-wave amplitudes at all n directions on the angular grid. h_n(x) ∈ C^M is here defined as a prediction model for the data p and encapsulates the model parameters x. Equation (2) represents an inferential inversion for x, which can be achieved within a probabilistic Bayesian framework. The next two sections introduce the two levels of Bayesian inference: model selection (Sec. 2.2) and parameter estimation (Sec. 2.3).
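As a concrete illustration, the linear model of Eqs. (1) and (2) can be sketched in a few lines of NumPy. The function name, the example geometry, and the 1 kHz test frequency below are illustrative assumptions, not the measurement setup of Sec. 3:

```python
import numpy as np

def build_transfer_matrix(positions, directions, k):
    """W_n[m, w] = exp(-j k_w . r_m), with wavenumber vectors k_w = k * u_w."""
    # positions: (M, 3) array of sensor coordinates [m]
    # directions: (n, 3) array of unit propagation vectors u_w
    # k: acoustic wavenumber 2*pi*f/c [rad/m]
    kw = k * directions                      # (n, 3) wavenumber vectors
    return np.exp(-1j * positions @ kw.T)    # (M, n) transfer matrix

# Minimal usage: one active plane wave sampled by a 3-microphone array
rng = np.random.default_rng(0)
positions = rng.uniform(-0.3, 0.3, size=(3, 3))        # aperture ~0.6 m
directions = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
k = 2 * np.pi * 1000.0 / 343.0                         # 1 kHz in air
W = build_transfer_matrix(positions, directions, k)
x = np.array([1.0 + 0.5j, 0.0])                        # amplitudes P(k_w)
p = W @ x                                              # modeled pressures, Eq. (2)
```

Stacking one column per direction makes the inverse problem of Eq. (2) a standard (typically underdetermined) complex linear system in x.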

Model selection (higher level of inference)
Model selection refers to the process of selecting which of a finite set of N competing models h_n (associated with different numbers n of plane waves) is preferred by the data. Concisely, the probability of the model h_n given the data p and background information I is given by Bayes theorem,

p(h_n | p, I) = p(p | h_n, I) p(h_n | I) / p(p | I),   (3)

where p(p | h_n, I) and p(h_n | I) represent the marginal likelihood and the prior probability of the model, respectively, and the quantity p(p | I) plays the role of a normalization constant. The background information I states that each model h_n may predict the data reasonably well. As such, Eq. (3) represents how our prior knowledge about the model is updated in the presence of experimental data, through the marginal likelihood p(p | h_n, I), also termed the Bayesian evidence. The Bayesian evidence is crucial to model selection, as elaborated below. Based on Eq. (3), comparison between two different models h_n and h_n' evaluates the so-called Bayes factor, which gives the posterior odds of one model against another. Presuming that the models are equally favoured prior to the data fitting, the Bayes factor reduces to the ratio of evidences which, for computational convenience, can be expressed on a base-10 logarithmic scale (in bans),

L_nn' = log10 p(p | h_n, I) − log10 p(p | h_n', I).   (6)

Among a finite set of N competing models, the highest positive Bayes factor L_nn' implies that the data prefer model h_n over h_n' the most. The Bayesian evidence can be combined with prior probabilities for different models if desired but, even if the prior probabilities are assumed equal, the evidence still automatically encodes a preference for simpler models, implementing Occam's razor in a quantitative manner.
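Once log-evidence estimates are available, the Bayes factor in bans is a simple difference; a minimal sketch (the evidence values here are made-up placeholders for illustration):

```python
# Log Bayes factor (in bans), assuming equal model priors:
# L_nn' = log10 p(p | h_n, I) - log10 p(p | h_n', I)

def bayes_factor_bans(log10_evidence_n, log10_evidence_nprime):
    return log10_evidence_n - log10_evidence_nprime

# Placeholder evidence values; L > 0 means the data prefer model h_n,
# and each ban corresponds to a factor of 10 in posterior odds.
L = bayes_factor_bans(-120.4, -123.9)
```

In practice (Sec. 3), the difference in IDIC between two models stands in for this difference in log evidence.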

Parameter estimation (lower level of inference)
Once a suitable model has been selected, Bayes theorem,

p(x | p, h_n, I) = p(p | x, h_n, I) p(x | h_n, I) / p(p | h_n, I),   (7)

relates the posterior probability of the model parameters x (conditioned on the data p, the model h_n associated with a number n of plane waves, and background information I) to the data likelihood p(p | x, h_n, I), the prior probability of the model parameters p(x | h_n, I), and the marginal likelihood of the data p(p | h_n, I). Equation (7) represents how our initial knowledge about the model parameters x, encoded in the prior probability, is updated once experimental data are incorporated into the analysis through the likelihood function. Given the dimensionality of the problem at hand (n is on the order of hundreds of plane waves), sampling the full posterior probability p(x | p, h_n, I) using traditional sampling methods (such as Markov chain Monte Carlo or nested sampling) becomes computationally intractable even for moderate dimensions. This study therefore estimates the MAP, which maximizes the posterior probability of x given the data. Omitting the Bayesian evidence p(p | h_n, I) in Eq. (7), the MAP is given by

x_MAP = arg max_x log10[p(x | p, h_n, I)] = arg min_x (−log10[p(p | x, h_n, I)] − log10[p(x | h_n, I)]),   (8)

where the data likelihood p(p | x, h_n, I) should be assigned based on a priori knowledge about the squared residual errors ε_m² = |p_m − [h_n(x)]_m|² between the measured data and the model prediction. This knowledge includes that the model has been chosen to predict the measured data sufficiently well, which implies that the mean error of the individual data points should be around zero, while the variance of the error values must be finite. Applying the principle of maximum entropy (Gregory, 2006; Jaynes, 1968) under these constraints, the data likelihood can be assigned a complex Gaussian distribution with independent and identically distributed (iid) real and imaginary parts (Jaynes, 1968; Xiang, 2020).
Similarly, the prior probability should be assigned depending on a priori knowledge of the structure of the sound field (being composed of multiple wavefronts, or just a few). Assigning an iid multivariate Gaussian to the prior probability, Eq. (8) is equivalent to the ℓ2-norm regularized least squares problem

x_MAP = arg min_x (‖p − W_n x‖₂² + λ‖x‖₂²),   (10)

where λ is the regularization parameter (Xenaki et al., 2016). Conveniently, Eq. (10) has the closed-form analytical solution

x_MAP = (W_n^H W_n + λI)^{−1} W_n^H p,   (11)

where I is the identity matrix and the superscript H denotes the Hermitian transpose. Equation (11) corresponds to the least squares solution with Tikhonov regularization (Nolan et al., 2019), which will be used in Sec. 3 to evaluate the MAP (the two being equivalent). Note that Eqs. (10) and (11) result from assigning Gaussian distributions to the likelihood and to the model parameters, based on a priori knowledge about the error function and the sound field's structure (in this case, composed of multiple wavefronts incoming from various directions, as in the data set presented in Sec. 3). If little is known about the structure of the measured sound field, a uniform prior probability may be assigned instead, encoding a lack of specific prior knowledge.
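The closed form of Eq. (11) is straightforward to sketch in NumPy. The function name and the synthetic test problem below are illustrative; in the paper, λ is chosen with the L-curve criterion rather than fixed by hand:

```python
import numpy as np

def map_estimate(W, p, lam):
    """Tikhonov-regularized MAP, Eq. (11): (W^H W + lam*I)^{-1} W^H p."""
    n = W.shape[1]
    A = W.conj().T @ W + lam * np.eye(n)     # regularized normal matrix
    return np.linalg.solve(A, W.conj().T @ p)

# Usage on a small synthetic, noise-free problem: with negligible
# regularization, the MAP recovers the true amplitudes.
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
x_true = rng.standard_normal(4) + 1j * rng.standard_normal(4)
p = W @ x_true
x_map = map_estimate(W, p, lam=1e-8)
```

Solving the regularized normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is both cheaper and numerically safer.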

Implementation using the DIC
When comparing different models (here, varying the number of plane waves in the expansion), the gold standard would be to evaluate the Bayesian evidence p(p | h_n, I) for a given model h_n. However, estimating the evidence is computationally demanding (if not intractable) for high-dimensional model selection applications, as is the case in the present study. Instead, the DIC introduced by Spiegelhalter et al. (2002) is often considered to approximate the Bayesian evidence and has successfully been applied in several contexts (Gelman et al., 2014; Steininger et al., 2014). The inverse deviance information criterion (IDIC) is defined as

IDIC = −[D(x̂) + 2p_D],   (12)

where D(x) = −2 log10[p(p | x, h_n, I)] is the Bayesian deviance and p_D is the effective number of parameters, estimated as p_D = D̄ − D(x̂), where x̂ is a central vector in the parameter space (e.g., the mean, median, mode, or maximum a posteriori estimate) and D̄ is the mean of the deviance. In accordance with the Bayesian evidence (Xiang, 2020), this paper prefers the IDIC, which differs from the criterion in Spiegelhalter et al. (2002) by a negative sign. The first term in Eq. (12) represents the goodness of fit between the model and the data, while the second term penalizes overparameterized models, offsetting any improvement in the maximum likelihood that the extra parameters might allow. The DIC (or IDIC) is similar to other, more common model-selection criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) (Dettmer et al., 2010; Schwarz, 1978; Xiang et al., 2011). Yet, unlike the AIC or the BIC, the DIC accounts for the general non-Gaussianity of the posterior probability (Steininger et al., 2014). Moreover, while the AIC and BIC penalize the fit of a model by the actual number of parameters in the model, the DIC is distinguished by the use of an effective number of parameters, calculated from samples drawn from the posterior probability of the model parameters. As such, the DIC has the advantage of accounting for possible parameter correlations, making it more suitable for high-dimensional model selection. See the supplementary material for a comparison between the IDIC, the IAIC, and the IBIC for the data set used in Sec. 3.¹ The results show that using the IAIC or the IBIC in this study would yield too large a penalty term, masking the goodness of fit between the model and the data.
In the following, the difference in IDIC will be used as an approximation to the Bayes factor in Eq. (6). The central vector x̂ in Eq. (12) is taken to be the MAP estimate, derived in Sec. 2.3. The deviance's mean D̄ is calculated from samples drawn from the posterior probability p(x | p, h_n, I) of the model parameters x within a vicinity of the MAP. Details regarding the sampling of the posterior probability are given in Sec. 3.
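The IDIC calculation can be sketched as follows. This is a simplified sketch under assumed conditions: it uses a complex Gaussian likelihood with known noise variance sigma2 (the paper instead marginalizes the variance, giving a Student-t likelihood, Sec. 3), and the random-walk sampling around the MAP mirrors the iterative procedure described in Sec. 3. All function names and the synthetic problem are illustrative:

```python
import numpy as np

def deviance(x, W, p, sigma2):
    """D(x) = -2 log10 p(p | x, h_n, I), up to an additive constant."""
    resid = p - W @ x
    return 2.0 * np.sum(np.abs(resid) ** 2) / sigma2 / np.log(10.0)

def idic(W, p, x_map, sigma2, n_samples=2000, step=1e-3, seed=0):
    """IDIC of Eq. (12): -(D(x_map) + 2*p_D), p_D = mean(D) - D(x_map)."""
    rng = np.random.default_rng(seed)
    x = x_map.copy()
    devs = np.empty(n_samples)
    for i in range(n_samples):
        # evolve each complex amplitude by a small Gaussian perturbation,
        # yielding a sequence that drifts away from the MAP
        x = x + step * (rng.standard_normal(x.shape)
                        + 1j * rng.standard_normal(x.shape))
        devs[i] = deviance(x, W, p, sigma2)
    d_map = deviance(x_map, W, p, sigma2)
    p_d = devs.mean() - d_map          # effective number of parameters
    return -(d_map + 2.0 * p_d), p_d

# Usage on a small synthetic problem (x_map here from ordinary least squares)
rng = np.random.default_rng(2)
W = rng.standard_normal((16, 6)) + 1j * rng.standard_normal((16, 6))
x_true = rng.standard_normal(6) + 1j * rng.standard_normal(6)
sigma2 = 1e-4
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(16)
                               + 1j * rng.standard_normal(16))
p = W @ x_true + noise
x_map = np.linalg.lstsq(W, p, rcond=None)[0]
val, p_d = idic(W, p, x_map, sigma2)
```

Since x_map minimizes the residual, every sample's deviance is at least D(x_map), so p_D is positive and the penalty grows as samples fit the data less well.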

Results
The suitability of the IDIC for selecting the optimum number of plane waves in a plane wave expansion problem is examined based on sound pressure data published in Nolan et al. (2019). The data set consists of 290 data points measured in a large reverberation chamber in two damping conditions: undamped (highly reverberant, Schroeder frequency f_s ≈ 400 Hz) and with a sample of absorbing material on the floor (lightly damped, f_s ≈ 250 Hz). The chamber is essentially box-shaped, with outer dimensions 7.8 m × 6.2 m × 4.9 m. There are, however, 85 built-in concrete boundary diffusers, which reduce its volume to approximately 215 m³. A scanning robot UR5 (Universal Robots, Odense, Denmark) is programmed to move a free-field microphone (Brüel & Kjær, Nærum, Denmark) within a rectangular volume located in the far field of a loudspeaker (distance > 5 m), creating a sequential array of 290 measurement positions (the array size is 0.6 × 0.8 × 0.4 m³). The measured sound pressure is expressed as a superposition of n plane waves of unknown complex amplitudes, whose directions of propagation are distributed uniformly across all propagation angles following a Thomson model (Thomson, 1904).
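Generating such a near-uniform angular grid can be done in several ways. The Thomson model used in the paper requires solving a minimum-energy optimization over point positions on the sphere; the sketch below substitutes a Fibonacci lattice, a common closed-form stand-in that also covers all propagation angles near-uniformly (the function name is illustrative, and this is not the paper's grid):

```python
import numpy as np

def fibonacci_directions(n):
    """n near-uniform unit direction vectors on the sphere (Fibonacci lattice)."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i      # golden-angle azimuth increments
    z = 1.0 - 2.0 * (i + 0.5) / n               # uniform spacing in cos(theta)
    r = np.sqrt(1.0 - z ** 2)
    return np.column_stack((r * np.cos(phi), r * np.sin(phi), z))

u = fibonacci_directions(514)                   # one of the model sizes in Sec. 3
```

Each row is a unit propagation vector u_w; scaling by the wavenumber k gives the wavenumber vectors k_w of Eq. (1).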
For a given frequency, a set of N competing models h_n associated with an increasing number n of plane waves is considered, where n ranges from 34 to 1474 in steps of 120 plane waves. The IDIC for each model is calculated according to Eq. (12) and used to rank the set of models. For simplicity, the MAP estimate for each model h_n is calculated using Eq. (11), with the L-curve criterion as the parameter-choice method for the Tikhonov regularization (Hansen, 1992). As stated before, the deviance's mean in Eq. (12) is calculated from samples drawn from the posterior probability of the model parameters x within the vicinity of the MAP. Specifically, the samples are drawn with an iterative procedure where, at each iteration, a new sample x_i is generated by evolving an existing sample x_{i−1} from a normal probability distribution (i.e., each complex plane-wave amplitude in the vector x is iteratively evolved from a normal probability distribution, x_MAP being the initial amplitude vector x_0). Each new sample yields a new likelihood value, resulting in a decreasing sequence of likelihood values (as the samples move away from the MAP), the mean of which is used to calculate the IDIC. Note that, as the variance is often considered a nuisance parameter (Bretthorst, 1988; Xiang, 2020), a Student's t probability distribution is assigned to the likelihood, corresponding to a marginalized version of the Gaussian distribution in Sec. 2.3, where the variance has been marginalized out. To ensure adequate coverage of the parameter space, a population of 100 000 samples is generated for each model, and a normal probability distribution with variance 1×10⁻⁵ Pa is used for the iterative procedure (this corresponds to a large variance, given that the MAP amplitudes are on the order of 1×10⁻⁶ Pa). In the following, five independent sequences of 100 000 samples are generated for each plane-wave model and damping condition. As such, the IDIC and individual terms presented in Fig. 1(a) are averaged over the five sequences (i.e., they are the mean of the IDICs and individual terms calculated from each sequence). The models with up to 394 plane waves are discarded from the model comparison in Fig. 1(b), as they exhibit too low IDIC values, indicating poor goodness of fit between the model and the data [this is also reflected in the deviance values and corresponds to the shaded area in Fig. 1(a)]. From 514 plane waves onward, no significant increase is observed, indicating that any of the models with at least 514 plane waves may be equally good at predicting the data. Figure 1(b) shows the Bayes factor calculated according to Eq. (6) from the difference in IDIC between models. In the lightly-damped room, the model exhibiting the highest positive Bayes factor is associated with 754 plane waves. In the highly-reverberant room, however, the data prefer the model associated with 994 plane waves. This is in line with the intuition that sound fields of increased complexity should be modeled with an increased number of plane waves. Interestingly, at 1 kHz, Vekua's theory predicts a minimum number of 64 plane waves for both damping conditions. The present analysis and results suggest that the optimum number of plane waves is far larger than the minimum theoretical prediction and depends on the specific sound field under study. Note, however, that the values of 754 and 994 plane waves are given as indicative values of what would be a sensible choice for the number of plane waves. They should not be taken as clear-cut values, as they depend strongly on the choice of models used in the comparison. This is not a matter of concern, however, as we neither believe in a "true" number of plane waves (but rather a range of sensible values) nor would we expect the number of plane waves to remain static as the complexity of the sound field varies.
Once a suitable model has been selected, the posterior probability samples generated for the calculation of the IDIC can also be used to estimate the mean value and standard deviation of each model parameter (i.e., each plane-wave amplitude) in the plane wave expansion. Figure 2 shows the mean over 100 000 samples of the magnitudes of the complex amplitudes at 1 kHz for the lightly-damped [Fig. 2(a)] and highly-reverberant [Fig. 2(c)] rooms, respectively. The regularized estimates are also displayed for comparison [Figs. 2(b) and 2(d), respectively]. In addition, Table 1 shows the mean and standard deviation over the 100 000 samples of the complex amplitudes of three randomly selected plane waves in the highly-reverberant room. Such uncertainty estimates cannot be obtained from standard regularized solutions. There is reasonable agreement between the regularized solution and the mean obtained from the posterior probability samples for both damping conditions. The highly-reverberant room conforms to a more complex wave field, with multiple waves incoming from all directions. In the lightly-damped room, fewer directions of propagation are identified. In particular, no waves are found to propagate in the positive z-direction, as very little energy is reflected by the absorbing sample (absorption coefficient α ≈ 1 at 1 kHz).
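Assuming the posterior samples are stored as an (n_samples, n) array of amplitude vectors, the summaries of Table 1 reduce to columnwise statistics; the array below is synthetic, with made-up amplitudes, purely to show the shapes involved:

```python
import numpy as np

# Synthetic stand-in for the IDIC posterior samples: 100 000 draws of
# three complex amplitudes, scattered around a made-up central value.
rng = np.random.default_rng(3)
samples = (0.5 + 0.2j) + 0.01 * (rng.standard_normal((100_000, 3))
                                 + 1j * rng.standard_normal((100_000, 3)))

amp_mean = samples.mean(axis=0)    # complex posterior mean per plane wave
amp_std = samples.std(axis=0)      # real-valued spread per plane wave
```

Unlike a regularized point estimate, these per-parameter spreads quantify the uncertainty of each amplitude directly (for complex input, `np.std` returns the square root of the mean squared deviation magnitude).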

Conclusion
This work has introduced the IDIC as an efficient means of selecting plane wave decomposition models. The criterion is particularly well suited to high-dimensional model selection applications, for which a fully parameterized Bayesian estimation is too computationally demanding. The suitability of the IDIC for selecting the optimum number of plane waves in a plane wave decomposition problem was examined based on sound pressure data measured in two reverberant fields: highly-reverberant and lightly-damped. The results suggest that the optimum number of plane waves depends on the sound field under study. This, in turn, confirms the suitability of a Bayesian model selection framework that learns the plane wave decomposition model directly from the data. Further work includes a fully parameterized Bayesian estimation of the amplitudes of a plane wave expansion model, solving for the two levels of inference (model selection and parameter estimation), and an investigation of how frequency and frequency averaging influence the number of plane waves to be considered in the decomposition.

Table 1. Mean and standard deviation over 100 000 samples of the complex amplitudes of three randomly selected plane waves (x_1 to x_3) in the highly-reverberant room. Frequency: 1 kHz.

Fig. 1. Bayesian analysis of sound pressure data measured in a reverberation chamber in two damping conditions. (a) IDIC and individual terms; (b) Bayes factor estimated using various plane wave decomposition models. The models with 754 and 994 plane waves obtain the highest positive Bayes factors in the lightly-damped and highly-reverberant sound fields, respectively. Frequency: 1 kHz.
JASA Express Lett. 3(3), 031601 (2023).