The significant amount of variance in head-related transfer functions (HRTFs) resulting from source location and subject dependencies has led researchers to use principal components analysis (PCA) to approximate HRTFs with a small set of basis functions. PCA minimizes a mean-square error, and consequently may spend modeling effort on perceptually irrelevant properties. To investigate the extent of this effect, PCA performance was studied before and after removal of perceptually irrelevant variance. The results indicate that from the sixth PCA component onward, a substantial amount of perceptually irrelevant variance is being accounted for.
I. Introduction
The most common method to simulate one or more virtual sound sources and the associated acoustic pathway from each source to both eardrums is a linear, time-invariant system modeling approach. In this method, the acoustic pathway is described by means of an impulse response, referred to as a head-related impulse response (HRIR) or binaural room impulse response, for anechoic or echoic conditions, respectively.1,2 The number of HRIR filter coefficients that are required to synthesize a virtual sound source as a function of subject, azimuth, elevation, and distance can be substantial, as it is often argued that a head-related transfer function (HRTF) database resolution of ∼5° is required to avoid spatial aliasing.3,4 As a consequence, several studies have addressed methods to reduce the amount of information in an HRTF database. One such method is to decompose HRTF magnitude spectra by means of a Karhunen–Loève transform [also referred to as principal component analysis (PCA)], yielding a small set of basis functions and position-dependent weights.5–9 These studies show that typically 90% or more of the HRTF database variance can be accounted for by five basis functions with subject- and position-dependent weights, whereas ∼10–20 basis functions seem to be required for accurate psycho-physical localization performance.8 Further, results from Hwang and Park10 suggest that PCA operates less efficiently in the time (HRIR) domain than in the frequency (HRTF) domain.
A potential problem of the PCA method is that it minimizes the mean-square error of the approximated HRTFs, even if certain errors that are important in a least mean-square sense are perceptually irrelevant.11 The absence of a strong correlation between the mean-square error and psycho-physical detection performance was shown by Scarpaci and Colburn.9 It is therefore of interest to evaluate the influence of perceptually irrelevant variance on PCA. If the amount of irrelevant variance that is accounted for by PCA is substantial, the data reduction achieved by PCA is likely to be suboptimal, and improved performance can be expected if PCA is prevented from spending effort on modeling such irrelevant variance. The goal of this study is therefore to investigate the performance of PCA if irrelevant information is removed from HRTFs.
II. Method
PCA models datasets by describing (or reconstructing) the data as a linear combination of a set of orthonormal basis functions (or components). A low-dimensional approximation of the dataset can be obtained by truncating the number of components. PCA minimizes the mean-square error between original and reconstructed dataset, or said differently, maximizes the amount of variance that can be accounted for. The amount or proportion of variance accounted for by one or more components can be computed by an eigenvalue decomposition of the covariance matrix of the data. The reader is referred to other publications10,12 for more details on this method. To study the influence of variance that is relevant in a least mean-square sense but is irrelevant from a perceptual point of view, the proportion of variance accounted for by PCA is computed before and after removal of irrelevant information from HRTFs. Only the magnitude part of HRTF spectra is considered, and (inter-aural) time and/or phase differences are assumed to be treated separately.
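The computation of the proportion of variance from the eigenvalues of the covariance matrix can be illustrated with a minimal sketch. The function name `unexplained_variance`, the removal of the mean spectrum before analysis, and the synthetic dataset are assumptions made for illustration only; they are not taken from the study.

```python
import numpy as np

def unexplained_variance(data):
    """Proportion of variance NOT accounted for by the first k principal
    components, for k = 1..n_features.

    data: array of shape (n_observations, n_features), one spectrum per row.
    Assumption: the mean spectrum is removed before the analysis.
    """
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (data.shape[0] - 1)
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative values
    explained = np.cumsum(eigvals) / eigvals.sum()
    return 1.0 - explained

# Synthetic stand-in for a magnitude-spectrum dataset (not HRTF data):
# three underlying factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 16))
spectra = latent @ mixing + 0.01 * rng.normal(size=(500, 16))
residual = unexplained_variance(spectra)
```

In this synthetic case, `residual[k - 1]` is the unexplained proportion of variance for `k` components, which is the type of quantity plotted against the number of components in the results.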
The perceptually motivated HRTF parameterization method described by Breebaart et al.13 was used to reduce the perceptually irrelevant variance in HRTF magnitude spectra. The underlying assumption of this method is that the sound source localization cues accessible to the auditory system are averaged in critical bands. In mathematical terms, the parameterization method can be described by a projection of an HRTF magnitude vector $\mathbf{h}_i$, with index $i$ referring to the sound source location, subject, and ear of the associated HRTF,
$$\mathbf{p}_i = \mathbf{M}\,\mathbf{h}_i,$$
with $\mathbf{p}_i$ a critical-band parameter vector of length $B = 41$, and matrix $\mathbf{M}$ comprising the averaging weights to compute the parameter values. The reconstructed (referred to as “processed”) HRTF magnitude vector $\hat{\mathbf{h}}_i$ is then obtained by
$$\hat{\mathbf{h}}_i = \mathbf{M}^{+}\,\mathbf{p}_i,$$
with $\mathbf{M}^{+}$ the Moore–Penrose pseudo-inverse of $\mathbf{M}$.
Breebaart et al.13 demonstrated that one spectral magnitude parameter for each critical band, in combination with triangularly shaped, 50% overlapping parameter analysis and interpolation does not result in audible differences between the original and reconstructed HRTFs. In the remainder of this paper, we therefore use this configuration for the removal of perceptually irrelevant variance.
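The projection onto critical-band parameters and the reconstruction with the pseudo-inverse can be sketched as follows. This is an illustrative sketch under stated assumptions, not the parameterization of Breebaart et al.: the ERB-rate warping (Glasberg and Moore approximation), the equal spacing of band centres on that scale, the number of frequency bins, the sampling rate, and all function names are hypothetical choices. Only the number of bands (41), the triangular 50% overlapping windows, and the use of the Moore–Penrose pseudo-inverse follow the text.

```python
import numpy as np

def erb_rate(f_hz):
    """ERB-rate scale (Glasberg and Moore approximation); assumed here as
    the critical-band frequency warping."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def triangular_band_matrix(n_bins, fs, n_bands=41):
    """Averaging matrix with one triangular window per row. Window centres
    are equally spaced on the ERB-rate scale and each window reaches zero at
    the neighbouring centres, so adjacent windows overlap by 50%."""
    freqs = np.linspace(0.0, fs / 2.0, n_bins)
    warped = erb_rate(freqs)
    centres = np.linspace(warped[0], warped[-1], n_bands)
    spacing = centres[1] - centres[0]
    distance = np.abs(warped[None, :] - centres[:, None])
    weights = np.clip(1.0 - distance / spacing, 0.0, None)
    row_sums = weights.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0.0):
        raise ValueError("frequency resolution too coarse for this many bands")
    return weights / row_sums              # each row computes a weighted average

def remove_irrelevancy(h, M):
    """Project a magnitude vector onto band parameters and reconstruct it."""
    p = M @ h                              # critical-band parameter vector
    return np.linalg.pinv(M) @ p           # processed magnitude vector

# Illustrative values only: 1025 bins up to 22.05 kHz and a random "spectrum".
M = triangular_band_matrix(n_bins=1025, fs=44100.0)
rng = np.random.default_rng(1)
h = np.abs(rng.normal(size=1025)) + 0.1
h_processed = remove_irrelevancy(h, M)
```

Because the pseudo-inverse satisfies M M⁺ M = M, the processed vector has the same band parameters as the original, and applying the procedure a second time leaves the processed vector unchanged.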
The Center for Image Processing and Integrated Computing (CIPIC) HRIR database, which is publicly available from the University of California,14 was used to evaluate PCA performance. A total of 110 000 HRIRs present in the database were converted to the frequency domain using a discrete Fourier transform. To study the effect of the domain in which PCA operates, PCA was applied to the linear magnitude spectrum as well as to the logarithmic spectrum.
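The conversion from HRIRs to the two spectral representations can be sketched as follows. The function name `magnitude_spectra`, the use of a one-sided (real-input) transform, the decibel scaling of the logarithmic spectrum, and the small floor `eps` that avoids taking the logarithm of zero are assumptions for illustration; the database loading step is omitted.

```python
import numpy as np

def magnitude_spectra(hrirs, eps=1e-12):
    """Return linear and logarithmic (dB) magnitude spectra of HRIRs.

    hrirs: array whose last axis holds the impulse-response samples.
    eps:   assumed floor to keep the logarithm finite for zero magnitudes.
    """
    spectrum = np.fft.rfft(hrirs, axis=-1)   # one-sided DFT of real input
    linear = np.abs(spectrum)
    log_db = 20.0 * np.log10(np.maximum(linear, eps))
    return linear, log_db

# A unit impulse (hypothetical 200-sample response) has a flat spectrum.
impulse = np.zeros(200)
impulse[0] = 1.0
linear, log_db = magnitude_spectra(impulse)
```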
III. Results
A. Proportion of variance
The proportion of variance not accounted for as a function of the number of PCA components is shown in Fig. 1. The left-hand panel represents PCA operating on linear magnitude spectra; the right-hand panel corresponds to PCA operating on a logarithmic spectrum representation. The crosses represent the original HRTF spectra, whereas the circles denote the performance of the processed HRTFs (i.e., after irrelevancy removal). For all conditions tested, the proportion of variance that is not accounted for decreases with the number of components. Apart from some minor deviations, for all four conditions a proportion of variance accounted for of ∼90% is achieved for five components, which is in line with results from other studies.8,9
Fig. 1. Unexplained proportion of variance as a function of the number of PCA components. The left-hand panel represents PCA operating in the linear domain; the right-hand panel corresponds to PCA in the logarithmic domain. The crosses represent the performance for the original HRTF spectra; the circles represent the HRTF spectra after irrelevancy removal.
The differences between the original and processed HRTF spectra become pronounced for more than five components. In that case, both the linear and logarithmic HRTF spectra of the processed HRTF database show faster convergence and smaller residual variance than those of the original HRTFs. This difference is stronger for PCA in the linear domain than in the logarithmic domain. In the linear domain, 30 components account for 99% and 99.99% of the variance, for the original and processed HRTFs, respectively. In the logarithmic domain, these proportions of variance amount to 98% and 99.7%. The observation that PCA in the linear domain converges faster than in the logarithmic domain is also in line with results reported by Leung and Carlile.8
B. Root mean-square error
The proportion of variance that is (un)explained for a given number of components does not reveal information on the magnitude of the errors, such as the root mean-square error (RMSE). Therefore, RMSEs were calculated in the logarithmic domain in the same manner as employed by Scarpaci and Colburn.9 PCA analysis and synthesis were performed in the linear and in the logarithmic domain. The results are shown in Fig. 2. The left-hand panel represents the RMSE for PCA applied in the linear magnitude spectrum domain; the right-hand panel denotes the RMSE for PCA in the logarithmic domain.
Fig. 2. HRTF root mean-square error (RMSE) as a function of the number of PCA components. The left-hand panel represents PCA operating in the linear domain; the right-hand panel corresponds to PCA in the logarithmic domain. The crosses represent the performance for the original HRTF spectra; the circles represent the HRTF spectra after irrelevancy removal. The RMSE was calculated in the logarithmic (dB) domain.
In line with the results for unexplained variance, the processed magnitude spectra have a smaller RMSE than the original magnitude spectra, indicating a benefit from irrelevancy removal. For 20 components, this benefit amounts to 1 and 2 dB for PCA applied in the linear and logarithmic domain, respectively. In contrast to the results for the unexplained proportion of variance, PCA in the linear domain results in significantly larger errors expressed in decibels than PCA applied in the logarithmic domain. This is likely because, in the latter case, both PCA and the RMSE operate in the same (logarithmic) domain, whereas in the former case PCA operates in the linear domain and the errors are formulated in the logarithmic domain.
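The comparison between PCA domains described in this subsection can be sketched as follows. The exact RMSE definition of Scarpaci and Colburn is not reproduced here; the sketch assumes a plain root mean-square difference over all spectra and frequency bins in decibels. The function names, the floor `eps` applied before the logarithm (linear-domain reconstructions can become non-positive), and the synthetic data are likewise assumptions for illustration.

```python
import numpy as np

def pca_reconstruct(data, n_components):
    """Truncated PCA analysis and synthesis of the rows of `data`."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are orthonormal basis functions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    weights = centered @ basis.T
    return mean + weights @ basis

def to_db(magnitude, eps=1e-6):
    """Convert magnitudes to dB; eps is an assumed floor for non-positive values."""
    return 20.0 * np.log10(np.maximum(magnitude, eps))

def rmse_db(reference_db, approx_db):
    """Root mean-square difference over all spectra and bins (assumed definition)."""
    return float(np.sqrt(np.mean((reference_db - approx_db) ** 2)))

def compare_domains(magnitudes, n_components):
    """RMSE in dB for PCA applied in the linear and in the logarithmic domain."""
    ref_db = to_db(magnitudes)
    linear_rec = pca_reconstruct(magnitudes, n_components)
    log_rec = pca_reconstruct(ref_db, n_components)
    return rmse_db(ref_db, to_db(linear_rec)), rmse_db(ref_db, log_rec)

# Synthetic positive "magnitude spectra" (not HRTF data).
rng = np.random.default_rng(2)
magnitudes = np.exp(0.5 * rng.normal(size=(200, 12)))
err_linear, err_log = compare_domains(magnitudes, n_components=5)
```

When PCA operates in the logarithmic domain, the error criterion it minimizes coincides with the RMSE evaluated here; when it operates in the linear domain, the two criteria differ, which is the mismatch discussed above.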
IV. Discussion
When PCA is applied to linear HRTF magnitude spectra, the results obtained in this study are very much in line with those published earlier in that ∼90% of the HRTF variance can be explained by the first five PCA components.5,7–9 Further, the observed slower convergence of PCA applied in the logarithmic domain compared to the linear domain is also in line with data obtained by Leung and Carlile.8 When comparing the proportion of unexplained variance between the original and processed HRTF spectra for up to five PCA components, only small differences are observed. This result suggests that PCA applied to HRTFs with or without perceptual irrelevancies removed is practically equivalent for the first five components.
An important question is therefore whether five components are sufficient for perceptual transparency. Kistler and Wightman7 stated that “Subjects' judgments of the apparent directions of headphone-presented sounds that had been synthesized from the modeled HRTFs were nearly identical to their judgments of sounds synthesized from measured HRTFs.” More recent studies indicate, however, that localization performance continues to improve when the number of components is increased from 5 to 10 or 20.8,9 The RMSEs reported in Fig. 2 are also significant for just five components, especially if one assumes that the just-noticeable difference for spectral changes in HRTFs can be less than 1 dB in critical conditions.15
When more than five PCA components are employed, the benefits of perceptual irrelevancy removal in HRTFs become most apparent. For 20 PCA components, the proportion of unexplained variance is almost one order of magnitude smaller when irrelevancy is removed prior to PCA. Further, when PCA operates in the logarithmic domain, the RMSE is reduced by up to 50% when perceptual irrelevancy is removed.
The presence of variance that is perceptually irrelevant is not the only caveat for PCA applied to HRTF spectra. The RMSEs shown in Fig. 2 also indicate the importance of the domain in which the variance or errors are evaluated. The results obtained for the proportion of variance accounted for (Fig. 1) seem to suggest that PCA operating in the linear magnitude domain is more efficient than in the logarithmic domain. In contrast, if the RMSE is evaluated in the logarithmic domain, PCA operating in that same domain results in significantly smaller RMSEs than PCA operating in the linear magnitude domain. These observations stress the importance of a well-chosen optimization criterion that is indicative of perceptual quality. An accurate model of perceptual differences given a modified HRTF would be very helpful, and possibly even necessary, for optimizing PCA, or any other HRTF data reduction method, with respect to its ability to model HRTF spectra efficiently.
Although the results presented in this paper suggest significant benefits of perceptual irrelevancy removal in terms of PCA data compaction, there exists a risk that the changes imposed by irrelevancy removal and PCA individually do not result in perceptual differences, whereas the combined effect may, in fact, be audible. The existence and/or severity of such a combined effect can only be demonstrated through psycho-physical validation or by means of an accurate perceptual model.
V. Conclusions
Principal component analysis is regularly used as a means for data compaction or efficient approximation of HRTF magnitude spectra, despite the risk of spending modeling effort on variance that is perceptually irrelevant. By comparison of PCA performance before and after perceptual irrelevancy removal, it can be concluded that the first five PCA components seem to capture predominantly perceptually relevant properties of HRTFs, whereas subsequent components are less effective from a perceptual viewpoint. Further, for PCA to be most effective, it seems important that the PCA optimization criterion (or residual error that is minimized) is formulated in a domain that correlates well with perceptual quality.