Numerical simulations of head-related transfer functions (HRTFs) conventionally assume a rigid boundary condition for the pinna. The human pinna, however, is an elastic deformable body that can vibrate due to incident acoustic waves. This work investigates how sound-induced vibrations of the pinna can affect simulated HRTF magnitudes. The work will motivate the research question by measuring the sound-induced vibrational patterns of an artificial pinna with a high-speed holographic interferometric system. Then, finite element simulations are used to determine HRTFs for a tabletop model of the B&K 5128 head and torso simulator for a number of directions. Two scenarios are explored: one where the pinna is modeled as perfectly rigid, and another where the pinna is modeled as linear elastic with material properties close to that of auricular cartilage. The findings suggest that pinna vibrations have negligible effects on HRTF magnitudes up to 5 kHz. The same conclusion, albeit with less certainty, is drawn for higher frequencies. Finally, the importance of the elastic domain's material properties is emphasized and possible implications for validation studies on dummy heads 1as well as the limitations of the present work are discussed in detail.

## I. INTRODUCTION

Human sound perception is inherently spatial^{1} and relies on acoustical cues, which can be mathematically described/encoded as head-related transfer functions (HRTFs). Such geometric-based audio filters are highly individualized and thus expensive to acquire accurately (see, e.g., Ref. 2). The two primary approaches for obtaining reference HRTF estimations are either to measure them experimentally or to simulate them using numerical simulation techniques.^{3–6}

Numerical simulations within virtual acoustics, including simulations of HRTFs, generally employ wave-based models, which are believed to describe linear acoustics reasonably well.^{7} Most often, an impedance termination of local reaction is used on the domain boundary: this approximation was introduced about a century ago to study the sound decay and modes inside a room^{8,9} and presents an abundance of issues when more detailed acoustical predictions are needed.^{10–12} Moreover, numerical simulations of HRTFs commonly employ fully rigid boundaries:^{13–15} an unphysical assumption that does not even correspond to the skin impedance measured for anatomical structures way simpler than the human pinnae.^{16,17} For example, the effects of considering a non-rigid head material with a finite specific acoustic impedance, to account for human hair, has been previously studied.^{18} The findings support the hypothesis that more realistic material properties could improve the accuracy of simulated HRTF data.

Although some efforts have been made to improve the modeling approaches that use the local reaction assumption (e.g., Ref. 19), the boundary impedance model itself is, in general, an approximation, valid in limited scenarios. (For instance, local reaction is expected to be valid for highly dense solids and highly porous materials.^{20}) It is currently unclear how such modeling assumptions (i.e., the rigid and locally reacting boundary conditions) affect the simulated HRTFs. Errors associated with these modeling approaches can contribute to the total validation error of the simulated HRTFs.^{21}

This work aims to improve the accuracy of HRTF simulations by modeling the pinna as an elastic, deformable body that interacts with acoustic pressure waves in air. The objective is to evaluate the validity of using a rigid boundary condition for the pinna region and consequently improve the accuracy of physics-based HRTF simulations to enhance future validation studies.

To motivate the research question, we, first, experimentally characterize the sound-induced vibrations of an average anthropometric pinna replica (KB5000; GRAS Sound & Vibration, Holte, Denmark) using a customized high-speed holographic interferometric system.^{22} We show that, for a genus 0 surface such as the human pinna, the sound-induced pinna vibrations are measurable and larger than 0, in contrast to the rigid boundary assumption. To evaluate the effect of such vibrations on the resulting HRTFs, we then develop physics-based elasto-acoustic models that capture the interaction of the acoustic waves with the pinna structure. The resulting HRTFs will be compared with simulated HRTFs obtained from a pure acoustic model using the conventional perfectly rigid boundary condition assumption for the pinna.

Irrespective of modeling assumptions, any numerical simulation contains numerical errors.^{23} To have a meaningful comparison between the numerical results of the two models, we have to distinguish between the differences caused by using different modeling assumptions (acoustic versus elasto-acoustic) versus the differences caused due to numerical errors. To make this distinction, we estimate the magnitude of the numerical errors through verification exercises. Since, to the best of our knowledge, such exercises are missing for the employed methods and models, we design and conduct the necessary verification exercises prior to comparing the two physics-based models. To this end, controlled convergence studies are conducted for both sets of models to determine a metric for the reliability of the numerical solutions. Finally, we discuss the implications of the findings on both the design and numerical modeling of head and torso simulators (HATS). To the best of our knowledge, this will be the first experimental study visualizing the sound-induced vibrational patterns of an artificial pinna and the first multiphysics modeling investigation for simulating blocked-meatus HRTFs.

## II. MEASUREMENT OF SOUND-INDUCED VIBRATIONS FOR AN ARTIFICIAL PINNA

In this section, we show how an acoustic source of a reasonable intensity induces measurable vibrations in an artificial anthropometric model of a human pinna (KB5000; GRAS Sound & Vibration, Holte, Denmark) with a hardness of 35 Shore 00.

Vibration measurements are done using a full-field high-speed holographic interferometry (FHHI) technique. This technique has been previously used to measure vibrations in various applications, including sound-induced motions of the tympanic membrane (TM) in the nanometer range.^{22,24–28} To accurately measure the evolution of nanometer-scale deformations on the complex geometry of the pinna, full-field measurements with high spatial resolution are required. Due to large low-frequency motion artifacts driven by physiological motions such as heart beat, head muscle activation, respiration, etc., when measuring on live subjects, we used the GRAS KB5000 anthropometric pinna to perform the sound-induced displacement measurements.

### A. Experimental setup

Figure 1(a) shows schematics of the experimental setup. The pinna is fixed at the back of its rectangular base and is mounted on a 1.5-in. Thorlabs post (Thorlabs, Newton, NJ). Our FHHI system uses a 75 mW near-infrared tunable laser (TLB6700; New Focus, San Jose, CA) with a central wavelength of 780 nm. To digitally capture the holograms with the required acquisition speed, a high-speed camera (SA-Z; Photron, Tokyo, Japan) is used. The camera has a 1 MPixel resolution (i.e., 1024 × 1024 pixels with a pixel size of 20 *μ*m). The FHHI method relies on quantification of the optical path length variations due to the sample's motion using a phase-correlation method.^{22} As shown in Fig. 1(b), the FHHI setup consists of an off-axis interferometer that splits a high-coherence laser beam into a reference path and an object path. The reference path is equipped with a mirror mounted on a high-speed precision piezoelectric actuator (PI, Auburn, AL) that can vary the optical path length by a calibrated amount. The reflection from the sample is combined and interfered with the reference beam at the complementary metal-oxide semiconductor (CMOS) sensor of the high-speed camera. The excitation source (a Neumann KH120A loudspeaker; Neumann, Berlin, Germany) and the acquisition triggering of the camera are all synchronized using the master clock of a data acquisition system (DAQ USB 6366; NI, Austin, TX).

Figure 1(b) shows a representative image of the sound-induced vibrational patterns of the artificial ear, observed from the camera's field of view.

### B. Reconstruction of the displacement field

*ρ*denotes the Pearson's correlation between two data sets that is corresponding to the cosine of the interference phase [i.e., $ \rho ( I ref , I def ) \u2245 \u2009 cos ( \Delta \varphi ) ] , \u2009 and \u2009 I ref ( \pi / 2 ) ( x , y )$ is a kernel centered at the desired point (

*x*,

*y*) having a $ \pi / 2$ phase shift with respect to frame $ I ref ( x , y )$. The $ I ref ( x , y )$ is the last frame before the beginning of the transient sample excitation. The “ref” and “def” sub-indices represent the reference and deformed states of the sample in time, respectively. Further details of this method are well explained in previous publications.

^{22}This change in the interference phase, $ \Delta \varphi $, is related to the displacement field,

*d*(

*x*,

*y*,

*t*), through Eq. (2),

*λ*is the wavelength of the laser light (in meters) being used.

### C. Empirical evidence of sound-induced pinna vibrations

This section provides examples of full-field vibrational measurements of a pinna induced by a loudspeaker and demonstrates how an ordinary acoustical source can induce measurable vibrational patterns in the pinna structure. To achieve this, the artificial pinna (GRAS KB5000) is acoustically excited with a loudspeaker that is placed at a distance of 1 m from the pinna, with an amplitude of nearly 80 dB sound pressure level (SPL) at the pinna location. Figure 2 shows representative results of acoustically induced pinna vibrations at different tonal stimuli and vibration phase angles with respect to the excitation signal. The top row shows the modulation intensity images determined by the absolute value of the complex optical phase. The middle row shows the corresponding wrapped mod-2*π* optical phase obtained by Eq. (1); and the bottom figure shows the sound-induced displacement fields corresponding to the unwrapped and scaled optical phase (Eq. (2).

To obtain the root mean square (RMS) of the displacement field at each tonal stimulus, the sound-induced displacement versus excitation phase results, gathered at each of the image pixels, were Fourier transformed.^{29} The magnitudes and phases of the fundamental components were then computed (Fig. 3, left and center columns, respectively). A video of sound-induced pinna vibrations excited with a number of tonal stimuli, captured by our holographic system, is also provided in the supplementary material. The video represents the absolute cosine of the interference phase with a fixed scale of 0–1 for all frequencies. To evaluate the quality of the reconstructed magnitude and phase of the sound-induced displacement fields, we compute the Pearson product correlation coefficient between the measured phasic variations of the displacement and the sinusoidal motion reconstructed from the magnitude and angle of the Fourier-determined fundamental component of the phasic motion. As shown in the right column of Fig. 3, the square of this correlation, *R*^{2}, is generally greater than 0.9, except at locations where the magnitudes of the measured displacements are less than 15 nm. This indicates that the displacement maps are accurately reconstructed.

The left column of Fig. 3 clearly shows that the sound-induced displacement of the pinna is on the order of a few hundred nanometers. This indicates that the pinna vibrates under environmental acoustic stimuli. The effect of such small displacements on HRTFs is unclear; however, results in Fig. 3 question the validity of the rigid boundary condition assumption for the pinna when it comes to HRTF simulations. To evaluate the magnitude of these effects methodologically, we have developed finite element models that further explore this question.

## III. CONTINUUM MECHANICS MODELS

_{s}) and impinged by a time harmonic incident wave, $ p inc$ (Fig. 4). We consider $ x = [ x 1 , x 2 , x 3 ] T$ to be the position variable and

*ω*to represent the angular frequency. In the frequency domain, the total acoustic wave field $ p ( x )$ is

*p*is the total scattered wave, with $ p inc , p , p s : R 3 \u2192 R$. $ p ( x )$ satisfies the Helmholtz equation,

^{s}^{31}

*c*

_{0}being the speed of sound in a fluid or air, while $ \Omega \xaf s$ is the closure of Ω

_{s}. For an exterior problem, the scattered field has to satisfy the Sommerfeld radiation condition at $ r \u2192 \u221e$,

*i*and

*j*take on the values 1, 2, 3 and the dummy index

*j*must be summed on. $ u ( x ) : R 3 \u2192 R 3$ is the displacement vector, and $ S ( x ) : R 3 \u2192 R 3 \xd7 3$ is the stress tensor. For an isotropic homogeneous solid, the stress tensor can be written in the following form using the generalized Hook's law for finite deformations (see, e.g., Ref. 32, p. 147):

*μ*and

*λ*are Lamé parameters.

**and $ tr$ represent the identity matrix and the trace function, respectively. Using the expression for the infinitesimal strain tensor and substituting Eq. (7) in Eq. (6) gives (see, for instance, Ref. 33, p. 275)**

*I**ρ*is the density of air and $ n ( x ) \u2208 R 3 \u2192 R 3$ is the outward normal directed from the solid to air. Continuity of traction on the surface of the elastic solid yields

_{a}## IV. NUMERICAL SIMULATIONS OF FAR-FIELD HRTFS FOR TABLETOP HATS

^{35}In other words, for an omnidirectional point source located at $ ( r , \theta , \varphi )$ relative to the head center in a vertical polar spherical coordinate system,

^{36}an HRTF at a frequency $ f = \omega / 2 \pi $ is defined as

*r*is the radius, $ \theta \u2208 [ 0 \xb0 , 360 \xb0 )$ is the azimuth angle, with $ 0 \xb0$ in front and $ 90 \xb0$ to the left, and $ \varphi \u2208 [ \u2212 90 \xb0 , 90 \xb0 ]$ is the elevation angle, with $ \varphi < 0$ denoting directions below the horizontal plane.

*P*represents the Fourier transform of the sound pressure at the blocked meatus, and

*P*

_{0}represents the Fourier transform of the sound pressure at the head center with the head absent.

We simulate far-field HRTFs for the tabletop Brüel and Kjaer high-frequency HATS type 5128 in two cases: (1) the HATS is modeled as perfectly rigid, and (2) the pinna region is modeled as linear elastic with material properties close to that of auricular cartilage, while the remainder of the head and torso is modeled as perfectly rigid. The former requires the acoustic model from Eq. (12), whereas the latter must be treated as a coupled elasto-acoustic (vibroacoustic) problem, as prescribed by Eq. (11). HRTFs are computed for a plane wave source of unit amplitude at seven spatial directions, five of which sweep the horizontal plane from $ 0 \xb0$ to $ 180 \xb0$ at $ 45 \xb0$ azimuth increments. The last two directions lie in the frontal plane, with an azimuth of $ 90 \xb0$ and a $ \xb1 45 \xb0$ elevation (see Fig. 5). We do not use reciprocity. Although the reciprocity principle holds for any linear time-invariant system in theory, and it has been shown to have negligible error in HRTF simulations with perfectly rigid boundary conditions,^{37} we decided against using it for the following two reasons. (1) The first reason is to enable direction-dependent error estimation. This will be more clear in Sec. IV G, where upper bounds for the error are proposed as a function of the global *L*^{2} norm of the error. (2) The second reason is to avoid any unforeseen numerical difficulties that might arise from having the source close to the surface of the elastic body in the elasto-acoustic case.

### A. Geometry

A computer-aided design (CAD) drawing, provided by HBK (Chicago, IL), is used to include the geometry of the HATS in the model. The mouth and nose cavities are filled to avoid unwanted resonance frequencies that could adversely affect numerical convergence of the discrete models. In the elasto-acoustic models, separate domains for the HATS pinnae Ω_{s} are created by intersecting a cylindrical object, tightly placed around the ears, with the HATS geometry (see Fig. 6). The HATS is aligned to a right-handed Cartesian coordinate system, as shown in Fig. 5.

The scattering problems introduced in Eqs. (11) and (12) are defined on an unbounded domain. However, the FEM is conceptually developed for the numerical discretization of bounded domains. Therefore, its use for unbounded domains requires the introduction of an artificial boundary around the scatterer that incorporates the far-field behavior into the finite element model. On this artificial boundary, the finite element discretization may be coupled with some discrete form of the analytic solution.^{31} Some of the most common coupling approaches include Dirichlet-to-Neumann maps (see Refs. 38 and 39 and references therein), non-reflecting boundary conditions^{40,41} and perfectly matched layers (PML).^{42} We choose to truncate the simulation domain using the PML technique.

To truncate the simulation domain, we define a spherical domain, Ω_{a}, around the HATS with radius *R _{a}*. Next, we introduce a PML domain by extending Ω

_{a}along its exterior with thickness

*dR*

_{PML}. Note that the PML's physical thickness does not matter in a frequency-domain simulation. However, the thickness must be chosen such that the PML mesh is more or less regular (Ref. 43, p. 220). It is recommended that the PML's interior boundary be at a minimum distance of $ \lambda / 8$ (

*λ*represents the wavelength) from the scatterer's surface to avoid unphysical reflections.

^{43}Therefore, for higher simulated frequencies,

*R*may be reduced to save computational resources. We exploit this by considering two different model geometries, depending on whether the simulated frequencies are below or above a frequency threshold. This threshold was set to 10 kHz for the acoustic models and 5 kHz for the elasto-acoustic models (Table I).

_{a}Model . | R (cm)
. _{a} | dR_{PML} (cm)
. | f_{max} (kHz)
. |
---|---|---|---|

Acoustic | |||

f ≤ 10 kHz | 15 | 35 | 10 |

f > 10 kHz | 14.5 | 20 | 20 |

Elasto-acoustic | |||

f ≤ 5 kHz | 18.5 | 35 | 5 |

f > 5 kHz | 15 | 35 | 10 |

Model . | R (cm)
. _{a} | dR_{PML} (cm)
. | f_{max} (kHz)
. |
---|---|---|---|

Acoustic | |||

f ≤ 10 kHz | 15 | 35 | 10 |

f > 10 kHz | 14.5 | 20 | 20 |

Elasto-acoustic | |||

f ≤ 5 kHz | 18.5 | 35 | 5 |

f > 5 kHz | 15 | 35 | 10 |

### B. Materials

The speed of sound in air, *c*_{0}, is set to 343 m/s, and the density of air, *ρ*_{0}, is set to 1.225 kg/m^{3}. In the elasto-acoustic models, the material composition of the pinna as well as its properties must be chosen. In this work, the main focus is to study the HRTF effects for a human pinna, as opposed to a silicon ear, commonly employed in mannequin fabrication.

The mature human auricle is composed of various tissues and organs, such as skin, subcutaneous tissue, (rudimentary) intrinsic muscles, elastic cartilage, cutaneous nerves, lymphatic vessels, capillaries and arterial branches, sebaceous and sweat glands, and fat (mainly in the ear lobe).^{44} Except for the lobule, cartilage is the dominant tissue/organ of the pinna.^{44} Since the mechanical response of the adipose tissues is known to be highly damped,^{45,46} we will focus on the cartilage tissue and consider it as the most important tissue for the elasto-acoustic simulations.

The material model and its properties for cartilage must be determined. Cartilage is mostly composed of interstitial fluid/water (70%), type II collagen (20%), and proteoglycan aggregates (6%).^{47} Due to the low permeability of the cartilaginous structure, the interstitial fluid cannot be squeezed out easily and its flow is restrained.^{47,48} As a result, for short static loading periods or for cyclic loading with moderate to high frequencies, the tissue's mechanical behavior is close to a single-phase solid: under rapidly applied loads, the fluid does not have time to flow relative to the solid cellular matrix.^{48} Under these circumstances, cartilage can be modeled as a homogeneous, linear elastic and nearly incompressible material.^{47} As such, cartilage can be characterized by its Young's modulus, *E _{s}*, Poisson ratio,

*ν*, and density,

_{s}*ρ*.

_{s}The density of cartilage is slightly higher than that of water.^{49} We assume a value of $ \rho s = 1100$ kg/m^{3}. We use a complex modulus, $ E s ( 1 + i \eta )$, common to soft tissue modeling.^{50} Here, *E _{s}* is the storage modulus and

*η*is a loss factor that accounts for internal damping. Measurements of the Young's modulus of cartilage have been reported in several works. However, most works focus on articular cartilage or the septum.

^{49,51,52}A notable work by Griffin

*et al.*characterizes the biomechanical properties of the human auricular cartilage and reports the compressional modulus for different parts of the auricle.

^{53}Based on this last reference, a storage modulus of $ E s = 1.5$ MPa was chosen. Measurements for the loss factor and Poisson ratio

*ν*of auricular cartilage are scarce in the literature. Therefore, we're bound to reference values reported for other types of cartilage. The loss factor for other similar media, such as articular cartilage, TM, and skin, is reported in the range of 0.01 to 0.13.

^{50,54}Here, we assume a loss factor of 0.1.

The selection of the Poisson ratio requires more care since it is intimately related to the compressional wave speed and to the nature of the elasto-acoustic coupling. For an isotropic linear elastic solid, ultrasound wave speed may be calculated as $ c p = ( \lambda + 2 \mu ) / \rho s$.^{49,55} This may be rewritten as $ c p = E s ( 1 \u2212 v ) / \rho s ( 1 + v ) ( 1 \u2212 2 v )$. Ultrasound wave speeds in cartilage have been consistently reported in the range of 1500 to 1700 m/s in the literature.^{49,56,57} Measured values of cartilage's Poisson ratio range from 0.26 to 0.45.^{49,52,58} These values of Poisson ratio correspond to a maximum compressional wave speed of 70 m/s, which is an order of magnitude lower than the values measured and reported in the literature. The authors believe that this apparent discrepancy is due to the simplifying assumption of modeling cartilage as a homogeneous, linear elastic single-phase solid. One, then, needs to choose whether the model remains faithful to the measured values of Poisson ratio or those of the compressional wave speed. Since, as discussed before, the compressional wave speed plays a fundamental role in determining the nature of the elasto-acoustic coupling, we choose the latter and select a Poisson ratio of $ \nu = 0.4999$, leading to a compressional wave speed of $ \u223c 1500$ m/s. This is very close to the sound speed in water and is consistent with the high water content of elastic cartilage.^{47}

### C. Physics

The acoustic models are developed using the *Pressure Acoustics* interface. The *Sound Hard Boundary* feature is used to model all boundary surfaces as perfectly rigid. For the elasto-acoustic models, the *Pressure Acoustics* interface is used within the air domain Ω_{a} and the *Solid Mechanics* interface is used within the elastic pinna domains Ω_{s} (golden region in Fig. 6). To model a plane wave source, a *Background Pressure Field* is applied to the air domain. Continuity conditions [Eqs. (9) and (10)] are applied to the exterior boundary of the elastic domain using comsol's *Multiphysics Acoustic-Structure Boundary* node. The HATS pinna is bonded to a hard plastic core within the ear.^{59} Therefore, for the purposes of the current model, fixed boundary conditions are applied to the interior boundaries of the elastic domains (i.e., all surfaces not in contact with air [see Fig. 6]). This includes the circular surface at the back of the pinna structure, not captured in Fig. 6. Our studies showed that using free versus fixed boundary conditions on the interior boundaries does not make a significant difference in the HRTF response magnitude (results not shown).

A FEM solver with quadratic and cubic Lagrange elements is used for the acoustic and elasto-acoustic models, respectively. The choice for the order of shape functions was based on our findings from code verification exercises (see Sec. IV G).

### D. Mesh

As was explained in Sec. IV A, two sets of model geometries were used, depending on the simulated frequency range to reduce computational demand. For the acoustic HATS, one set of model geometry was developed for frequencies between 250 Hz and 10 000 Hz and another for frequencies above 10 000 Hz. For the elasto-acoustic HATS, a unique geometry was developed for simulated frequencies in the range of 250 Hz to 5000 Hz and another for frequencies above 5000 Hz.

The coarsest mesh for Ω_{a} is generated such that the maximum element size, $ h max$, is set to $ \lambda min / 5$, where $ \lambda min = c 0 / f max$ and $ f max$ is the highest simulated frequency, except for the elasto-acoustic models with simulated frequencies above 5 kHz, where $ f max$ is set to 10 kHz instead of 20 kHz due to the limitation of computational resources (see Table I). The minimum element size is selected such that $ h min = h max / 2$.

We rely on the bulk compressional and shear wave speeds to get a reasonable discretization for the elastic domains. The coarsest mesh is generated such that the maximum element size, $ h max$, is set to $ \lambda min s$, where $ \lambda min s = min { c p / f max , c s / f max}$, with $ c p \u2248 1508$ m/s and $ c s \u2248 21$ m/s. Here, $ c s = \mu / \rho s$ is the shear wave speed in the solid. The choice of $ h max$, in this case, is partially driven by the availability of computational resources. Since the maximum element size for the coarsest mesh is equal to the minimum wavelength, corresponding shear waves may not be fully resolved on this mesh. However, conducting consecutive convergence studies will provide some further assurance and error estimation for the numerical solutions. Similar to the meshing strategy in the air domain, $ h min = h max / 2$.

For the PML to be well-behaved, the mesh inside the PML domain needs to be structured. Therefore, a swept mesh (with 32 layers in the radial direction) is used inside the PML domain (see Ref. 43, p. 209).

Consecutively refined meshes are generated using a 1.1 refinement ratio, following $ h min i + 1 / h min i = 1.1$ and $ h max i + 1 / h max i = 1.1$, with *i* = 0 corresponding to the finest grid.

For the acoustic models of the HATS, a sequence of nine successively refined grids were used to conduct convergence studies below 10 kHz. Above 10 kHz, six grids were used. For the elasto-acoustic models, six and five grids were used for simulated frequencies below and above 5 kHz, respectively. The number of grids used was informed by the availability of computational resources and convergence behavior of the models.

To ensure consistency among grids, each mesh was inspected for two built-in quality metrics in comsol, namely *skewness* and *volume versus circumradius*. The results show that the refinement is consistent across grids (see Appendix A).

### E. Study

The simulations are done in the frequency domain. For both acoustic and elasto-acoustic models, the simulated frequencies start at 250 Hz and are linearly spaced at 250 Hz up to 10 kHz. From 10 to 20 kHz, the spacing is increased to 500 Hz to reduce computational demand. We heavily relied on the cluster computing functionalities within comsol to parallelize jobs and speed up run times.

### F. Calculation of HRTFs

Unlike the usual determination of HRTFs, which involves point-wise computations of pressure at the blocked meatus (see, e.g., Ref. 60) and head center, we compute *P* in Eq. (13) as the volumetric average of pressure inside a small spherical domain, $ \Omega *$, located close to the center of the blocked ear canal (see Fig. 6). This is to accommodate the calculation of error bands for the solution (see the section on “Determination of solution error bands” in the supplementary material). As such, a spherical object with a radius of 4 mm (close to the average radius of the human eardrum^{61–63}) is created at an ∼5 mm distance from the center of each blocked ear canal (Fig. 6). This spatial average will low-pass the frequency response, *P*. In order to recover the original HRTF, we apply an identical averaging operation on *P*_{0} prior to the division in Eq. (13). The average for *P*_{0} is calculated analytically by integrating the free-field acoustic response of a unit amplitude plane wave inside a 4 mm-radius sphere located at the head center (for more details, see the section titled “Analytic determination of average free-field pressure” in the supplementary material). The downside of doing this average is that the result will be closer to an HRTF at the center of $ \Omega *$ (i.e., 5 mm away from the ear canal), which differs from a blocked-meatus HRTF at very high frequencies.

### G. Code and solution verification

*solution verification*. To this end, we used a modified definition of the grid convergence index (GCI), introduced in Ref. 64 to propose a non-sharp upper-bound estimate for the numerical error as follows:

Here, $ HRTF$ is the simulated HRTF and $ V *$ is the volume of $ \Omega *$ (see Sec. IV F). $ GCI$ is determined using convergence studies and global error norms. For more details on how the $ GCI$ is calculated, see the “Determination of solution error bands” section in the supplementary material.

## V. RESULTS

In this section, the simulated HRTFs and corresponding error estimates are presented on the finest grid. Error estimates were calculated through convergence studies (see Sec. IV G and references to the supplementary material).

### A. Pinna vibrations: Effects on far-field HRTFs

A few observations about the solution uncertainty bands are made from the figures: (1) The uncertainty bands for the solutions grow as the frequency increases, (2) the bands show significant widening at the location of sharp peaks and deep notches, and (3) the error bands for the solution at the contralateral side are in general larger compared to the ones on the ipsilateral side. Such behavior of the discretization error is similar to previous estimates from other numerical methods.^{6,66} Figs. 7 and 8 show that the difference between HRTFs modeled using the acoustic and elasto-acoustic model is less than 1 dB across the whole frequency range for all directions considered. However, it must also be noted that the error band for this difference is large (especially for frequencies above 10 kHz). Larger error bands generally imply that the value of the mean is less reliable. This is where using a sharper error estimator than the one presented in Eq. (14) may help reduce the uncertainty in the solutions and the uncertainty of the difference as a result. For now, the value of the mean alone suggests that provided the model assumptions in this work, the vibration patterns of the pinna do not have a significant effect on HRTFs for HATS.

### B. Effect of the Poisson ratio

As mentioned in Sec. IV B, measured values of cartilage's Poisson ratio range from 0.26 to 0.45 in the literature.^{49,52,58} Many common pinna simulators, including the HATS 5128 ear simulator (type 4620), are made of silicon rubber.^{59} Silicon usually has a Poisson ratio between 0.48 to 0.495.^{67} To account for the difference between the value of Poisson ratio used in this work and the range listed for cartilage and silicon materials, we investigate how the selected value of the Poisson ratio will affect the results presented in Sec. V A. To this end, the elasto-acoustic models were used to run simulations using a Poisson ratio of 0.35 for the pinna: approximately the average value reported in the literature for cartilage. Simulations were only run on the finest grid. Therefore, convergence data and, as a result, error estimates were not calculated for the HRTFs obtained by these models.

Figure 9 shows the difference curves from Sec. V A overlaid by the new ones, obtained by using a Poisson ratio of 0.35, for $ 0 \xb0$ azimuth and elevation. For the former, the uncertainty bands have also been included in the figure. The figure shows that using a lower Poisson ratio leads to an overall larger difference (maximum of 2 dB) between the HRTF solutions. The observation holds for all the HRTF directions studied in this work (see Appendix B).

### C. Effects in the near field

Near-field HRTFs (nf-HRTFs) have a significant perceptual^{68,69} importance for virtual reality applications, where the user is often interacting with objects in the acoustic near field.^{70} In this section, we show how using an elasto-acoustic model affects nf-HRTFs using a case study.

We use a monopole source located on the left side and at a 5.5 cm distance from the center of the blocked meatus. The source is located close to the pinna's “focus point”; this was done so that the spherical wave fronts hit the pinna with greater temporal coherence. The intensity of the monopole may be arbitrary for linear models, since the response is normalized with the free-field response at the head center. The original acoustic and elasto-acoustic models (Poisson ratio $ \nu = 0.4999$) are used to simulate nf-HRTFs for the tabletop HATS. We note that the models are run only for the finest grid. As a result, no convergence or uncertainty analysis is done for this case. Figure 10 shows that difference between the solutions is at the same order as the difference between far-field HRTFs.

## VI. DISCUSSION AND CONCLUSION

The primary goal of this paper is to study whether assuming a rigid boundary condition for the pinna has a significant impact on the simulated HRTFs. An experimental system was used to provide empirical evidence for sound-induced vibrations of an artificial pinna when hit by incident sound waves. Previous notable works investigating the occlusion effect have conducted vibrational measurements on the human ear tissue.^{71,72} However, to the best of our knowledge, full-field measurements of sound-induced vibrational patterns of pinna replicas do not exist in prior works. With this empirical evidence at hand, we set out to explore how these sound-induced vibrations affect HRTFs by using numerical simulations. Overall, our numerical results show with high confidence that modeling the vibrations of the pinna (assuming a linear elastic, homogeneous, and isotropic constitutive model for cartilage as its main constituent) has negligible effects for most practical purposes up to about 5 kHz: HRTF magnitude differences are less than 1 dB when the results of rigid vs non-rigid boundaries are compared. This agrees well with previous findings in the literature that have shown the shape of the pinna has minimal influence on HRTFs for frequencies below about 3 kHz.^{73,74} At higher frequencies and for the incompressible case ( $ \nu = 0.4999$), results point towards a similar conclusion, but we have less confidence in our numerical results due to a steep increase in the upper-bound estimator for the numerical errors. It is worth mentioning that HRTF measurements at high frequencies suffer from a similar deficiency. Nevertheless, given that the error bound from Eq. (14) is not sharp, it is likely that the differences are similarly small up to higher frequencies. Results hold for both the far field and the near field. Our experimental results (see Sec. II) clearly show that the acoustic field does couple mechanically with the pinna, causing it to vibrate. Such coupling, however, appears to have negligible effects on the simulated HRTF magnitude when the pinna is modeled as a linear elastic, homogeneous, and isotropic medium. In addition, the amplitude of 80 dB SPL (see Sec. II) used during measurements might have triggered nonlinearities. Such nonlinear effects have been previously reported when measuring HRTFs of mannequins with viscoelastic pinnae at increasing SPLs.^{75}

On the other hand, Fig. 11 shows how variations of the pinna's mechanical properties (in this case, Poisson's ratio, *ν*) could affect the HRTF magnitudes more significantly above 5 kHz. This suggests that at higher frequencies, the actual assumptions of the material model become more important and an impedance of local reaction might not be a good approximation of the boundary. Based on measured compressional wave speeds in cartilage,^{49,56,57} and due to the rich blood supply of the pinna,^{44} the high water content of elastic cartilage, and its low permeability for displacements at audible frequencies,^{47,48} we believe the pinna is better modeled as nearly incompressible ( $ \nu = 0.4999$). Previous wave-based HRTF simulations have already employed such boundary conditions (e.g., modeling the pinna with properties similar to that of water).^{4,76}

Accepting a nearly incompressible material model for the pinna has a couple of implications. First, such a model justifies why modeling the pinna as perfectly rigid is a practically valid approach to HRTF simulations, especially for living subjects. Extrapolation from the present results would indicate that a rigid boundary condition is valid at audible frequencies for any fluid-rich tissue such as human skin. A higher Poisson's ratio leads to increased compressional wave speed and higher equivalent acoustic impedance at the boundary surface, resulting in boundary conditions that more closely resemble the perfectly rigid case. Therefore, the overall differences in HRTF magnitudes are expected to be larger for materials with a lower Poisson's ratio. Poisson's ratio was found an important factor in simulated results in other linear elasto-acoustics numerical studies (e.g., it was shown to have the largest effect on numerical predictions of the occlusion effect among all other parameters considered).^{77}

Existing dummy heads are intentionally made flexible enough to accommodate the fitting of hearing aids,^{78} phones, or earphones.^{79} Previous HRTF (see Ref. 80, Fig. 59) and vibro-acoustical^{81,82} measurements suggest changes in acoustical responses with different shore hardness of the pinna.^{83} Additionally, our findings suggest that Poisson's ratio could affect the HRTFs. Given that the Poisson's ratio for certain types of silicon may be different than that of a real human pinna, HRTFs measured on head or ear replicas (used in, e.g., ear prosthesis^{84}) with silicon artificial ears may not replicate the HRTFs measured *in vivo* on human heads. Consequently, the design of digital twins and accurate validation exercises for HRTF numerical simulations against dummy heads with silicon/rubber ears (as opposed to, e.g., mannequins with rigid pinnae) is more involved: the model validation becomes more complex since the elasto-acoustical coupling becomes harder to neglect. As such, HRTF simulations employing passive and locally reacting normal impedance measurements of silicon/rubber materials (e.g., Ref. 85) may be problematic for accurate validation studies. To have more certainty, more rigorous factorial experiments are required to quantify the influence of various material parameters.

HRTFs are most commonly employed in (dynamic) binaural synthesis, where the end goal is to employ HRTF filters to reproduce the pressure on the eardrum of an individual as closely as possible as in a real-life scenario.^{1,35} HRTFs acquired at the blocked meatus, as in the present study, can achieve this goal only to a limited degree due to additional approximations and assumptions that are rarely met in practice (compensation of the delivery method such as individual headphone equalization and the free-air-equivalence coupling,^{86} the assumption that the ear canal acts as a simple transmission line,^{35,87} the use of non-individual HRTFs,^{88} interpolation errors in dynamic scenarios,^{89,90} head-over-torso orientations,^{91} etc.). Currently, the above-mentioned approximations are generally way more difficult to account for or control (see, e.g., Refs. 86, 92, and 93) compared to error magnitudes discussed in the present work. As such, in the broader context of binaural synthesis, the present results suggest that in modeling, the detailed vibrational effects of the pinna are negligible and the error control should first focus on other aspects of the binaural synthesis chain.

The incompressible pinna material model is congruent with ultrasound compressional wave speeds in elastic cartilage (see Sec. IV B). Nevertheless, the anatomy and structure of the human pinna are way more complex than those of a simple cartilage model.^{44} As such, to have higher confidence in the extrapolation of the present results to *in vivo* HRTFs, highly controlled HRTF validation studies are needed (e.g., Ref. 6), beyond the preliminary measurements presented in this work in Sec. II. Alternatively, a more complex *in silico* model would also be useful in assessing the extrapolation power of the present results.

Finally, we would like to emphasize the limitations of the present study. First of all, the composition of the pinna is only a first order physics-based approximation of the highly complicated anatomy of a real *in vivo* pinna. (1) We consider that the entire pinna is solely composed of a single “irregular plate”^{44} made of elastic cartilage. (2) We model the ear as a linear elastic, homogeneous, and isotropic material, which is not entirely realistic (e.g., Ref. 48), and (3) we do not consider tissues connecting the pinna to the skull, such as muscles or ligaments: one could envision using an average ear tissue to capture the additional components. However, we believe that homogenization of a complex tissue structure, such as the ear, over a wide frequency range will require a careful study on its own. (4) We ignore potential thermoviscous losses at the surface of the skin and vellus hair. (5) We consider a frequency-independent loss factor, *η*, for the material. Additionally, in the absence of more reliable data, we have assumed that the loss factor value is in the range commonly referenced for the TM and skin. Dynamic mechanical analysis (DMA) on auricular cartilage samples could better inform the value of the complex moduli used in the model. (6) The effect of temperature on the mechanical behavior of cartilage as well as its other material properties (such as Poisson ratio or loss factor) has not been taken into consideration in this work. Temperature may affect the material's behavior (i.e., constitutive law) and its properties. The storage modulus used for cartilage in this work is based on measurements of the Young's modulus at 37.5 °C,^{53} which is close to normal body temperatures. (7) The effect of other factors, such as age or smoking, on the mechanical properties of cartilage were ignored. (8) The study did not consider vibrations of the head surface and ignored the impact of the interface between the cylindrical pinna holder and the rigid part of the HATS.

Another limitation of the study is the calculation of HRTFs as a volumetric average within $ \Omega *$, an average that correlates to the center of $ \Omega *$, 5 mm away from the more common blocked-meatus location (see Sec. IV F). Last, results are presented only for one otologically normal pinna: it is unclear if there are other ear shapes (having, e.g., larger surface areas) that are more sensitive to elasto-acoustic coupling even for $ \nu = 0.4999$. Nevertheless, the anatomy of the ear^{44} and the known mechanics of elastic cartilage suggest a high surface acoustic impedance of the pinna. Thus, the transmission of acoustical energy for this value of Poisson's ratio should be minimal across the ear population.

## SUPPLEMENTARY MATERIAL FOR PUBLICATION

See the supplementary material for details on the solution verification exercises, including convergence studies, and error estimation for the numerical solutions.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX A: MESH DESIGN AND REFINEMENT FOR CONVERGENCE STUDIES

Acoustic models: Below 10 kHz, the average and standard deviation of the *skewness* quality metric are 0.684 × 10^{−3} and 5.6 × 10^{−3}, respectively. The average and standard deviation for the *volume versus circumradius* quality factor are 0.733 × 10^{−2} and 1.21 × 10^{−2}. A similar study was performed on the mesh sequence above 10 kHz. The average and standard deviation for the *skewness* (*volume to circumradius*) metric were 0.686 (0.735) × 10^{−3} and 3.1 (7.08) × 10^{−3}, respectively. The reported metrics show that the refinement is consistent across grids (Table II).

i
. | $ f \u2264 10$ kHz . | f > 10 kHz
. | ||
---|---|---|---|---|

N
. _{e} | $ Q \xaf 1 ( \sigma Q 1 )$ . | N
. _{e} | $ Q \xaf 1 ( \sigma Q 1 )$ . | |

8 | 0.57 | 0.682 (0.110) | ||

7 | 0.74 | 0.683 (0.109) | ||

6 | 0.98 | 0.684 (0.109) | ||

5 | 1.29 | 0.684 (0.108) | 3.63 | 0.686 (0.107) |

4 | 1.71 | 0.685 (0.108) | 4.83 | 0.686 (0.106) |

3 | 2.27 | 0.685 (0.108) | 6.43 | 0.687 (0.106) |

2 | 3.00 | 0.685 (0.107) | 8.56 | 0.689 (0.118) |

1 | 3.99 | 0.686 (0.107) | 11.39 | 0.687 (0.105) |

0 | 5.30 | 0.686 (0.107) | 15.19 | 0.680 (0.126) |

i
. | $ f \u2264 10$ kHz . | f > 10 kHz
. | ||
---|---|---|---|---|

N
. _{e} | $ Q \xaf 1 ( \sigma Q 1 )$ . | N
. _{e} | $ Q \xaf 1 ( \sigma Q 1 )$ . | |

8 | 0.57 | 0.682 (0.110) | ||

7 | 0.74 | 0.683 (0.109) | ||

6 | 0.98 | 0.684 (0.109) | ||

5 | 1.29 | 0.684 (0.108) | 3.63 | 0.686 (0.107) |

4 | 1.71 | 0.685 (0.108) | 4.83 | 0.686 (0.106) |

3 | 2.27 | 0.685 (0.108) | 6.43 | 0.687 (0.106) |

2 | 3.00 | 0.685 (0.107) | 8.56 | 0.689 (0.118) |

1 | 3.99 | 0.686 (0.107) | 11.39 | 0.687 (0.105) |

0 | 5.30 | 0.686 (0.107) | 15.19 | 0.680 (0.126) |

^{a}

*i* indexes the grid, while *N _{e}* represents the number of elements (not including the PML) in millions rounded to 2 decimal places. Values are given as average (standard deviation).

Elasto-acoustic models: Below 5 kHz, the average and standard deviation of the *skewness* quality metric are 0.670 × 10^{−3} and 3.10 × 10^{−3}. The average and standard deviation for the *volume versus circumradius* quality factor are 0.701 × 10^{−3} and 6.64 × 10^{−3}. A similar study was performed on the mesh sequence above 5 kHz: the averages (standard deviations) for the *skewness* (*volume to circumradius*) metric were 0.676 (0.712) × 10^{−3} and 3.43 (7.28) × 10^{−3}, respectively. The reported metrics show that the refinement is consistent across grids (Table III). The frequencies at which simulations are computed are the same as that specified for the acoustic problem.

i
. | $ f \u2264 5$ kHz . | f > 5 kHz
. | ||
---|---|---|---|---|

N
. _{e} | $ Q \xaf 1 ( \sigma Q 1 )$ . | N
. _{e} | $ Q \xaf 1 ( \sigma Q 1 )$ . | |

5 | 0.30 | 0.665 (0.117) | ||

4 | 0.36 | 0.668 (0.116) | 0.85 | 0.670 (0.118) |

3 | 0.44 | 0.671 (0.115) | 1.09 | 0.676 (0.116) |

2 | 0.55 | 0.672 (0.114) | 1.42 | 0.678 (0.114) |

1 | 0.70 | 0.673 (0.114) | 1.85 | 0.679 (0.114) |

0 | 0.89 | 0.670 (0.118) | 2.42 | 0.676 (0.116) |

i
. | $ f \u2264 5$ kHz . | f > 5 kHz
. | ||
---|---|---|---|---|

N
. _{e} | $ Q \xaf 1 ( \sigma Q 1 )$ . | N
. _{e} | $ Q \xaf 1 ( \sigma Q 1 )$ . | |

5 | 0.30 | 0.665 (0.117) | ||

4 | 0.36 | 0.668 (0.116) | 0.85 | 0.670 (0.118) |

3 | 0.44 | 0.671 (0.115) | 1.09 | 0.676 (0.116) |

2 | 0.55 | 0.672 (0.114) | 1.42 | 0.678 (0.114) |

1 | 0.70 | 0.673 (0.114) | 1.85 | 0.679 (0.114) |

0 | 0.89 | 0.670 (0.118) | 2.42 | 0.676 (0.116) |

^{a}

*i* indexes the grid, while *N _{e}* represents the number of elements (not including the PML) in millions rounded to 2 decimal places. Values are given as average (standard deviation).

### APPENDIX B: EFFECT OF POISSON RATIO

## REFERENCES

*Spatial Hearing: The Psychophysics of Human Sound Localization,*

*Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality*

*Verification and Validation in Scientific Computing*

*Finite Element Analysis of Acoustic Scattering*

*Diseases of the External Ear*

*Kinesiology: The Mechanics and Pathomechanics of Human Movement*

*in vitro*

*An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements*

*Head and Torso Simulator for the Measurement of Sound Sources Close to the Ear, Standard IEC 60318-7*(

*Artificial Ears, Standard*(

*KEMAR 45BB & 45BC Manual*

*Communication Acoustics*

*Sonic Interactions in Virtual Environments: Human-Computer Interaction Series*