Psychoacoustic stimulus presentation to the cochlear implant via direct audio input (DAI) is no longer possible for many newer sound processors (SPs). This study assessed the feasibility of placing circumaural headphones over the SP. Calibration spectra for loudspeaker, DAI, and headphone modalities were estimated by measuring cochlear-implant electrical output levels for tones presented to SPs on an acoustic manikin. Differences in calibration spectra between modalities arose mainly from microphone-response characteristics (high-frequency differences between DAI and the other modalities) or a proximity effect (low-frequency differences between headphones and loudspeaker). Calibration tables are provided to adjust for differences between the three modalities.
1. Introduction
For cochlear-implant (CI) assessments, stimuli are typically presented from a loudspeaker or spoken by the clinician (Boisvert et al., 2020). Psychoacoustic CI studies often employ direct audio input (DAI) to the sound processor (SP) for stimulus control. While less ecologically valid than loudspeaker presentation, DAI mimics the use of headphones in acoustic-hearing studies and has proven especially valuable for psychoacoustic tests not easily implemented in free-field (e.g., Buechner et al., 2020; Eapen et al., 2009; Zaleski-King et al., 2019). DAI also allows remote CI rehabilitation and training (De Graaff et al., 2016) and clinical assessment with results comparable to loudspeaker presentation (Aronoff et al., 2011; Chan et al., 2008; Sevier et al., 2019). However, with the advent of wireless direct streaming (Chen et al., 2021; Wolfe et al., 2015), manufacturers are limiting DAI availability. While meeting consumer demand, this approach limits CI psychoacoustic research. Wireless streaming employs compression algorithms, such as those associated with Bluetooth codecs, which can introduce delays (Cho et al., 2014; Gomez et al., 2012). Even codecs specifically designed for low latency (e.g., aptX, Qualcomm, San Diego, CA) can have variability on the order of tens of milliseconds (Katz, 2024). Thus, wireless streaming does not allow precise control of interaural timing for CI research (Hinrichs et al., 2021).
An alternative method for controlled stimulus presentation is using circumaural headphones over the CI SP to deliver stimuli via the behind-the-ear (BTE) microphone. Unlike hearing aids, CIs do not risk acoustic-feedback whistling caused by the headphones covering the SP. This headphones approach has been employed in a handful of CI spatial-hearing studies (e.g., Brown, 2014; Goupell et al., 2018; Grantham et al., 2008; Sheffield et al., 2020; Williges et al., 2018) requiring spatial-cue control not feasible with sound-field presentation. However, the acoustics of headphone-to-CI SP coupling have not been validated. Furthermore, adapting tests typically administered using DAI or loudspeaker presentation may require additional considerations. For example, headphones might produce a different spectral transfer function, and CI processing features or microphone directionality could behave differently with headphones presentation.
This study characterized the CI electrical response to headphones presentation. CI electrodograms at the output of an internal CI device were recorded in response to pure tones presented to the external SP placed on an acoustic manikin. Responses as a function of the level and frequency were compared across DAI, loudspeaker, and headphone presentation. Effects of CI features and microphone directionality were also assessed.
2. Methods
2.1 Apparatus
Measurements were performed inside a double-walled sound-treated booth. The BTE SP was placed on the left ear of a Knowles Electronic Manikin for Acoustic Research (KEMAR; GRAS Sound & Vibration) with large pinnae. The coil was placed on a detector box [Cochlear Freedom Implant Emulator (CFIE) or RIB2] consisting of an internal CI processor (CI24RE or FLEX28) with resistive loads on each electrode to approximate intracochlear impedance. A laptop PC with 24-bit external sound card (Soundblaster Play!3, Creative Labs) played stimuli and recorded the electrode-array output (44.1-kHz sampling rate). The electrode of interest was manually selected using a built-in switch (RIB2) or a custom-built switch box connected to the Cochlear Freedom Implant Emulator, then routed via 40 dB of attenuation (PA5, Tucker-Davis Technologies) to the sound-card input. For headphones presentation, the sound-card output was connected to open-back circumaural HD650 headphones (Sennheiser) covering the BTE placed on KEMAR's ears. The interior dimensions of the headphone ear cups were 6.4 cm × 4.0 cm. For loudspeaker presentation, the sound-card output was routed through an audiometer (Madsen Astera2, GN Otometrics) to a high-fidelity loudspeaker placed in front of KEMAR (0° azimuth) 0.91 m from the center of the head. The audiometer input calibration was set such that a full-scale tone produced a level of 0 dB on the input volume-unit meter. The audiometer output dial was set to 80 dB hearing level (HL). These settings produced levels of 75–95 dB sound-pressure level (SPL), depending on the frequency, at the location of KEMAR's head for a full-scale tone. (Note that with the audiometer in auxiliary input mode, this HL setting simply serves as an arbitrary non-frequency-dependent gain factor; this does not generate stimuli referenced to the threshold for human hearing as HL is traditionally defined for audiometric pure tones.) 
For DAI, the sound card was connected directly to the SP via commercial cable.
2.2 Stimuli
Stimuli were 300-ms pure tones (including 25-ms raised-cosine onset/offset ramps) at the center frequencies of the 12 or 22 channels programmed in each CI. Stimulus levels were digitally adjusted in 10-dB steps (−100 to 0 dB re: full-scale). Because pilot recordings showed little or no DAI variability [root-mean-squared (RMS) of the standard deviation (SD) across frequencies and processors = 0.49 dB; range = 0.01–1.2 dB], only a single set of DAI recordings was acquired. For loudspeaker presentation where pilot tests showed a small amount of variability, and for headphones presentation where the variability was somewhat larger, recordings were repeated at each frequency and level. For headphones (≥10 repetitions), the headphones were removed and replaced after each recording set (i.e., one presentation for each frequency and level). For loudspeaker presentation (≥5 repetitions), KEMAR was moved out of, then back into place after each set.
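The stimulus description above can be illustrated with a minimal sketch. It assumes a 44.1-kHz sampling rate (the rate reported for the sound card), a full-scale peak amplitude of 1.0, and a sine-phase tone; the function name `make_tone` and the exact ramp formula are illustrative assumptions, not the software used in the study.

```python
import math

def make_tone(freq_hz, level_db_fs, fs=44100, dur_s=0.300, ramp_s=0.025):
    """Return a pure tone (list of floats) with raised-cosine onset/offset ramps.

    The total duration includes the ramps. Full scale is assumed to be a
    peak amplitude of 1.0 (an assumption made for this sketch).
    """
    n_total = round(dur_s * fs)
    n_ramp = round(ramp_s * fs)
    peak = 10.0 ** (level_db_fs / 20.0)  # dB re full scale -> linear peak
    samples = []
    for n in range(n_total):
        if n < n_ramp:
            # Onset ramp rising from 0 toward 1
            env = 0.5 * (1.0 - math.cos(math.pi * n / n_ramp))
        elif n >= n_total - n_ramp:
            # Offset ramp, mirror image of the onset ramp
            env = 0.5 * (1.0 - math.cos(math.pi * (n_total - 1 - n) / n_ramp))
        else:
            env = 1.0
        samples.append(peak * env * math.sin(2.0 * math.pi * freq_hz * n / fs))
    return samples

# Stimulus levels from -100 to 0 dB re full scale in 10-dB steps
LEVELS_DB_FS = list(range(-100, 1, 10))
```

A full recording set in this sketch would loop over each channel center frequency and each entry of `LEVELS_DB_FS`, calling `make_tone` once per combination.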
2.3 Device settings
The intended application of the headphones approach in our laboratory is for single-sided deafness CI research. Thus, SPs were tested from the two manufacturers approved by the U.S. Food & Drug Administration for single-sided deafness CI. Six CI SPs were tested: Nucleus 6 (N6), Nucleus 7 (N7), and Nucleus 8 (N8) (Cochlear Ltd.); and Opus2 (O2), Sonnet EAS (S1), and Sonnet 2 EAS (S2) (MED-EL). Coil cable lengths were 6 cm (N6, N7, N8), 7.5 cm (O2), or 9 cm (S1; S2). For the S1 and S2, electroacoustic stimulation was disabled and the acoustic output was covered with putty. Each SP was programmed (Custom Sound 7.0 or MAESTRO 9.0) with the same electrical signal levels on each channel for each SP for a given manufacturer [Cochlear Ltd.: threshold = 73 clinical units (cus), comfortable loudness = 190 cus, pulse width = 25 μs; MED-EL: threshold = 2.40 current units (qu), most comfortable level = 29.99 qu, pulse width = 35.42 μs]. Cochlear Ltd. devices used the Advanced Combination Encoder (ACE) stimulation strategy with a 900 pulses-per-second (pps) rate and a default frequency map (22 channels, 188–7938 Hz). The ACE strategy was set to stimulate only one electrode based on the largest peak in the spectrum. MED-EL devices used the High-Definition Continuous Interleaved Sampling strategy (to avoid any low-frequency effects associated with fine-structure processing) with a 1000-pps rate. However, the default fine-structure processing (FSP) frequency map (12 channels, 70–8500 Hz) was used to cover a broad frequency range. All SPs used monopolar stimulation with one (MED-EL) or two (Cochlear Ltd.) extracochlear ground electrodes. The SP volume was always set to maximum. The sensitivity was 12 (Cochlear Ltd.) or 75% (MED-EL). For DAI, the Cochlear Ltd. mixing ratio was set to 2:1 in the programming software, while for MED-EL the red “non-mixing” DAI cable set the mixing ratio at 90:10. 
All signal-enhancement features (e.g., adaptive beamforming, noise reduction, adaptive gain) were disabled.
Headphones and loudspeaker recordings were made for all six SPs. DAI recordings were made for all three MED-EL SPs, but only for the Cochlear Ltd. N6 because the N7 and N8 SPs used for testing lacked a DAI port. Thus, the N6 DAI recordings were used for comparison with the N7 and N8 loudspeaker and headphones recordings.
Further recordings were performed to assess the effects of a subset of CI signal-processing features on the headphones calibration using the N6, N7, and S1 SPs. For the N6 and N7, five signal-processing features were examined: (i) automatic scene classifier system (SCAN, classifies the acoustic environment to choose the appropriate program); (ii) automatic sensitivity control (ASC, slow-acting compression to reduce response in noisy environments); (iii) signal-to-noise ratio noise reduction (SNR-NR, active noise cancellation for constant or slow-varying noise); (iv) wind noise reduction (WNR, attenuates low-frequency wind noise); and (v) adaptive dynamic range optimization (ADRO, adjusts channel gain to place signals within the electrical dynamic range) (Mauger et al., 2014). For the S1, microphone-directionality effects were assessed in three modes.
2.4 Analysis
Electrical output levels for each stimulus level were characterized by the RMS voltage during a 181-ms steady-state portion of the tone response. In instances where a silent pause occurred in the electrical response, silences were removed: a 100-sample moving-average window was applied, and only timepoints with resulting amplitude within one SD of the mean were included. Calibration functions were derived from these input–output functions by averaging the input levels required for a fixed range of output voltages (see Sec. 3.1).
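The silence-removal step can be sketched as follows. The text does not specify what quantity the moving average was applied to, so this sketch assumes a trailing 100-sample average of the rectified (absolute-value) signal, keeps timepoints whose smoothed amplitude lies within one SD of its mean, and computes the RMS of the raw samples at those timepoints. The function names and these details are assumptions for illustration only.

```python
import math
import statistics

def moving_average(values, window=100):
    """Trailing moving average; the first samples use a shorter window."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out

def steady_state_rms(segment, window=100):
    """RMS of a steady-state segment after excluding silent pauses.

    `segment` is the already-extracted steady-state portion of the
    recorded electrode voltage (e.g., 181 ms of samples).
    """
    # Assumption: the smoothed amplitude is a moving average of |voltage|
    envelope = moving_average([abs(v) for v in segment], window)
    mean_env = statistics.fmean(envelope)
    sd_env = statistics.pstdev(envelope)
    # Keep only timepoints whose smoothed amplitude is within one SD of the mean
    kept = [v for v, e in zip(segment, envelope) if abs(e - mean_env) <= sd_env]
    if not kept:  # fall back to the full segment if nothing survives
        kept = list(segment)
    return math.sqrt(sum(v * v for v in kept) / len(kept))
```

With a segment that contains a silent gap, the smoothed amplitude inside the gap falls well below the mean, so those samples are excluded and the RMS reflects the active portion of the response.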
Main effects in the comparison between calibration functions and interactions with frequency were estimated using linear regression (lm in R). Because post hoc comparisons (emmeans in R) identified even very small differences of ∼1 dB as statistically significant (p < 0.05), we instead considered differences between calibration functions to be clinically significant if they exceeded the mean interaural level difference (ILD) detection threshold of 3.8 dB for bilateral CI users (Grantham et al., 2008).
3. Results
3.1 Calibration function estimation
Figure 1(A) shows an example set of input–output curves (N6, electrode center frequency = 6500.5 Hz), plotting the RMS electrical output level (arbitrary linear units) vs stimulus level (dB re full scale). The input–output functions for the three modalities (colors) show a similar 40-dB input dynamic range (i.e., the change in input level required to increase the output from floor to ceiling levels). Critically, the three functions are parallel, indicating a simple linear shift: the across-modality differences between the input levels (x axis) required for a fixed output level (y axis) are independent of the threshold output level chosen. The relative calibration factors (in dB), calculated as the mean horizontal difference between linearly interpolated curves over an RMS voltage range of 2–10 linear units (horizontal dash-dotted lines), are shown along with double-arrows. These are the calibration factors required to match output levels across the three modalities for this SP and frequency. Note, however, that these absolute calibration values are arbitrary and simply reflect the different stimulus delivery pathways for each transducer. The most important question is how the calibration spectral shapes compare across frequency.
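The calibration-factor calculation described above can be sketched as follows, assuming each input–output curve is sampled at the same stimulus levels and rises monotonically through the 2–10 output range. The function names, the nine evenly spaced target outputs, and the sign convention (positive means modality A needs a higher input level than modality B for the same output) are assumptions of this sketch.

```python
def level_for_output(levels_db, outputs, target):
    """Linearly interpolate the input level (dB) that yields `target` output."""
    points = list(zip(levels_db, outputs))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Use the first rising segment that brackets the target output
        if y1 > y0 and y0 <= target <= y1:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target output is not spanned by the curve")

def calibration_factor(levels_db, out_a, out_b, lo=2.0, hi=10.0, n_points=9):
    """Mean horizontal (input-level) difference, A minus B, in dB."""
    targets = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    diffs = [
        level_for_output(levels_db, out_a, t) - level_for_output(levels_db, out_b, t)
        for t in targets
    ]
    return sum(diffs) / len(diffs)
```

For two parallel curves, the difference is the same at every target output, so the mean simply recovers the horizontal shift between them.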
Figure 1(B) (top) plots the mean calibration factors as a function of frequency for all six devices. Because the primary study question was whether headphones could replace DAI presentation, headphone calibration factors were referenced to DAI. Headphone calibration factors were similar across devices below 2000 Hz, but above 2000 Hz Cochlear Ltd. devices rolled off more sharply. Variability [Fig. 1(B), bottom] was also frequency-dependent, with smaller SDs (<2 dB) below 2000 Hz and larger SDs (often >5 dB) above 2000 Hz.
The roll-off above 2000 Hz suggests that headphones and DAI yield different spectral information. It is unclear whether this roll-off (and the difference between manufacturers) reflects acoustic headphone-to-microphone coupling or CI microphone response or signal processing. To examine this, Fig. 1(C) compares headphones and loudspeaker calibration functions. Cochlear Ltd. SPs showed a significant three-way interaction between modality, frequency, and SP [F(42, 858) = 5.17, p < 0.001] and all two-way interactions and main effects were significant (p < 0.001). MED-EL SPs also showed a significant three-way interaction [F(22, 468) = 1.72, p = 0.02]; all two-way interactions and main effects were significant (p < 0.001) except for SP × modality (p = 0.13).
A high degree of inter-frequency variability was observed for loudspeaker presentation, likely due to the idiosyncrasies of the loudspeaker response. To examine the three-way interactions by comparing headphones and loudspeaker calibrations, the curves were smoothed (solid lines; moving-average filter, three- or five-point depending on the number of frequencies tested). The loudspeaker-headphones difference was considered clinically significant at frequencies where the smoothed loudspeaker curve fell outside ±3.8 dB of the headphones curve (shaded area). For each device, the high-frequency roll-off had a similar shape for loudspeaker and headphones presentation, indicating that roll-off was most likely due to microphone or SP characteristics and not due to headphone-microphone coupling. Below 2000 Hz, the headphones and loudspeaker calibration curves differed, by as much as 16 (Cochlear Ltd.) to 25 dB (MED-EL).
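The smoothing and the ±3.8-dB criterion can be sketched as follows. This sketch assumes a centered moving average whose window shrinks at the ends of the frequency range and assumes that both curves are smoothed before comparison; neither detail is stated explicitly in the text, and the function names are illustrative.

```python
def smooth(values, n_points=3):
    """Centered moving average; the window shrinks at the edges (assumption)."""
    half = n_points // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def clinically_different(freqs_hz, loudspeaker_db, headphones_db,
                         n_points=3, criterion_db=3.8):
    """Return frequencies where the smoothed curves differ by more than the criterion."""
    ls = smooth(loudspeaker_db, n_points)
    hp = smooth(headphones_db, n_points)
    return [f for f, a, b in zip(freqs_hz, ls, hp) if abs(a - b) > criterion_db]
```

In this sketch, `n_points` would be set to 3 or 5 depending on how many channel frequencies the device provides, mirroring the three- or five-point filters described above.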
The calibration functions in Fig. 1 were estimated based on electrodogram outputs of a complex non-linear system. Additional control measurements were made (see supplementary material) to explore several possible factors that might have influenced the results. First, responses from microphone test devices investigated the possible influence of the non-linear conversion of microphone responses to electrical stimulation (supplementary material Fig. S1). Second, measurements using noise stimuli tested whether pure tones might have influenced the results via standing waves or uncharacteristic SP behavior associated with narrowband stimuli (supplementary material Fig. S2). Third, additional measurements examined the possible influence of the manikin head on loudspeaker responses (supplementary material Fig. S3). None of these factors had a substantial influence on the pattern of results.
3.2 Effects of CI features
Most CI SPs include front-end features to enhance auditory performance, especially in noise. Such features, intended for sound-field presentation, might affect the headphones calibration. Figure 2(A) shows N7 calibration functions with all features off, only SCAN turned on, or with all features on (SCAN, ASC, SNR-NR, WNR, and ADRO). The SCAN scene classification reported by the Nucleus Smart application during tone presentation was always “Quiet” for both headphones and loudspeaker modalities. There was a significant interaction between feature and frequency [F(42, 572) = 3.57, p < 0.001]. Both main effects were also significant (p < 0.001). SCAN (blue x's) had no noticeable effect on the calibration function. However, turning on all features (red +'s) significantly (>3.8 dB) elevated the calibration function above 1400 Hz.
To isolate the influence of individual features, additional recordings were made with the N6 SP. Because Fig. 2(A) showed no effect of SCAN, SCAN-only served as the baseline (blue x's). Individual features were then added, one at a time or all together [Fig. 2(B)]. There was a significant feature × frequency interaction [F(45, 240) = 1.78, p = 0.0032]. Both main effects were also significant (p < 0.001). ASC (cyan squares), SNR-NR (green circles), and WNR (pink leftward triangles) minimally affected (<3.8 dB) the headphones calibration. However, ADRO (maroon rightward triangles) elevated the calibration >3.8 dB for frequencies >1400 Hz, similarly to all-features-on (red +'s). These data suggest that ADRO was the only feature that influenced the calibration.
With headphones, sound nominally comes from the side and might interact with microphone directionality. Figure 2(C) shows S1 headphone calibration functions with omnidirectional (no directionality), standard (rear null, unattenuated front and side responses), or highly directional settings (attenuation everywhere but the front). An additional condition tested the omnidirectional microphone with noise reduction also enabled. There was a significant feature × frequency interaction [F(33, 192) = 8.11, p < 0.001]. The main effect of frequency was also significant (p < 0.001), but not the main effect of feature (p = 0.14). Omnidirectional (black diamonds) and standard directional (red x's) calibration functions were comparable. However, the highly directional setting (blue triangles) flattened the frequency response and altered the calibration more than 3.8 dB below 380 Hz and at 3000 Hz. Noise reduction had little effect (green +'s vs black diamonds).
3.3 Effects of headphone and processor position
The previous measurements included arbitrary headphone-position variability. A series of additional measurements systematically investigated how the positions of the headphones on the head and of the SP on the ear influenced the calibration (see supplementary material Fig. S4). Some headphone and SP position changes altered the calibration response by more than the 3.8-dB threshold for clinical significance, especially around 4–6 kHz, but most position changes yielded changes <3.8 dB.
4. Discussion and Conclusion
This study asked whether headphone presentation could substitute for DAI for psychoacoustic CI testing. In general, headphone calibration spectra showed different behavior relative to DAI at low vs high frequencies [Figs. 1(B) and 1(C)]. Below 3000 Hz, headphone and DAI calibration functions were similar and headphone variability was low for all SPs tested. Above 3000 Hz, there was a high-frequency roll-off (more pronounced for Cochlear Ltd. devices) that was absent for DAI, and headphone variability was larger.
The similar high-frequency roll-off for loudspeaker and headphones presentation for all SPs for a given manufacturer [Fig. 1(C)] suggests it is an SP characteristic and does not reflect undesirable acoustic coupling with headphones. This roll-off has been observed previously in studies examining the relationship between loudspeaker and DAI or Bluetooth stimulus presentation. Aronoff et al. (2011) described head-related transfer functions (HRTFs) derived from SP-microphone recordings for various BTE SPs placed on a manikin. Most SPs, including from the two manufacturers examined in the current study, showed a high-frequency roll-off above 4 kHz (Aronoff et al., 2011, their Fig. 1). However, it is likely that these estimates did not purely measure the influence of the head, which would require the same microphone with and without the presence of the manikin. Instead, the manikin-mounted SP recordings were referenced to non-manikin recordings from a different, flat-response microphone (Chan et al., 2008). While this was appropriate for the goal of that study (simulating the SP sound-field response via DAI presentation), the reported HRTFs likely reflect SP microphone characteristics in addition to any head influence. This interpretation is supported by the fact that changing the sound-source position from 0° to 90° azimuth had relatively little influence on the HRTFs reported by Aronoff et al. (2011), and that removing the manikin head had relatively little influence on the SP calibration spectra in the current study (supplementary material Fig. S3). Chen et al. (2021) also examined the loudspeaker-to-SP transfer function for an Advanced Bionics processor using an in-the-canal microphone and a SP microphone test device, identifying a similar high-frequency roll-off (their Fig. 3). While this might reflect the microphone response, they also observed a roll-off for Bluetooth presentation, suggesting it might instead be due to low-pass filtering (e.g., anti-aliasing) early in the SP circuitry before the microphone test device.
The origin of the headphones-loudspeaker difference at low frequencies [Fig. 1(C)] is unclear. Results in the supplementary material rule out several possible causes: non-linear effects after the SP microphone (supplementary material Fig. S1); interactions between CI signal processing and pure-tone stimuli (supplementary material Fig. S2); or the influence of HRTF for loudspeaker but not headphones recordings (supplementary material Fig. S3). One possibility is that the difference might reflect a “proximity effect” (Josephson, 1999). Microphones with directional characteristics operate nonlinearly in the near-field: as the microphone is brought closer to the sound source (as in the headphones-over-SP approach), low frequencies are amplified relative to the high frequencies. Although the electret condenser microphones typical of CI SPs (Yeiser, 2022) are described as omnidirectional, they might nevertheless exhibit some directionality and thus be susceptible to the proximity effect. Additional loudspeaker calibration measurements placed the S2 SP at a range of distances from the loudspeaker. Supplementary material Fig. S5 shows that when the SP was touching the loudspeaker grille (red), the low-frequency calibration spectrum had a shallow negative slope, like the (red) headphone responses in Fig. 1(C). As the SP was moved away from the loudspeaker (supplementary material Fig. S5, yellow, green, and blue), a low frequency roll-off was introduced, like the (blue) loudspeaker responses in Fig. 1(C). These similarities suggest that low-frequency loudspeaker-headphones calibration differences might arise from a proximity effect for headphones presentation.
Regardless of the causes of the deviations between loudspeaker, headphones, and DAI responses [Fig. 1(C)], either modality (headphones or DAI) could be spectrally matched to the desired loudspeaker response with appropriate frequency-specific gain. The supplementary material provides calibration tables for this purpose, listing the stimulus levels required for equal output levels across all frequencies for a given transducer. For example, to simulate loudspeaker responses with headphones presentation, the required headphones frequency-specific gain would be computed by subtracting the loudspeaker calibration values from the headphones calibration values for a given SP. Note that these tables will produce the desired spectral shape but not the desired overall SPL; this would require an additional calibration of the headphones system for a given soundcard using a broadband stimulus and a flat-plate coupler.
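The subtraction described above can be sketched as follows, assuming each calibration table is represented as a mapping from channel center frequency (Hz) to calibration value (dB). The function name, the dictionary layout, and the numeric values in the usage note are hypothetical and are not taken from the supplementary tables.

```python
def headphone_gain_db(headphones_cal_db, loudspeaker_cal_db):
    """Frequency-specific gain (dB) to apply to headphone stimuli.

    Computed as headphones calibration minus loudspeaker calibration at
    each frequency present in both tables for a given SP.
    """
    return {
        f: headphones_cal_db[f] - loudspeaker_cal_db[f]
        for f in headphones_cal_db
        if f in loudspeaker_cal_db
    }
```

For example, with hypothetical tables `{250: -30.0, 1000: -35.0}` for headphones and `{250: -45.0, 1000: -40.0}` for the loudspeaker, the sketch returns `{250: 15.0, 1000: 5.0}`. As noted above, gains of this kind set the spectral shape only; the overall SPL would still require a separate calibration.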
This transducer calibration-correction approach has been used previously in studies employing DAI to simulate sound-field listening, including sound localization. Studies correcting for both the HRTF and BTE microphone response by using a flat-response microphone as a reference (Aronoff et al., 2010, 2011) likely accurately simulated the response of the SP microphone. Other studies that used the same microphone for the head-mounted recordings and head-absent reference (Majdak et al., 2011; Peng and Litovsky, 2022) only corrected for the BTE HRTF and not the SP microphone response. These studies might have rendered high-frequency information more audible than with sound-field presentation, especially for Cochlear Ltd. devices with a sharp high-frequency roll-off. ILDs are largest at high frequencies; thus, DAI presentation could have overestimated ILD-information availability. This is potentially problematic: CI users rely mainly on ILDs for spatial hearing because interaural time-difference information is poorly relayed by unsynchronized bilateral SPs (Gray et al., 2021) or between acoustic and electric ears (Zirn et al., 2015).
One possible downside to headphone presentation is a more variable calibration than for DAI presentation. On the other hand, there was not a dramatic difference between headphone and loudspeaker variability [see SD plots, Fig. 1(C)]. In any case, adding a perceptual check might help ensure a reliable presentation level. For example, one could present a 4-kHz tone [where the largest SDs occurred, Fig. 1(B)] and adjust headphone placement while repeating detection-threshold measurements until the lowest threshold is achieved. Further research is needed to perceptually validate the headphones approach and to develop this type of check.
The study also examined CI signal-processing and microphone directionality features. Only ADRO [Fig. 2(B)] or a highly directional microphone [Fig. 2(C)] affected the mean headphones calibration. While the safest approach is to disable all front-end features during headphones presentation, the data suggest that only these two features should be strictly avoided, with the caveat that the tested features might interact differently with stimuli more natural than the pure tones used here.
Supplementary Material
See the supplementary material for (1) details regarding a series of control measurements carried out to further investigate the calibration spectra for headphones and loudspeaker presentation, and (2) tables providing the relative calibration values required to equalize the broadband spectrum of the CI response across headphones, loudspeaker, and DAI modalities.
Acknowledgments
The views expressed in this article are those of the authors and do not reflect the official policy of the Department of Army/Navy/Air Force, Department of Defense, or U.S. Government. The identification of specific products or scientific instrumentation does not constitute endorsement or implied endorsement on the part of the author, Department of Defense, or any component agency. We thank Coral Dirks for assistance with electroacoustic recordings and Doug Brungart for suggesting that headphones and loudspeaker response differences might be explained by a proximity effect. Research reported in this publication was supported by the U.S. Army Telemedicine & Advanced Technology Research Center FY22 Advanced Medical Technology Initiative, Award No. 13957 (to J.G.W.B.). We thank Cochlear Ltd. and MED-EL for testing equipment and technical support. Preliminary data from this study were presented at the Conference on Implantable Auditory Prostheses, Virtual Meeting, July 2021.
Author Declarations
Conflict of Interest
The authors have no conflicts to disclose.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.