A key factor influencing sound quality in open-fit digital hearing aids is the processing delay. So far, the delay limit needed for ensuring optimal (rather than tolerable) sound quality has not been established. Using a realistic hearing aid simulator, the current study investigated the relationship between preferred sound quality and five processing delays ranging from 0.5 to 10 ms in listeners with normal and impaired hearing. The listeners with normal hearing showed a strong preference for the shortest delay. For the listeners with impaired hearing, participants with mild hearing losses below 2 kHz also preferred the shortest delay.
1. Introduction
According to the World Health Organization (WHO, 2021), hearing loss (HL) currently affects nearly 20% of the worldwide population. The hearing aid (HA) adoption rate, however, is generally low, even in developed countries such as the United States (33%), Germany (37%), France (41%), the UK (48%), and Denmark (53%) (EHIMA, 2016, 2018a,b,c; Grundfast and Liu, 2017). Poor sound quality has been identified as one of the key reasons why persons fitted with hearing aids do not wear them (McCormack and Fortnum, 2013). Processing delay, which is inherent to all digital HAs, is critical for the perceived sound quality and may therefore contribute to low adoption rates.
Processing delay becomes problematic when interaction of the HA output signal and the direct sound signal (transmitted through or around the earpiece) causes distortions. These distortions depend on the magnitude of the delay, the gain provided by the HA, and the degree of openness of the fitting. The most distinct perceptual consequence of delay-based distortions is a “coloration” of the sound; that is, a change in timbre brought about by the spectral peaks and notches that occur in the mixed sound signal (Bramsløw, 2010). This phenomenon is known as the “comb-filter effect.” In addition, delay-based distortions can cause temporal effects, such as the perception of echoes when the HA delay exceeds the echo threshold (Litovsky et al., 1999).
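The comb-filter effect described above can be illustrated numerically: when a delayed, amplified copy of a signal is added to the direct sound, the mixture has spectral peaks at integer multiples of 1/τ and notches at odd multiples of 1/(2τ), where τ is the delay. The sketch below (plain Python; the delay and gain values are illustrative only, not taken from the study) computes the magnitude response of such a mixture:

```python
import math

def comb_filter_magnitude(freq_hz, delay_ms, gain_db=0.0):
    """Magnitude (dB) of the mixed signal when a delayed, amplified copy
    is added to the direct sound: y(t) = x(t) + g * x(t - tau).
    The transfer function is 1 + g * exp(-j * 2*pi*f*tau)."""
    g = 10 ** (gain_db / 20)
    tau = delay_ms / 1000.0
    re = 1 + g * math.cos(2 * math.pi * freq_hz * tau)
    im = -g * math.sin(2 * math.pi * freq_hz * tau)
    return 20 * math.log10(math.hypot(re, im))

# With a 2 ms delay, peaks fall at multiples of 1/tau = 500 Hz and
# notches at odd multiples of 1/(2*tau) = 250 Hz.
print(round(comb_filter_magnitude(500, 2.0), 1))        # peak: +6.0 dB
print(round(comb_filter_magnitude(250, 2.0, -6.0), 1))  # notch, -6 dB gain
```

Note that with exactly equal levels (0 dB gain) the notch depth is infinite, which is why the notch example uses a slightly attenuated copy.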
Open-fit HAs and HAs with large leakage paths in or around the ear piece are most susceptible to the acoustic and perceptual consequences of HA delay. The majority of HA fittings today can be considered open (Froehlich et al., 2019), but, to date, only a few studies have investigated the perceptual consequences of HA delay with open-fit devices (Bramsløw, 2010; Burwinkel et al., 2015; Groth and Søndergaard, 2004; Zakis et al., 2012). Utilizing delays of 2, 4, and 10 ms, Groth and Søndergaard (2004) found that participants with hearing impairment noticed a slight disturbance related to own-voice perception at 10 ms of delay, whereas participants with normal hearing (NH) noticed such effects at 4 ms. Bramsløw (2010) found that delays of 5, 7, and 10 ms were audible but did not find consistent delay preferences in participants with normal or impaired hearing. When testing delays in the range of 4.5–25 ms, Burwinkel et al. (2015) observed that fewer than half of their participants with hearing impairment could detect differences between the different delay conditions. For the participants who could detect differences, 25 ms of delay was the least acceptable.
Although the studies summarized above can offer some insight into tolerable HA delays, none of them considered very short delays (<2 ms), possibly because of hardware limitations. As a result, their focus was typically on rather long delays and thus on measurements of when the sound quality was still perceived as acceptable. To allow for very short delays to be tested, some studies used simulations. For example, Stone et al. (2008) asked a group of ten participants with NH to assess the perceived disturbance caused by delays ranging from 1 to 15 ms. They found that shorter delays generally resulted in better ratings, but that even a delay as short as 1 ms could cause some disturbance. Using a HA simulator with open earmolds, Goehring et al. (2018) found that a delay of 1.5 ms was rated least annoying, relative to delays of ≥10 ms, by participants with and without hearing impairment. Denk et al. (2021) measured detection thresholds for delay-based spectral distortions in simulated HAs. They found that the detection of spectral distortions was possible for delays in the range of 0.1–8 ms.
Overall, there is evidence that very short delays can offer user benefits. However, the findings were obtained with unrealistic HA simulations, that is, simulations that neglected some essential acoustic properties of open-fit HAs. Particularly, the frequency and phase response as well as the direct sound were disregarded.
Recently, Stiefenhofer (2022) implemented a realistic HA simulator that preserves the acoustics of open-fit HAs. The simulator, which is based on impulse responses measured with real open-fit HAs, can accurately predict real-ear aided gain on an acoustic mannikin. The direct sound path and HA path are simulated using multiple linear filters with the goal of preserving the respective frequency and phase response of both paths and hence the specific acoustics of open-fit HAs. The predictions made with the HA simulator were in very good agreement with verification measurements. Using this HA simulator for a listening experiment, Stiefenhofer showed that participants with NH could discriminate noise stimuli in terms of their sound coloration for delays ≥0.3 ms. Further, participants with mild hearing impairment and (near-)NH in the lower frequencies could do the same for delays as low as 0.6 ms, pointing to a potential preference for shorter HA processing delays.
The primary purpose of the current study was to use the HA simulator of Stiefenhofer (2022) to investigate the effects of short processing delays on perceived sound quality with realistic and relevant everyday sounds in listeners with NH and mild-to-moderate sensorineural hearing losses. A secondary purpose was to assess whether the degree of hearing loss is correlated with delay preference. We hypothesized that the sound quality resulting from delays <1 ms would be preferred by listeners with NH and with relatively mild hearing losses in the low-frequency range.
2. Methods
An ethical waiver was obtained from the Research Ethics Committee of the Capital Region of Denmark (Case No. H-18056647) and the Research Ethics Committee of the Region of Southern Denmark (Case No. 20212000-06). The participants received written and oral instructions about the aims of the current study and then signed an informed consent form.
2.1 Participants
2.1.1 Normal-hearing group
Thirteen participants with NH (8 males, 5 females) and a mean age of 34 years (range: 24–49) were recruited from the staff at WS Audiology, Lynge, Denmark. The participants were recruited via an announcement on the company intranet. The announcement stated that participants with NH were needed for a listening test that would take approximately 1–1.5 h. The recruited participants came from various departments across the company and were blinded to the study's research questions. Prior to inclusion, the participants were screened using pure-tone audiometry to make sure their hearing thresholds were 20 dB hearing level or better across all standard audiometric frequencies.
2.1.2 Hearing-loss group
Twenty participants with hearing impairment (7 males, 13 females) and a mean age of 65 years (range: 56–75) were recruited from the patient population at Odense University Hospital, Odense, Denmark. The participants had symmetrical, mild-to-moderate sensorineural hearing losses with thresholds not exceeding the N3 standard audiogram (Bisgaard et al., 2010) by more than 10 dB (Fig. 1). Additional inclusion criteria were: (1) no audiological complications such as fluctuating hearing loss, chronic otitis media or tinnitus as primary reason for seeking HA treatment, (2) the ability to operate and read text on a tablet, and (3) no language-related or cognitive problems that would hinder participation in the study. For their participation, the participants received a monetary reimbursement corresponding to 120 Danish crowns/h.
2.2 Signals
Three types of sound signals were used. They were chosen to be realistic and relevant to the participants, that is, they should reflect everyday sounds. Another criterion was that the signals should be suitable for making delay-based distortions audible. Comb-filter effects are most easily perceivable in stationary broadband signals, whereas temporal effects are most easily perceivable in transient (e.g., click-like) stimuli. Thus, the following signals were selected:
Raindrops on an umbrella: A broadband signal with some transient characteristics from single raindrops splashing on the umbrella.
Male speech in quiet: A very relevant signal for daily-life listening.
Keystrokes on a mechanical computer keyboard: A signal with very clear transient characteristics with most of its energy below 2 kHz.
2.3 Signal processing
Five delays were tested. Four were frequency-independent delays of 0.5, 2, 5, and 10 ms. The fifth delay was frequency-dependent, ranging from approximately 7 ms at 1 kHz to 5 ms at 8 kHz, with an average delay of 6 ms as measured in third-octave bands from 0.5 to 8 kHz. These five delays reflect typical delays in commercially available HAs (Balling et al., 2020).
To generate the stimuli, the impulse responses from the HA simulator of Stiefenhofer (2022) were used. These represent the direct sound (hds) and the HA-processed sound (hprc). The latter consists of a series of three impulse responses: (1) the acoustic path from the sound source to the HA microphone (hmic), (2) the acoustic path from the HA receiver inside the ear canal of the acoustic mannikin to the microphone inside the ear simulator (hrcv), and (3) the processing in the HA (hHA). The impulse response hHA represents the digital signal processing with either the frequency-dependent delay or one of the frequency-independent delays as well as the applied insertion gain.
Each of the three sound signals was filtered with hprc and hds to simulate the HA path and direct-sound path. Subsequently, the two filtered signals were combined to produce the mixed signal. The processing was done separately for the left and right ears. The signals were presented binaurally via free-field equalized Sennheiser (Wennebostel, Germany) HDA200 headphones. To avoid including the effects of two ear canals (one from the acoustic mannikin, one from the listener), the signals were additionally filtered with a 128-tap minimum-phase finite impulse response filter with the inverse frequency response of hds, thereby removing the effect of the ear canal of the acoustic mannikin. All stimuli were simulated as coming from the front of the listener.
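The two-path mixing described above can be sketched as follows. This is a toy illustration only: the real simulator uses measured impulse responses (hds, hprc), whereas the placeholder filters here are an idealized unit direct path and a pure-delay HA path with flat gain, and the sample rate is likewise an assumption for the example.

```python
def convolve(x, h):
    """Direct-form FIR convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def simulate_open_fit(x, h_direct, h_ha):
    """Filter the input with the direct-sound path and the HA path
    separately, then sum the two at the eardrum (per ear)."""
    d = convolve(x, h_direct)
    p = convolve(x, h_ha)
    n = max(len(d), len(p))
    d += [0.0] * (n - len(d))
    p += [0.0] * (n - len(p))
    return [a + b for a, b in zip(d, p)]

fs = 16000                                 # assumed sample rate (Hz)
delay_samples = int(round(0.0005 * fs))    # 0.5 ms HA delay -> 8 samples
h_direct = [1.0]                           # idealized unit direct path
h_ha = [0.0] * delay_samples + [2.0]       # pure delay plus flat 6 dB gain
x = [1.0] + [0.0] * 31                     # unit impulse as test input
y = simulate_open_fit(x, h_direct, h_ha)
print(y[0], y[delay_samples])              # direct click, then delayed HA copy
```

With an impulse as input, the output shows the direct click at sample 0 followed by the amplified HA copy after the delay; with a broadband input, the same superposition produces the comb-filter peaks and notches discussed in the Introduction.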
The amplification applied by means of the HA simulator was linear (i.e., no amplitude compression was used). For the normal-hearing group, the amplification in the HA simulator was set to correspond to the insertion gain for the N2 standard audiogram (Bisgaard et al., 2010) derived from the “National Acoustic Laboratories–Non-linear 2” (NAL-NL2) rationale (Keidser et al., 2011) with a 65-dB-sound pressure level (SPL) pink noise as the input signal. To avoid presenting uncomfortably loud signals to these listeners, the broadband level of the final stimuli was lowered to 65 dB SPL in all cases.
For the hearing-loss group, the insertion gain was set individually, such that the amplification in the HA simulator corresponded to the individual HL fitted with the NAL-NL2 rationale with a 65-dB-SPL pink noise input signal. The resulting presentation levels ranged from 72 to 82 dB SPL.
Typically, amplification below 1 kHz is turned off in open-fit HAs to minimize comb-filter effects and thus achieve better sound quality (Bramsløw, 2010). However, turning off amplification in the lower frequencies is a compromise solution, which may reduce the effect of HA features such as directional microphones and noise reduction (Keidser et al., 2007). To make the delay-based distortions in the current experiment as audible as possible, the simulated HA provided amplification down to approximately 0.5 kHz. Below that frequency, the amplification was gradually rolled off at approximately 12 dB/octave. Thus, in contrast to typical open-fit devices, the frequency range over which amplification was provided was approximately one octave wider. Effective amplification below 0.5 kHz is impracticable in open-fit devices, as the vent effect dominates in that frequency range (Nordahn, 2009).
2.4 Procedure
The participants were seated comfortably in a sound booth while wearing headphones and facing a computer (normal-hearing group) or iPad (hearing-loss group) screen with a graphical user interface. A forced-choice pairwise comparison task was used. On a given trial, the participants were presented with two versions of one of the three test signals that differed only in the simulated delay. The participants' task was to identify the sound they liked best. Preferred sounds were given a score of 1, whereas non-preferred sounds were given a score of 0. The sounds were played back in an infinite loop, and the participants could freely switch between the two (synchronized) sounds. As five delays were used, there were ten pairwise combinations to assess. Each pair was presented three times. In total, each participant compared 90 pairs. The presentation order of the delays and sound signals was randomized across the participants.
Before the actual measurements, the participants were trained in the task. The training consisted of two sessions, each including three comparisons (one for each of the three sound signals) in randomized order. The delays were 0.5 vs 10 ms in the first session and 0.5 vs 5 ms in the second session. During the training, the test leader was present in the sound booth. The participant and test leader listened to the sounds together via a loudspeaker. The participants were encouraged to report what effects they could hear, how the two sounds differed, and what they preferred. The purpose of the training was to familiarize the participants with three different sound signals and to encourage them to listen for potential differences between the presented stimuli. The test leader remained neutral with respect to the differences noticed by the participants.
All measurements were completed in a single 1.5-h visit including breaks. The participants were instructed to take breaks as needed, and a mandatory break was given after 30 min of testing.
2.5 Data analysis
The preference scores were analyzed using the Bradley-Terry-Luce (BTL) model (Bradley, 1984). To that end, the OptiPt function implemented in MATLAB by Wickelmaier and Schmid (2004) was used. The input was a preference matrix, which was created by adding the preference scores for each delay across all trials per participant. The output was a single BTL score for each delay, normalized such that the five BTL scores summed to 1. Thus, the BTL scores can be interpreted as the probability of a given delay being preferred over the other four delays. Another output of the OptiPt function was a covariance matrix, which was used to compute 95% confidence intervals (Wickelmaier and Schmid, 2004). The results are presented as mean BTL scores together with 95% confidence intervals of the mean, both for the individual sound signals and across all sound signals. When the 95% confidence intervals do not overlap, differences between mean BTL scores can be considered significant at the 5% significance level.
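The study used the OptiPt MATLAB function; as an illustration of the underlying model only, the following is a minimal Python re-implementation of Bradley-Terry score estimation via the classic minorization-maximization (Zermelo) iteration, applied to a hypothetical preference matrix. It is not the authors' analysis code, and the covariance-based confidence intervals provided by OptiPt are omitted.

```python
def fit_btl(wins, iters=200):
    """Minimal Bradley-Terry fit via minorization-maximization.
    wins[i][j] = number of times item i was preferred over item j.
    Returns scores normalized to sum to 1, interpretable as the
    probability of each item being preferred over the others."""
    m = len(wins)
    p = [1.0] * m
    for _ in range(iters):
        new = []
        for i in range(m):
            w_i = sum(wins[i])                    # total wins of item i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(m) if j != i)
            new.append(w_i / denom if denom > 0 else p[i])
        s = sum(new)
        p = [v / s for v in new]                  # renormalize each pass
    return p

# Hypothetical 3-delay preference matrix: item 0 wins most comparisons.
wins = [[0, 5, 6],
        [1, 0, 4],
        [0, 2, 0]]
scores = fit_btl(wins)
print([round(v, 2) for v in scores])              # highest score for item 0
```

The renormalization inside the loop mirrors the constraint stated above that the BTL scores sum to 1.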
To investigate the effect of low-frequency HL, the participants in the hearing-loss group were stratified into mild and moderate sub-groups (N = 2 × 10). The degree of HL was calculated as the pure tone average (PTA) in the 0.5–2 kHz range for the left and right ears. For the definition of the two sub-groups, the larger PTA value across the two ears of each participant was used. A participant was considered to have mild low-frequency HL if the PTA was ≤ 35 dB HL and otherwise a moderate low-frequency HL. The 35-dB-HL split was chosen as this is the median between the N2 (mild HL) and N3 (moderate HL) standard audiograms in the 0.5–2 kHz frequency range (Bisgaard et al., 2010).
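The stratification rule can be expressed compactly. Note that the exact audiometric frequencies entering the 0.5–2 kHz PTA are not listed in the text; the sketch below assumes thresholds at 0.5, 1, and 2 kHz, and the example participant is hypothetical.

```python
def low_freq_pta(thresholds_db):
    """Pure-tone average over the low-frequency thresholds of one ear
    (assumed here: 0.5, 1, and 2 kHz, in dB HL)."""
    return sum(thresholds_db) / len(thresholds_db)

def classify(left_thresholds, right_thresholds, split_db=35.0):
    """Sub-group assignment using the worse (larger) ear PTA, with the
    35 dB HL split between the N2 (mild) and N3 (moderate) standard
    audiograms."""
    pta = max(low_freq_pta(left_thresholds), low_freq_pta(right_thresholds))
    return "mild" if pta <= split_db else "moderate"

# Hypothetical participant: the left ear is the worse ear (PTA = 40 dB HL).
print(classify([35, 40, 45], [25, 30, 35]))   # -> moderate
```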
To check if there was a relation between low-frequency HL and delay preference, Pearson's correlation coefficient was calculated. The analysis was performed in Stata (StataCorp, College Station, TX) v15 with the BTL scores for the 0.5 ms delay across the three sound signals.
3. Results
Figure 2 shows the mean BTL scores with confidence intervals for the NH and HL groups. The NH group showed a clear preference for the 0.5 ms delay for the rain and speech signals. Although the HL group showed no clear preference, there was a slight trend for the 0.5 ms delay being preferred for the rain and speech signals. Both groups preferred up to 2 ms of delay for the keyboard signal.
Figure 3 shows the results for the two hearing-loss sub-groups. The sub-group with milder HL showed a stronger preference for 0.5 ms of delay for the rain and speech signals. Figure 4 shows the results from the correlation analysis. Low-frequency HL was found to be negatively correlated with the BTL scores for the 0.5 ms delay (r = –0.54, p = 0.01). In other words, better hearing thresholds in the low-frequency range were found to be associated with a stronger preference for the shortest delay.
4. Discussion
The results of the current study indicate a strong preference for the shortest (0.5 ms) delay in the normal-hearing group. For the hearing-loss group, preference scores were correlated with the degree of HL, with participants having (near-)NH thresholds at frequencies below 2 kHz also preferring the shortest delay.
These results are in line with previous studies that have shown potential advantages of very short HA delays (Denk et al., 2021; Goehring et al., 2018; Stiefenhofer, 2022; Stone et al., 2008). Stone et al. (2008) and Goehring et al. (2018) both observed that shorter delays of 1–1.5 ms resulted in lower disturbance/annoyance ratings than longer delays for speech stimuli. Denk et al. (2021) and Stiefenhofer (2022) further showed that delay-based spectral distortions and coloration of sound could be detected at delays ≤ 0.6 ms for noise stimuli. Their work indicates that delay-based distortions can readily be detected and discriminated for HA delays <1 ms. This may also be why Bramsløw (2010) did not find clear differences in sound quality preferences, as he tested delays only in the 5–10 ms range. In contrast to other studies that investigated very short delays, Zakis et al. (2012) failed to demonstrate a preference for 1.3 ms over 3.4 ms of delay for participants with NH. However, in their study, a music stimulus was used, and music may not be the most sensitive type of stimulus with respect to perceiving spectral distortions due to HA delay. Also, echo thresholds for music are typically above 10–20 ms (Braasch et al., 2007).
Earlier studies that included very short HA delays offered insight into when delay-based distortions are just noticeable and when the effects of HA delay become objectionable. However, they did not investigate at which delay sound quality is optimal. As sound quality can play a major role for HA satisfaction and uptake (Kochkin, 2000), the current study focused on how short HA delays should be to achieve the best possible sound quality. Our results point to a clear preference for very short delays in listeners with a sensorineural HL that is inversely related to low-frequency hearing thresholds. Based on the data reported here, it seems that hearing aid delays below 2 ms are preferable to longer delays, and delays may need to be as low as 0.5 ms to show a consistent sound quality preference. This is also in accordance with the delay-audibility thresholds reported by Stiefenhofer (2022).
Only a few previous studies on HA delay included participants with hearing impairment. These studies generally demonstrated that the degree of HL plays a role in the sensitivity to delay-based sound distortions. In a simulation study with delays in the 10–50 ms range, Goehring et al. (2018) observed that participants with NH gave higher annoyance ratings than participants with hearing impairment for most delays, especially the longer ones. Moreover, HL severity played a role, with listeners with more severe HL showing reduced sensitivity to changes in delay. Using non-occluding HA fittings, Groth and Søndergaard (2004) found that participants with hearing impairment reported less disturbance to sounds subjected to 4 and 10 ms of delay. Bramsløw (2010) also found that participants with NH were, in general, more sensitive to delay-based distortions relative to participants with impaired hearing. Additionally, Stiefenhofer (2022) observed that participants with normal or (near-)normal low-frequency hearing thresholds but elevated high-frequency thresholds achieved coloration-pitch discrimination thresholds comparable to those of participants with normal audiograms.
Typically, comb-filter effects are most prominent below 2 kHz in open-fit HAs, since the levels of the HA sound and the direct sound are approximately equal there (Stiefenhofer, 2022). Elevated hearing thresholds in this range generally result in higher HA gain and thus less pronounced comb-filter effects. This, in turn, can result in a smaller perceptual contrast between different delay settings. Buchholz (2011) also showed that high-frequency components contribute less to the detection of coloration than low-frequency components. Altogether, this means that participants with normal or (near-)normal hearing in the lower frequencies are more susceptible to the consequences of HA delay, which may explain the correlation between low-frequency hearing thresholds and preference scores found here.
Regarding the keyboard signal, neither the listeners with normal nor those with impaired hearing showed a preference for the shortest delay. Nevertheless, both groups had a clear preference for delays ≤ 2 ms. This is likely because comb-filter effects are less audible in transient sounds. Moreover, echo thresholds for click stimuli are typically around 2–3 ms (Litovsky et al., 1999), which may explain the observed preference for 0.5 and 2 ms of delay.
In terms of clinical implications, our results imply that very short HA delays are preferable for HA users with low-frequency hearing thresholds in the (near-)normal range. Hence, instead of turning off amplification below 1 kHz, which improves sound quality but reduces the effects of advanced HA features, an alternative solution could be to decrease the HA delay to optimize sound quality perception. However, by restricting the HA processing delay to very low values, it may become impossible to run some of the usual signal processing algorithms that aim to improve speech intelligibility (e.g., beamforming). In other words, there may be a trade-off between achieving good sound quality and good speech intelligibility. Furthermore, it should be noted that it is not the number of active algorithms, but the choice of processing strategy that determines the HA delay.
5. Conclusions
Using a realistic HA simulator, we found that a processing delay of 0.5 ms is clearly preferred over longer delays for participants with NH. For the hearing-loss group, delay preference was related to the degree of HL. Participants with hearing thresholds in the (near-)normal range at frequencies below 2 kHz also preferred the shortest delay. As sound quality is a key contributor to HA satisfaction and uptake, more focus should be placed on optimizing processing delay for HA users with mild HLs.
ACKNOWLEDGMENTS
The current study was funded by the Innovation Foundation Denmark (Ref. No. 0153-00223B) and WS Audiology.