A related paper [Hazan, Tuomainen, Tu, Kim, Davis, Brungart, and Sheffield. (2018b). Hear. Res. 369, 33–41] showed that, for young adult listeners, speech produced by older adults was less intelligible than the speech of young adults but both talker groups improved the intelligibility of their speech via clear speech modifications. Here, this study was extended to include older listeners with/without presbycusis. The results showed that for older listeners, speech produced by older adults was less intelligible than the speech of young adults and scores improved in the clear speech condition. The best predictor of intelligibility was the amount of energy in the mid-frequency range of the spectrum.
1. Introduction
Many older adults (OAs) have difficulty communicating in more demanding everyday situations, such as when there is noise in the background, due to age-related sensory, physiological, and cognitive declines. For example, age-related hearing loss (presbycusis) reduces the audibility and clarity of the signal often leading to difficulties understanding speech. Increasing age can also affect fine motor skills leading to reduced articulatory precision [see Gordon-Salant (2014) and Hazan et al. (2018a) for a discussion]. Together these age-related factors can influence speech intelligibility for both listeners and talkers: older listeners may find other talkers less intelligible, and speech of older talkers may be less intelligible to other listeners, particularly in adverse communicative conditions. To our knowledge, no previous study has examined speech intelligibility of young and older talkers for both young and older listeners.
When communicating in adverse conditions, talkers often change their speech from an everyday “conversational” style to a “clear” speaking style. These conversational-to-clear speech modifications include a range of acoustic-phonetic adjustments that are shown to enhance speech intelligibility in various talker and listener groups, including OAs (see Smiljanić and Bradlow, 2009, for a review). For example, Schum (1996) reported that, for older hearing-impaired listeners, both young and older groups of talkers were more intelligible when they adopted a clear speaking style, and the magnitude of the clear speech benefit did not differ between the two talker groups. Similar findings were reported by Smiljanić and Gilbert (2017) in normal hearing young adult (YA) listeners. These results suggest that older talkers are able to enhance the intelligibility of their speech regardless of listeners' age. However, conflicting evidence was found in a study by Ferguson and Kewley-Port (2002) who reported that, while YAs showed a typical pattern of increased intelligibility for clear speech over conversational speech, no such advantage was found for older hearing-impaired listeners. They also reported that the relative importance of acoustic cues to perceived intelligibility differed between the two listener groups, suggesting that age-related hearing loss can affect the way listeners use acoustic information to understand speech.
Focusing on talker age, a recent study by Hazan et al. (2018b) investigated the relationship between speech intelligibility and acoustic characteristics of conversational and clear speech in young talkers and in older talkers with/without mild presbycusis for YA listeners. They reported that, although both groups of older talkers were perceived as less intelligible than young talkers for YA listeners, clear speech was more intelligible than conversational speech for all talker groups. Also, the same acoustic features predicted speech intelligibility regardless of speaking style and talker age.
The current study extends the study by Hazan et al. (2018b) by investigating the perception of the same speech materials for OA listeners with and without mild presbycusis. More specifically, the study investigated, across speaking styles, (i) whether older talkers are less intelligible than young talkers, (ii) whether the same acoustic factors predict perceived intelligibility as was found in Hazan et al. (2018b) for young listeners, and (iii) whether the relative intelligibility of individual talkers is consistent across listener groups. If listener age is less important than talker age for speech intelligibility, we predict that, for all listener groups, clear speech will be more intelligible than conversational speech, older talkers will be less intelligible than young talkers, the same acoustic parameters will predict perceived intelligibility, and high- and low-intelligibility talkers will be the same across listener groups, as found by Hazan and Markham (2004) for speech produced by children and YAs.
2. Method
2.1 Listeners
A total of 35 OA listeners were recruited via the University of the Third Age in London. They were all monolingual speakers of Southern British English. All OAs passed the shorter version of the Mini-Mental State Exam dementia screening (>18 out of maximum 20) and reported no history of speech and language difficulties or learning difficulties. Sixteen of these OAs were classified as “normal hearing” (OANH; M = 69.6 years, range 60–85 years) as they achieved a better ear average <20 dB calculated across 0.25–4 kHz octave frequencies (M = 14.1 dB, range: 5–19 dB). Nineteen OAs were classified as having an age-related mild hearing loss (OAHL; M = 74.6 years, range 68–81 years): they achieved average thresholds between 20 and 40 dB in the better ear (M = 25.7 dB, range: 20–36 dB) and showed a typical sloping pattern at the higher frequencies. The YA listeners (M = 25.9 years, range 18–34 years, N = 21) are those reported in Hazan et al. (2018b) and they all had normal hearing (M = 3.3 dB, range: −5–14 dB).
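The hearing classification described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the example audiogram values, and the treatment of the 40 dB boundary as inclusive are assumptions, not details taken from the study.

```python
# Octave frequencies (kHz) between 0.25 and 4 kHz, as stated in the text.
OCTAVE_FREQS_KHZ = (0.25, 0.5, 1.0, 2.0, 4.0)


def better_ear_average(left, right):
    """Return the lower (better) of the two per-ear mean thresholds in dB.

    `left` and `right` map frequency in kHz to a threshold in dB.
    """
    def ear_mean(thresholds):
        return sum(thresholds[f] for f in OCTAVE_FREQS_KHZ) / len(OCTAVE_FREQS_KHZ)

    return min(ear_mean(left), ear_mean(right))


def classify_hearing(bea_db):
    """Classify a better-ear average using the cut-offs given in the text.

    Treating exactly 40 dB as still 'mild' is an assumption of this sketch.
    """
    if bea_db < 20:
        return "normal hearing"
    if bea_db <= 40:
        return "mild hearing loss"
    return "outside study criteria"


# Hypothetical audiogram (values invented for illustration only).
example_left = {0.25: 10, 0.5: 10, 1.0: 15, 2.0: 20, 4.0: 30}
example_right = {0.25: 15, 0.5: 15, 1.0: 20, 2.0: 25, 4.0: 40}
example_bea = better_ear_average(example_left, example_right)
print(example_bea, classify_hearing(example_bea))
```

With these invented thresholds the left ear averages 17 dB and the right ear 23 dB, so the better-ear average falls in the normal hearing range.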
2.2 Talkers, stimuli, and procedure
The materials and procedure used in this study are as described in detail in Hazan et al. (2018b) and are summarised here. The speech materials were short extracts from the speech of 30 lead talkers from spontaneous speech dialogues elicited using the DiapixUK picture materials (Baker and Hazan, 2011) under two listening conditions: (a) when both talkers in the interaction could hear each other normally, thus eliciting unmarked conversational speech, and (b) where the interlocutor was hearing the lead talker via a simulated severe-to-profound hearing loss, eliciting a clear speaking style in the lead talker. Here, these two speaking styles are described as conversational (CONV) and clear (CLEAR). The talkers included 10 YA (M = 20.9 years), 10 OANH (M = 70.5 years), and 10 OAHL (M = 75.0 years; 5F, 5M in each talker group) selected randomly from the elderLUCID speaker corpus (see Hazan et al., 2018a, for details).
From these spontaneous conversations, 120 speech excerpts were extracted from the CONV and CLEAR conditions resulting in 20 excerpts per talker group (10 YA, 10 OANH, 10 OAHL) in each of two speaking styles (CONV, CLEAR). Speech excerpts were played in a single block containing breaks every 30th item in pseudo-randomised order such that an individual talker was never repeated. Excerpts were short (5–7 words) and contained 3–5 keywords (e.g., “each door has got three glass panels” or “there is a lady with brown hair”). Even though the lexical content of the samples varied across talkers and conditions, most samples were simple descriptions of parts of the pictures containing high-frequency lexical items similar to the BKB (Bamford-Kowal-Bench) sentences (Bench et al., 1979). The excerpts that were selected were equally spaced throughout the interaction (the original diapix interactions lasted up to 10 min), did not contain any disfluencies and were not produced in response to a request for repetition by the interlocutor.
Speech samples were root-mean-square (RMS) normalised and mixed in 8-talker speech babble in Matlab (version 2016b). In order to investigate the effects of age and hearing loss beyond audibility of the signal, we compensated for the difference in hearing levels between the three listener groups. The signal-to-noise ratios (SNRs) were fixed at different levels for each group based on thresholds achieved in a prior unpublished study with a different set of 61 YA-, OANH-, OAHL-listeners (identical selection criteria to the current study). In this study, the thresholds were measured using BKB sentences, embedded in the same 8-talker babble, in an adaptive procedure tracking 67% correct performance. There were a total of 128 sentences divided into four blocks by talker age and speaking style (YA-CONV, YA-CLEAR, OA-CONV, OA-CLEAR). The thresholds for the current study were based on those achieved for the YA-CONV condition in order to avoid floor and ceiling effects in the different listener groups: −9 dB for YA-listeners, −7 dB for OANH-listeners, and −4 dB for OAHL-listeners. The speech samples were played on a laptop computer using Praat (version 6.0.19) via Sennheiser HD25 SP headphones (Wedemark, Germany) at 70 dB sound pressure level. Participants were asked to listen and repeat what they had heard, and their performance was scored on-line by the experimenter (percentage keywords correct). The experiment took approximately 20 min to complete.
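The stimulus preparation was carried out in Matlab; the following Python sketch only illustrates the general idea of RMS normalisation followed by mixing at a fixed SNR. The function names, the target RMS value, and the choice to scale the babble rather than the speech are assumptions of this sketch and are not taken from the study.

```python
import math

# Fixed SNRs (dB) per listener group, as reported in the text.
GROUP_SNR_DB = {"YA": -9, "OANH": -7, "OAHL": -4}


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def scale_to_rms(samples, target_rms):
    """Scale samples so that their RMS level equals target_rms."""
    gain = target_rms / rms(samples)
    return [x * gain for x in samples]


def mix_at_snr(speech, babble, snr_db, target_rms=0.05):
    """Mix RMS-normalised speech with babble at the requested SNR.

    Assumes the babble is at least as long as the speech. The speech level
    is held constant and the babble level is adjusted (an assumption).
    """
    speech_n = scale_to_rms(speech, target_rms)
    # SNR(dB) = 20 * log10(rms_speech / rms_noise), solved for rms_noise.
    noise_rms = target_rms / (10 ** (snr_db / 20))
    babble_n = scale_to_rms(babble[:len(speech)], noise_rms)
    return [s + n for s, n in zip(speech_n, babble_n)]


# Synthetic stand-ins for a speech excerpt and a babble segment.
demo_speech = [math.sin(0.10 * i) for i in range(1000)]
demo_babble = [math.sin(0.37 * i + 1.0) for i in range(1500)]
demo_mix = mix_at_snr(demo_speech, demo_babble, GROUP_SNR_DB["OAHL"])
```

Because the SNR is defined on RMS levels, a negative SNR such as −9 dB means the babble is scaled to a higher RMS level than the speech.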
The acoustic-phonetic profiles and statistical comparisons between the three talker groups are reported in Hazan et al. (2018b) and summarised here. For the talker group differences in the acoustic-phonetic measures: regardless of hearing status, OAs were slower speakers (p = 0.048) and produced less relative energy in the 1–3 kHz frequency range of the long-term average spectrum (ME1–3 kHz) than YAs (p = 0.018); speech in CLEAR condition was slower, had higher ME1–3 kHz and higher median f0 than speech in the CONV condition across the three talker groups (all comparisons p < 0.001). Only YA talkers showed evidence of a larger vowel space area for CLEAR than for CONV speech (p = 0.002).
2.3 Data analysis
A one-way analysis of variance (ANOVA) for the intelligibility scores in the YA-CONV condition confirmed that the SNRs chosen for each of the three listener groups (YA, OANH, OAHL) led to similar performance levels for these three groups for this reference condition (YA: 56.6%, OANH: 59.6%, OAHL: 52.2%; F(2,55) = 0.99, p = 0.379).
A repeated measures ANOVA with within-subject factors of Talker group (3: YA, OANH, OAHL) and Communicative condition (2: CONV, CLEAR) and a between-subject factor of Listener group (3: YA, OANH, OAHL) was used to analyse the intelligibility scores. Bonferroni correction was applied to unplanned follow-up multiple comparisons. To investigate which acoustic characteristics predict perceived intelligibility [see Hazan et al. (2018b) for a full description], we ran multiple linear regression (stepwise) analyses for the two older listener groups (YA data reported in Hazan et al., 2018b). The relative intelligibility of individual talkers for the three listener groups was assessed using an Intra-Class Correlation coefficient (ICC) separately for CONV and CLEAR conditions. ICC estimates and their 95% confidence intervals (CI) were calculated based on a mean-rating (k = 56), consistency, 2-way random-effects model.
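To make the ICC specification concrete, the sketch below computes a mean-rating, consistency ICC from the mean squares of a two-way layout, using the standard formula (MS_rows − MS_error) / MS_rows. It is a minimal illustration and not the authors' analysis script: the function name, the layout (rows as talkers, columns as listeners), and the toy scores are assumptions, and confidence intervals are not computed.

```python
def icc_consistency_mean(ratings):
    """Mean-rating, consistency ICC for an n x k table of scores.

    `ratings` is a list of n rows (rated targets, assumed here to be
    talkers), each holding k scores (raters, assumed here to be listeners).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Between-target mean square.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ms_rows = ss_rows / (n - 1)

    # Residual mean square after removing target and rater main effects.
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / ms_rows


# Toy data: three hypothetical talkers scored by two hypothetical listeners.
toy_scores = [[1, 3], [2, 2], [3, 4]]
print(icc_consistency_mean(toy_scores))
```

Under the consistency definition, a listener who scores every talker lower by a constant amount does not reduce the coefficient, which suits the question of whether listener groups agree on the ranking of talkers rather than on absolute scores.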
3. Results
Intelligibility scores (% keywords correct) for CONV and CLEAR conditions for the YA-, OANH-, and OAHL-listeners are presented in Fig. 1. There were significant main effects of Talker group [F(2,106) = 86.88, p < 0.001] and Communicative condition [F(1,53) = 209.58, p < 0.001]. YA talkers (M = 61.1) were significantly more intelligible than both OANH (M = 49.7; p < 0.001) and OAHL (M = 48.0, p < 0.001) talkers, but OAHL were not less intelligible than OANH (p = 0.086). As predicted, speech produced to counter the effect of the hearing loss simulation (CLEAR: M = 60.2) was more intelligible in babble noise than normal “conversational” speech (CONV: M = 45.7).
There was also a significant 3-way interaction between Talker group, Communicative condition, and Listener group [F(4,106) = 2.85, p = 0.027]. To unpack this interaction, follow-up 2-way repeated measures ANOVAs between Talker group and Communicative condition were run for each Listener group separately (see Table 1 for a summary of statistical results). The main effect of Talker group was significant in each of the three listener groups, and the follow-up paired-samples t-tests (Bonferroni corrected significance level, p = 0.017) revealed that for both YA- and OANH-listeners, OANH and OAHL talkers were significantly less intelligible than YA talkers (p < 0.001) but did not differ from each other (both comparisons, p ≥ 0.643). Similar results were found in OAHL-listeners, except that OAHL talkers were also significantly less intelligible than both normal hearing groups (all comparisons p ≤ 0.001). The main effect of the Communicative condition was also significant in each of the three listener groups showing an intelligibility gain between the CONV and CLEAR speech (all comparisons, p < 0.001) regardless of listener age and hearing status.
Table 1. Results of the follow-up 2-way repeated measures ANOVAs (Talker group × Communicative condition) run separately for each listener group.

Effect | YA-listeners | OANH-listeners | OAHL-listeners |
---|---|---|---|
Talker group | F(2,40) = 42.15, p < 0.001 | F(2,30) = 24.83, p < 0.001 | F(2,36) = 29.25, p < 0.001 |
Communicative condition | F(1,20) = 181.11, p < 0.001 | F(1,15) = 24.25, p < 0.001 | F(1,18) = 90.07, p < 0.001 |
Talker group × Communicative condition | F(2,40) = 9.02, p = 0.001 | F(2,30) = 6.45, p = 0.005 | F(2,36) = 3.09, p = 0.058 |
The interaction between Talker group and Communicative condition was significant in both listener groups with normal hearing (YA, OANH) but failed to reach significance in the OAHL group (see Table 1). These results indicate that only for the two listener groups with normal hearing did the talker groups differ in how much they increased the intelligibility of their speech in the CLEAR relative to the CONV condition.
For CONV speech, for both YA- and OANH-listener groups, YA talkers were significantly more intelligible than both groups of OA talkers (regardless of hearing status; p ≤ 0.001; Bonferroni corrected significance level p = 0.008). The two OA talker groups did not differ significantly from each other, after correcting for multiple comparisons, for either group (p ≥ 0.043).
For CLEAR speech, this age-related difference, independent of the talker hearing status, remained for both YA- and OANH-listener groups (YA vs OANH/OAHL talkers, p ≤ 0.003 and OANH vs OAHL, p ≥ 0.070). However, how much the different talker groups improved the intelligibility of their speech via speech style modifications (CONV versus CLEAR) differed in YA- and OANH-listeners. For YA-listeners, the CLEAR speech was significantly more intelligible than CONV speech regardless of talker age or hearing status with the largest intelligibility gain achieved by the OAHL talkers (gain 12.1% for YA, 17.4% for OANH, and 23.0% for OAHL talkers; all comparisons, p ≤ 0.001). For the OANH-listeners, while YA and OAHL talkers significantly improved their intelligibility in the CLEAR speech relative to CONV speech (7%, p = 0.004 and 20%, p < 0.001, respectively), in the OANH talkers this comparison was not significant after correcting for multiple comparisons (10%, p = 0.021, Bonferroni corrected significance level p = 0.008).
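The Bonferroni-corrected significance levels quoted above can be reproduced with a short calculation. The sketch assumes a familywise alpha of 0.05 divided across three comparisons (giving 0.017) and six comparisons (giving 0.008); these comparison counts are inferred from the reported thresholds and are not stated explicitly in the text.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons


def is_significant(p_value, alpha, n_comparisons):
    """True if p_value falls below the Bonferroni-corrected level."""
    return p_value < bonferroni_alpha(alpha, n_comparisons)


# Assumed familywise alpha of 0.05 with 3 and 6 comparisons.
print(round(bonferroni_alpha(0.05, 3), 3))  # 0.017
print(round(bonferroni_alpha(0.05, 6), 3))  # 0.008

# The OANH-talker comparison for OANH-listeners (p = 0.021) misses the
# corrected level, whereas the YA-talker comparison (p = 0.004) meets it.
print(is_significant(0.021, 0.05, 6))  # False
print(is_significant(0.004, 0.05, 6))  # True
```

This shows why a comparison with p = 0.021 is reported as non-significant even though it would pass an uncorrected 0.05 criterion.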
The regression results reported for YA-listeners (Hazan et al., 2018b) showed that the best predictor of talker intelligibility in babble noise for YA-listeners was the ME1–3 kHz measure. In the CONV condition it explained 63.8% of the total variance (p < 0.001), and in the CLEAR condition 44.7% (p < 0.001) with the articulation rate explaining only an additional 8.0% of the variance (CLEAR final model adjusted R² change = 0.53, p = 0.024). Here, regression analyses are reported for OANH- and OAHL-listeners (N = 30 in all analyses). For OANH-listeners, the final model included only ME1–3 kHz for CONV, accounting for 61.2% of the total variance (p < 0.001). In CLEAR, ME1–3 kHz accounted for 38.1% with median f0 explaining an additional 14.7% of the variance (final model adjusted R² change = 0.53, p = 0.004). For OAHL-listeners, the final model in CONV included, again, only ME1–3 kHz, accounting for 34.9% of the total variance (p < 0.001). In CLEAR, the final model for OAHL included ME1–3 kHz and articulation rate (final model adjusted R² change = 0.39, p = 0.19) where ME1–3 kHz accounted for 26.6% (p = 0.002) and articulation rate an additional 12.0% of the variance.
Last, the ICC analysis showed good reliability across the three listener groups as to which talkers were more or less intelligible, for both CONV (ICC = 0.977 with 95% CI = 0.96–0.99; p < 0.001) and CLEAR (ICC = 0.97 with 95% CI = 0.95–0.98; p < 0.001) conditions.
4. Discussion
This study investigated speech intelligibility of YA and OA talkers for young and older listeners. To tease apart effects of age and hearing loss (beyond audibility) we included a group of OAs with mild presbycusis as talkers/listeners.
We found that all listener groups, regardless of age and hearing status, achieved higher intelligibility for younger talkers than for the older talker groups and for clear than for conversational speech. This indicates that talker age is a more important factor than listener age for perceived speech intelligibility. After compensating for reduced audibility, OAs with mild presbycusis generally patterned with the two normal hearing listener groups although the OAHL-listeners showed a high degree of variability in the intelligibility scores. This could indicate that, despite equating performance at a group level, those individuals with poorer hearing (although still “mild” level of loss) need more favourable SNRs than those with better hearing levels in the OAHL group. Therefore, compensating for differences in audibility at an individual level, rather than at a group level, could potentially be a more useful strategy to investigate effects of presbycusis on speech intelligibility in noise. Furthermore, there are several other talker and listener characteristics, in addition to age and hearing acuity, which can impact speech intelligibility which our study did not address. For example, it has been shown that individuals with better speech discrimination acuity also produce more distinct phonemic contrasts (see, e.g., Perkell et al., 2004) indicating that speech perception and production abilities are related within an individual.
The finding that the age-related decline in the intelligibility of speech in background noise is related to the ME1–3 kHz measure ties in with physiological changes associated with advancing age. Weaker and more variable voice production, resulting in a greater spectral tilt, is often observed in older talkers, as was also the case in the current sample of older talkers (see Hazan et al., 2018b). Also, in the current study, speech was mixed in multitalker background babble and boosting mid-frequency speech energy can aid in segregating the target voice from the masker. Interestingly, the predictive power of the ME1–3 kHz measure was weaker for OAHL-listeners than for the normal hearing groups, in both speaking styles, suggesting that listeners with hearing loss may need additional cues (or cue combinations) to aid speech-in-noise processing. This could be because, in addition to reduced audibility of the signal, age-related hearing loss can lead to broadening of auditory filters leading to reduced temporal and spectral resolution.
Despite the relatively strong association between intelligibility and relative energy at the mid-frequency range, it is very unlikely that there is a single acoustic-phonetic correlate of speech intelligibility. Previous studies have shown that several other features (e.g., vowel space measures and articulation rate) can predict perceived intelligibility (Picheny et al., 1986; Bradlow et al., 1996; Smiljanić and Gilbert, 2017). It is possible that with other, more difficult, masker types (e.g., in broadband white noise), adjusting a broader range of acoustic-phonetic features is important for successful speech understanding. Furthermore, different speech elicitation techniques may not necessarily produce similar speech adaptations: read speech without interlocutor feedback can lead to larger and more consistent changes than spontaneous conversational speech (e.g., in speaking rate and f0 median and range; Hazan and Baker, 2010). Therefore, the predictive power of acoustic-phonetic measures may differ between masker types and clear speech elicitation techniques.
Acknowledgments
This work was funded by the UK Economic and Social Research Council (Grant No. ES/L007002/1). We thank Lilian Tu for assistance in the data collection.