Clinical tests of cochlear implant (CI) outcomes in sentence recognition cannot fully reflect CI users' self-reported quality of life (QoL). Here, vocal emotion recognition scores, speech reception thresholds (SRTs), and demographic factors were tested as predictors of QoL scores assessed with the Nijmegen Cochlear Implant Questionnaire in postlingually deafened adult CI users. After correction for multiple comparisons, vocal emotion recognition scores were significantly correlated with QoL scores in all subdomains (social interaction, self-esteem, etc.), while SRTs and duration of CI use were not. Vocal emotion recognition may thus be used in the clinic to accurately and broadly predict QoL with CIs.
1. Introduction
With technological advancements over the years, cochlear implants (CIs) have successfully achieved their primary goal of restoring hearing sensation and improving speech recognition for profoundly deaf patients. Currently, most CI users have open-set speech recognition in quiet even without lip-reading (Wilson and Dorman, 2008). However, the impact of CIs on the lives of profoundly deaf patients is broader than improved speech recognition alone. As revealed by self-reported quality of life (QoL), CIs also positively affect self-esteem, daily activities, and social functioning in both postlingually deafened adults (e.g., Hinderink et al., 2000) and prelingually deafened children (e.g., Schorr et al., 2009). Although QoL scores reflect patients' health, functional capability, and well-being in daily life, they are not routinely obtained to document the treatment efficacy of implants. Instead, clinical measures of CI outcomes in adult patients (see the new Minimum Speech Test Battery distributed by the major CI manufacturers) typically involve speech recognition tests such as CNC word recognition in quiet (Peterson and Lehiste, 1962) and AzBio sentence recognition in noise (Spahr et al., 2012). Such speech recognition measures have not been shown to consistently correlate with QoL scores, even in the sound perception or speech hearing subdomains, in postlingually deafened adult CI users (e.g., Capretta and Moberly, 2016; Francis et al., 2002; Hinderink et al., 2000; Moberly et al., 2018). It is possible that long-term CI users have come to accept their stabilized level of speech recognition, so that their QoL scores are less affected by their speech recognition scores. A general consensus in the literature is that QoL with CIs cannot be sufficiently assessed by the speech recognition measures currently used in clinic (e.g., McRackan et al., 2018).
There is a great need for objective outcome measures that can predict subjective QoL scores in CI users. Such measures could be used to better track the progress of CI performance over time and to assess the benefits of new fitting procedures and speech processing strategies. Strong correlations with QoL scores would also indicate that the measured auditory functions play important roles in CI users' daily life. The search for QoL predictors has been expanded to include auditory tests that may better mimic real-life situations (e.g., audiovisual sentence recognition and environmental sound identification), non-auditory measures that may contribute to daily communication with CIs (e.g., working memory, rapid reading, and inhibition-concentration), and demographic factors (e.g., age at testing and duration of CI use). Among these variables, audiovisual sentence recognition in quiet, general reasoning ability, and age at testing were found to be related to QoL, but they explained only a small portion of the variability in QoL across postlingually deafened adult CI users (Moberly et al., 2018).
In this study, vocal emotion recognition was considered as a potential auditory predictor of QoL in postlingually deafened adult CI users. The speech signal in everyday conversation conveys not only linguistic information but also the emotional state of the talker. Listeners need to perceive vocal emotions correctly to understand the emotional messages and respond appropriately to the talker. Vocal emotion recognition is thus vital to social interaction and communication. Although the increased health-related QoL after implantation may be largely attributable to improved hearing and emotional health (Francis et al., 2002), CIs do not faithfully preserve the primary acoustic cues for vocal emotion recognition (i.e., the mean and variation of fundamental frequency) due to their limited spectral and temporal resolution. As such, CI users have to rely on other prosodic cues (e.g., intensity and duration) to recognize vocal emotions (Luo et al., 2007). One may expect that patients with better vocal emotion recognition would report higher QoL with CIs. In fact, a significant correlation was found between vocal emotion recognition and QoL scores (but not between spondee recognition and QoL scores) in prelingually deafened children with CIs (Schorr et al., 2009). For pediatric CI users, access to the emotional information in child-directed speech is critical for language and social development (e.g., Dunn et al., 1995). At present, it is unknown whether vocal emotion recognition scores may also predict subjective QoL in postlingually deafened adult CI users. To fill this knowledge gap, this study tested the correlation between vocal emotion recognition and QoL scores in postlingually deafened adult CI users. For comparison, sentence recognition in noise and demographic factors (i.e., age at testing and duration of CI use) were also tested for their correlations with QoL. To avoid ceiling or floor effects in individual CI users' sentence recognition at a fixed signal-to-noise ratio (SNR), the SNR was adaptively adjusted to measure the speech reception threshold (SRT), i.e., the SNR for 50% correct word-in-sentence recognition.
2. Methods
2.1 Subjects
Twelve postlingually deafened adult CI users (eight female, four male; mean age: 67 years; age range: 56–73 years) took part in this study. The CI users were a convenience sample from our existing subject pool who had participated in a series of studies on emotion and sentence recognition and responded to our request to fill in the QoL questionnaire. Their demographic information is listed in Table 1. All subjects were native English speakers and had used their CIs for more than three years. They gave informed consent before being tested and were compensated for their participation time. The study was approved by the Institutional Review Board of Arizona State University.
Table 1. Demographic details of CI users.
Subject | Age (years) | Gender | Etiology | Processor/strategy (ear) | Years with CI
---|---|---|---|---|---
CI01 | 73 | Female | Heredity | Harmony/HiRes (R) | 11 |
CI02 | 71 | Female | Mumps/Genetic | Naida Q90/HiRes120 (R) | 15 |
CI03 | 67 | Male | Ischemic Stroke | Rondo/FSP (R) | 12 |
CI05 | 61 | Female | Auditory nerve loss | Naida Q70/HiRes120 (L) | 13 |
CI10 | 70 | Female | Ototoxicity | Naida Q70/HiRes (R) | 13 |
CI12 | 73 | Male | Unknown | Naida Q90/unknown (L) | 10 |
CI14 | 56 | Male | Unknown | Naida Q70/HiRes120 (R) | 16 |
CI15 | 67 | Male | Unknown | Naida Q90/HiRes120 (R) | 4 |
CI16 | 59 | Female | Unknown | Naida Q70/HiRes120 (R) | 11 |
CI17 | 64 | Female | Osteoporosis | Naida Q70/HiRes (L) | 3 |
CI19 | 68 | Female | Meningitis | Kanso CP950/ACE (R) | 13 |
CI24 | 72 | Female | Heredity | Nucleus 6/ACE (R) | 13 |
2.2 QoL measure
The health-related QoL of CI users was assessed for physical, psychological, and social functioning using the validated Nijmegen Cochlear Implant Questionnaire (NCIQ) (Hinderink et al., 2000). Physical functioning had three subdomains (basic sound perception, advanced sound perception, and speech production), psychological functioning had a single subdomain (self-esteem), and social functioning had two subdomains (activity and social interaction). Each subdomain consisted of ten questions, and each subject rated the degree to which the statement in question was true (1-never, 2-sometimes, 3-regularly, 4-usually, and 5-always) or his/her ability to perform the action in question (1-no, 2-poor, 3-moderate, 4-adequate, and 5-good). The responses were converted to scores from 0 to 100 (1 = 0, 2 = 25, 3 = 50, 4 = 75, and 5 = 100). Note that 28 questions were phrased, and thus scored, in the opposite form (1 = 100, 2 = 75, 3 = 50, 4 = 25, and 5 = 0). In Hinderink et al. (2000), question 53 ("When you are in a group, do you feel that your hearing impairment keeps persons from taking you seriously?") should have been, but was not, scored in the opposite form. Also, the questions listed for the speech production subdomain should be switched with those for the advanced sound perception subdomain in Hinderink et al. (2000). After these corrections, the average score was calculated over the completed questions in each subdomain.
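For illustration, the scoring rule above can be written compactly as follows. This is a minimal Python sketch under stated assumptions, not the scoring software actually used; the set of 28 reverse-phrased questions and the mapping of question numbers to subdomains follow the NCIQ and are only stubbed here (with question 53 shown as an example).

```python
# Minimal sketch of the NCIQ scoring rule described above; not the actual
# scoring software. REVERSED stands in for the 28 reverse-phrased questions
# (only question 53 is listed here as an example).

REVERSED = {53}

def score_response(question, rating):
    """Map a 1-5 rating to 0-100, flipping reverse-phrased questions."""
    score = (rating - 1) * 25              # 1 -> 0, 2 -> 25, ..., 5 -> 100
    return 100 - score if question in REVERSED else score

def subdomain_score(responses, subdomain_questions):
    """Average the 0-100 scores over the completed questions of one subdomain.

    `responses` maps question number -> rating 1-5, or None if the question
    was answered "not applicable".
    """
    scores = [score_response(q, r) for q, r in responses.items()
              if q in subdomain_questions and r is not None]
    return sum(scores) / len(scores) if scores else None
```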
2.3 Vocal emotion recognition
Vocal emotion recognition of CI users was assessed using the House Ear Institute Emotional Speech Database (HEI-ESD) (Luo et al., 2007), which includes ten semantically neutral English sentences, each produced by a male and a female talker in five target emotions (i.e., angry, happy, neutral, sad, and anxious). Each emotional sentence had 3–5 words, with a duration ranging from 0.9 to 2.0 s. The 100 tokens were presented in random order at a normalized root-mean-square (RMS) level of 65 dB sound pressure level (SPL) via a JBL loudspeaker in a sound-treated booth. The sampling rate was 22 050 Hz and the resolution was 16 bits. After listening to each token, the subject selected the target emotion of the token by clicking on one of the five response buttons shown on a computer screen. No feedback was provided. Responses were scored in terms of percent correct. All subjects were tested with a single CI using their own clinical settings. During the test, bimodal CI users CI01, CI02, CI03, CI10, CI16, and CI17 did not wear their contralateral hearing aid, while bilateral CI users CI12 and CI24 did not wear their second CI. The other subjects were unilateral CI users.
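As a concrete illustration of the test logic (randomized presentation, five-alternative forced choice, no feedback, percent-correct scoring), a minimal Python sketch is given below. The token list and the playback/response routine are hypothetical placeholders, not part of the actual experiment software.

```python
# Minimal sketch of the five-alternative forced-choice emotion test logic;
# not the actual experiment software. `tokens` is assumed to be a list of
# (audio_file, target_emotion) pairs, and `play_and_get_response` a
# hypothetical routine that plays a token and returns the clicked label.

import random

EMOTIONS = ["angry", "happy", "neutral", "sad", "anxious"]

def run_emotion_test(tokens, play_and_get_response):
    """Present all tokens once in random order and return percent correct."""
    order = random.sample(tokens, len(tokens))      # random order, no repeats
    n_correct = 0
    for audio_file, target in order:
        response = play_and_get_response(audio_file, EMOTIONS)  # no feedback
        n_correct += (response == target)
    return 100.0 * n_correct / len(order)
```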
2.4 Sentence recognition in noise
Sentence recognition in noise was tested using the same sound booth, loudspeaker, single CI, and clinical settings as in the vocal emotion recognition test. Two lists of 20 sentences were randomly selected from the AzBio sentence test (Spahr et al., 2012) for each subject. These English sentences were produced by two male and two female talkers in a conversational style, with variable length (3–12 words) and limited contextual cues. The sentences were sampled at 22 050 Hz with 16-bit resolution, and the equivalency of the lists in intelligibility has been validated (Spahr et al., 2012). A 1-down/1-up adaptive procedure was run with each sentence list to measure the SRT. In each trial, a sentence was randomly selected from the list without replacement and presented in multi-talker speech babble at the target SNR, with an overall RMS level of 65 dB SPL. Each subject repeated the sentence he or she heard and was encouraged to guess if unsure. The SNR started at 15 dB and was decreased when more than half of the words in the sentence were correctly repeated and increased when less than half were (Chan et al., 2008). The SNR step size was 2 dB until the first two reversals and 1 dB thereafter. The SRT was the mean SNR across the last six reversals. If there were fewer than six reversals within the 20 trials, the adaptive procedure was rerun with a new, randomly selected AzBio sentence list. For each subject, the SRTs were averaged across two successful runs of the adaptive procedure.
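The adaptive track can be summarized in a few lines of code. The following Python sketch is an interpretation of the procedure described above (how exactly-half-correct responses and the timing of the step-size change are handled is assumed, not stated); `score_sentence` is a hypothetical routine that presents one sentence in babble at the given SNR and returns the proportion of words repeated correctly.

```python
# Minimal sketch of the 1-down/1-up adaptive SRT track described above;
# an interpretation under stated assumptions, not the authors' software.

def run_adaptive_track(score_sentence, n_trials=20, start_snr=15.0):
    """Return the SRT (mean SNR over the last six reversals), or None if
    fewer than six reversals occurred so the list must be rerun."""
    snr = start_snr
    direction = None                     # -1: SNR went down, +1: SNR went up
    reversal_snrs = []
    for _ in range(n_trials):
        proportion_correct = score_sentence(snr)
        # More than half of the words correct -> harder (lower SNR);
        # otherwise easier (exactly half treated as "not more than half").
        new_direction = -1 if proportion_correct > 0.5 else +1
        if direction is not None and new_direction != direction:
            reversal_snrs.append(snr)    # record the SNR at each reversal
        direction = new_direction
        # 2 dB steps until the first two reversals, 1 dB thereafter.
        step = 2.0 if len(reversal_snrs) < 2 else 1.0
        snr += new_direction * step
    if len(reversal_snrs) < 6:
        return None
    return sum(reversal_snrs[-6:]) / 6.0
```

In this sketch, a subject's final SRT would be the average of two runs that each returned a valid threshold, mirroring the two successful runs used per subject.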
3. Results
The boxplots in Fig. 1 show the QoL scores of CI users in this study in the six subdomains of the NCIQ. Since none of the CI users gave more than two "not applicable" responses in any subdomain, all subjects were included in the data analyses. The QoL scores of the 45 CI users in Hinderink et al. (2000) are also plotted in gray for comparison. Separate t-tests showed that our CI users' self-reported QoL scores did not significantly differ from those of Hinderink et al. (2000) in any subdomain (basic sound perception: t(55) = 0.91, p = 0.37; advanced sound perception: t(55) = 0.25, p = 0.80; speech production: t(55) = 0.63, p = 0.53; self-esteem: t(55) = 0.01, p = 0.99; activity: t(55) = 0.34, p = 0.74; social interaction: t(55) = 0.97, p = 0.34). Our CI users' QoL scores in different subdomains were also compared using a one-way repeated-measures analysis of variance, which revealed a significant effect of subdomain on QoL scores (F(5,55) = 9.67, p < 0.001). Post hoc Bonferroni t-tests showed significantly higher QoL scores in speech production and activity than in advanced and basic sound perception (p < 0.01). For CI users in this study, Pearson correlation analyses with the Holm-Bonferroni correction revealed that QoL scores were significantly correlated between most pairs of subdomains (r > 0.72, p < 0.008); the exceptions were the correlations between speech production and social interaction (r = 0.66, p = 0.021), between self-esteem and speech production (r = 0.61, p = 0.034), and between self-esteem and advanced sound perception (r = 0.56, p = 0.059), which were not significant after the Holm-Bonferroni correction.
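For reference, the Holm-Bonferroni step-down correction used throughout the correlation analyses can be sketched as follows. This is a generic Python illustration (with the raw p-values assumed to come from, e.g., scipy.stats.pearsonr), not the authors' analysis script.

```python
# Generic sketch of the Holm-Bonferroni step-down correction applied to a
# family of raw p-values; not the authors' analysis script.

def holm_bonferroni(pvals, alpha=0.05):
    """Return a list of booleans marking which tests remain significant."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # ascending p-values
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):             # step-down threshold
            reject[i] = True
        else:
            break                                      # stop at first failure
    return reject
```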
Fig. 1. (Color online) Boxplots of QoL scores of CI users in this study in the six subdomains of NCIQ. The boxes show the 25th and 75th percentiles, the error bars show the 10th and 90th percentiles, the horizontal lines in the boxes show the median, and the circles show the outliers. The first three boxes (blue) show the scores for physical functioning, the fourth one (orange) for psychological functioning, and the last two (green) for social functioning. The QoL scores of CI users in Hinderink et al. (2000) are also plotted in gray for comparison, with the squares showing the mean and the error bars showing the standard deviation.
Due to time limitations, subject CI19 did not perform the vocal emotion recognition test. For the CI users who completed the test, performance ranged from 43% to 77% correct, with a mean of 55.71% correct and a standard deviation of 9.56% (chance level: 20% correct). Across all CI subjects, SRTs ranged from 4.17 to 14.84 dB, with a mean of 10.44 dB and a standard deviation of 3.29 dB. The top row of Fig. 2 shows CI users' QoL scores in individual subdomains as a function of their vocal emotion recognition scores. Pearson correlation analyses with the Holm-Bonferroni correction showed significant correlations between vocal emotion recognition and QoL scores in all subdomains. As shown in the legend of Fig. 2, the correlation coefficient r ranged from 0.64 (between vocal emotion recognition scores and activity ratings) to 0.83 (between vocal emotion recognition scores and social interaction ratings), with corresponding p values from 0.035 to 0.0015. The bottom row of Fig. 2 shows CI users' QoL scores in individual subdomains as a function of their SRTs. Among the six subdomains of the NCIQ, only basic sound perception ratings had a correlation with SRTs at p < 0.05 (r = −0.59, p = 0.043), which was no longer significant after the Holm-Bonferroni correction. The correlation between vocal emotion recognition scores and SRTs was also not significant (r = −0.26, p = 0.45).
Fig. 2. CI users' QoL scores in each subdomain as a function of their vocal emotion recognition scores or SRTs (top and bottom rows, respectively). Solid lines show the linear regressions that remained significant after the Holm-Bonferroni correction. The correlation coefficient r and the corresponding p value for each panel are shown in the figure legend (black for correlations that survived the Holm-Bonferroni correction and gray for those that did not survive).
Two demographic factors of CI users listed in Table 1, namely, age at testing and duration of CI use, were also considered as potential predictors of QoL scores. Pearson correlation analyses with the Holm-Bonferroni correction revealed no significant correlations between age at testing and QoL scores in any subdomain (|r| < 0.23, p > 0.48). Correlations between duration of CI use and QoL scores had p > 0.05 in most subdomains (r < 0.56, p > 0.058), except for advanced sound perception ratings (r = 0.64, p = 0.024); however, none of these correlations remained significant following the Holm-Bonferroni correction.
4. Discussion
This study measured the QoL scores, vocal emotion recognition scores, and SRTs for sentence recognition in noise of postlingually deafened adult CI users and found that QoL scores in all subdomains were significantly positively correlated with vocal emotion recognition scores, supporting our hypothesis. In contrast, SRTs and duration of CI use had only moderate correlations with QoL scores in the sound perception subdomains, which did not reach significance following the Holm-Bonferroni correction. These results highlight the importance of vocal emotion recognition ability for postlingually deafened adult CI users' QoL. Vocal emotion recognition predicted the daily experience with CIs more accurately and broadly than did sentence recognition and should be considered a useful clinical measure of CI outcomes. Auditory training paradigms, speech processing strategies, and electrical stimulation modes designed to enhance vocal emotion recognition with CIs may also result in higher QoL. However, it should be noted that the correlation between vocal emotion recognition and QoL scores observed in this study does not necessarily imply a causal relation between the two.
Similar to the original NCIQ results obtained with older CI technology 18 years ago (Hinderink et al., 2000), our CI users also gave higher ratings for speech production and activity than for advanced and basic sound perception. The speech and language of postlingually deafened adult CI users were well developed before the onset of profound deafness, and their speech production was thus less affected by hearing loss than their sound perception. Although advanced and basic sound perception may have improved the most from pre- to post-implantation, they were still the least satisfactory QoL subdomains after implantation (Hinderink et al., 2000). The strong inter-correlations among the QoL subdomains verified the internal consistency of the NCIQ but do not necessarily suggest that the NCIQ is unidimensional (Hinderink et al., 2000).
The vocal emotion recognition scores of CI users in this study were on average higher than those in Luo et al. (2007) (55.7% vs 37.4% correct), possibly because five out of the eight CI users in Luo et al. (2007) used the older SPEAK processing strategy, which had lower spectro-temporal resolution than current strategies. However, CI users in this study still performed much worse than the normal-hearing listeners in Luo et al. (2007) and showed a wide range of performance in vocal emotion recognition. Similarly, sentence recognition in noise was also challenging for CI users in this study. No SRTs for AzBio sentence recognition in noise are available in the literature for comparison with the present results, because AzBio sentences have most often been presented at a fixed SNR to measure percent correct scores (Spahr et al., 2012). Compared with the SRTs for the Hearing in Noise Test (HINT) in Srinivasan et al. (2013), the SRTs for AzBio sentence recognition in this study (measured with the same method) were similarly around 10 dB for CI users, despite the different sentence materials. In contrast, normal-hearing listeners can recognize 50% of the words in HINT sentences even at an SNR around −10 dB (Qin and Oxenham, 2003).
The significant correlations between vocal emotion recognition and QoL scores in postlingually deafened adult CI users in this study extend the findings of Schorr et al. (2009) in prelingually deafened children with CIs. While the number of pediatric CI QoL questions in Schorr et al. (2009) was too small to generate scores for different subdomains, the NCIQ allowed us to separately analyze QoL ratings in physical, psychological, and social functioning. This methodological advantage led to our unique finding that vocal emotion recognition scores were significantly correlated with QoL scores in each of the six subdomains. The highest correlation was between vocal emotion recognition scores and social interaction ratings, suggesting that when interacting with family, friends, neighbors, or even strangers, long-term CI users may find it more challenging to perceive how rather than what is said (at least in quiet). The second highest correlation, between vocal emotion recognition scores and speech production ratings, may reflect the general relationship between speech perception and production (e.g., Edwards, 1974). Specifically, the NCIQ asked about CI users' ability to control the pitch and volume of their voice and to make their voice sound angry, friendly, or sad, which would likely require good perception of vocal emotions. Similar to the relation between vocal emotion recognition scores and social interaction ratings, CI users with better vocal emotion recognition scores were more confident in making contact with others and following conversations, as revealed by the correlation between vocal emotion recognition scores and self-esteem ratings. It is not surprising that vocal emotion recognition scores, as a specific measure of sound perception, were also correlated with sound perception ratings. The ratings of advanced sound perception (e.g., vocal communication in noise, music perception, and voice discrimination) had a higher correlation with vocal emotion recognition scores than the ratings of basic (environmental) sound perception, possibly because the advanced listening tasks and vocal emotion recognition rely on similar pitch cues and cognitive functions. Finally, ratings of general activities such as working, studying, driving, and shopping were not as strongly correlated with vocal emotion recognition scores as the other QoL subdomains. One of the goals of developing the AzBio sentence recognition test was to provide an outcome measure consistent with patients' perception of their performance in daily life (Spahr et al., 2012). However, this study found that the SRTs for AzBio sentence recognition in noise were only moderately correlated with the subjective ratings of basic sound perception. Instead, vocal emotion recognition may be used to capture the social and psychological aspects of QoL missed by sentence recognition.
The present results are limited to emotion and sentence recognition with a single CI in a small number of CI users. Future studies should measure the predictive power of vocal emotion recognition scores with clinically assigned unilateral, bimodal, or bilateral CIs for self-reported QoL ratings in a larger CI subject sample. Other patient populations such as prelingually deafened CI users, newly implanted CI users, and even CI users with single-sided deafness should be tested. It will also be interesting to conduct a longitudinal study to look at vocal emotion recognition and QoL scores, as well as their relationship before implantation and at different time points following implantation.
Acknowledgments
We are grateful to all subjects for their participation in this study. Research was supported in part by the Arizona State University Institute for Social Science Research.