The effect of instrument timbre (e.g., piano, trumpet, organ, and violin) on musical emotion recognition was tested in normal-hearing (NH) listeners and cochlear implant (CI) users. NH performance was best with the piano and did not change when the melodies were normalized to a fixed tempo. CI performance was significantly better with the piano and trumpet than with the violin and organ when both tempo and mode cues were preserved, but not for the tempo-normalized melodies. The sharper temporal onsets of the piano and trumpet may enhance CI users' perception of the tempo cues important for musical emotion recognition.

Music can effectively convey emotions (e.g., Gabrielsson and Juslin, 1996). For example, background music is often used in movies and social events to elicit the desired emotions in audiences and participants. Distinctive musical features are used to convey the two basic emotions of happy (major mode/fast tempo) and sad (minor mode/slow tempo). Major and minor modes are different scales or subsets of musical pitches (e.g., a minor third is one semitone smaller than a major third). The use of mode and tempo cues in musical emotion recognition has been studied in normal-hearing (NH) listeners and in profoundly deaf people with cochlear implants (CIs). Hopyan et al. (2014) not only used happy and sad melodies with the original mode and tempo cues but also modified the melodies to have a normalized tempo or the opposite mode (i.e., major to minor mode or vice versa). Musical emotion recognition with the original and modified melodies showed that NH children were more sensitive to the mode change, while CI children were more sensitive to the tempo normalization. Caldwell et al. (2015) created novel melodies with both congruent (i.e., major mode/fast tempo and minor mode/slow tempo) and incongruent cues (i.e., major mode/slow tempo and minor mode/fast tempo). The happy–sad ratings of these melodies by NH adults were affected by both mode and tempo cues, while those by CI adults were affected only by tempo cues. CI users' abnormally greater reliance on tempo than on mode cues has been attributed to the fact that CIs preserve robust temporal cues but do not faithfully encode the pitch cues needed for mode perception (e.g., Luo et al., 2014).

Various instruments are used to play music. Musical notes of the same pitch and loudness but played by different instruments are discriminated based on timbre cues. The primary acoustic correlates of timbre dimensions are temporal (e.g., attack time) and spectral envelope cues (e.g., spectral centroid) for both NH listeners and CI users (e.g., Kong et al., 2011; McAdams et al., 1995), although CI users have poorer-than-normal instrument identification (Gfeller et al., 2002). Despite their mutually exclusive definitions, timbre and pitch interact with each other in both NH listeners and CI users (Luo et al., 2019). For example, an increase in timbre sharpness from cello to violin may be confused with a pitch increase. Spectral features of instruments have also been shown to affect pitch-based melodic contour identification (MCI) by NH listeners and CI users (Galvin et al., 2008). Although variable across subjects, CI users' MCI was overall better with the organ than with the piano, possibly because the piano had a denser stimulation pattern than the organ. After CI processing, the high-frequency harmonics of the piano may generate melodic contour cues inconsistent with those of the low-frequency harmonics (Galvin et al., 2008).

Instrument timbre may also affect musical emotion recognition (Gabrielsson and Juslin, 1996). For example, after controlling for musical structural, performance, and familiarity factors, Hailstone et al. (2009) found that NH listeners' recognition of happy emotion was poorer with the violin than with the piano and trumpet. However, the effect of instrument timbre on musical emotion recognition with CIs remains unknown, because CI users' musical emotion recognition has only been tested with piano melodies (e.g., Hopyan et al., 2014; Caldwell et al., 2015). To fill this knowledge gap, the happy and sad melodies from Hailstone et al. (2009) were played in this study with the piano, trumpet, organ, and violin to test musical emotion recognition in CI users and NH controls. The MCI results of Galvin et al. (2008) suggest that CI users may better perceive mode cues for musical emotion recognition with the organ than with the piano. However, compared to the violin and organ, the piano and trumpet have sharper temporal onsets that may facilitate the perception of tempo cues for musical emotion recognition. To test these hypotheses, the emotional melodies were also played with a fixed intermediate tempo. If the effect of instrument timbre on musical emotion recognition remains similar after tempo normalization, then such an effect is unlikely to be due to the use of tempo cues.

The study group included nine post-lingually deafened older adult CI users (two males, age range: 50–78 years, mean age: 64 years) with limited to no musical training. Table 1 lists their demographic details. Clinical CI processors and program settings were used for each tested ear. No hearing aid or CI was used in the non-tested ear. The non-tested ear was also plugged for those with residual acoustic hearing. The control group included eight young adult NH listeners (two males, age range: 20–33 years, mean age: 26 years) with hearing thresholds better than 20 dB hearing level at octave frequencies from 250 to 8000 Hz in both ears. These NH listeners had various amounts (0–10 years) of musical training in childhood and adolescence but were not active musicians. All subjects gave informed consent and were compensated for their participation. The study was approved by the local Institutional Review Board.

Table 1.

Demographics of CI subjects. ACE = Advanced Combination Encoder.

Subject   Age (years)   Gender   Etiology      Processor/Strategy (Ear)   Years with CI
CI10      73            Female   Ototoxicity   Naida Q70/HiRes (R)        16
CI14      59            Male     Unknown       Naida Q70/HiRes120 (R)     18
CI25      65            Female   Unknown       Nucleus 6/ACE (R)
CI26      50            Female   Genetic       Nucleus 7/ACE (L)
CI31      78            Female   Hereditary    Nucleus 6/ACE (L)
CI32      63            Female   Unknown       Naida Q90/HiRes (L)
CI33      52            Male     Meniere's     Nucleus 7/ACE (R)
CI34      77            Female   Unknown       Nucleus 7/ACE (R)
CI35      60            Female   Genetic       Nucleus 6/ACE (R)          11

Musical emotion recognition was tested using a subset of the novel happy and sad melodies composed by Hailstone et al. (2009). Each melody had two four-bar phrases spanning a two-octave range in the treble clef. The fundamental frequency ranged from 196 to 932 Hz across all the melodies and fell within the typical CI frequency map (200–8000 Hz). There were ten pairs of happy and sad melodies. Each pair had the same melodic contour but different modes (happy: major mode; sad: minor mode) and tempi [happy: mean beats per minute (BPM) = 189 (range = 156–257 BPM); sad: mean BPM = 86 (range = 68–100 BPM)]. The rhythm was isochronous across the different tempi. These melodies were produced with a 44 100-Hz sampling rate and 16-bit resolution using the piano, trumpet, organ, and violin in MuseScore 2.0 (https://musescore.org/en). The four instruments, drawn from different instrument families (i.e., pitched percussion, brass, woodwind, and string), had unique spectro-temporal features. Figure 1 shows that the piano and trumpet had sharper temporal onsets than the violin and organ. In contrast to Galvin et al. (2008), the violin had denser high-frequency stimulation than the other instruments. The stimulation pattern of the trumpet at high frequencies was consistent with the pitch changes across notes, while those of the piano and organ carried inconsistent pitch cues. For example, for the piano and organ, the fourth note produced more high-frequency stimulation than the third note, even though the third note has a higher fundamental frequency.

Fig. 1.

(Color online) Top: Waveforms (gray), 50-Hz amplitude envelopes (white), and fundamental frequency (F0) contours (black) of the first four notes E, G, G, and F of a happy melody played with the piano, trumpet, organ, and violin. The left axis corresponds to the waveforms and amplitude envelopes, and the right axis corresponds to F0. Bottom: Corresponding electrodograms (i.e., stimulation patterns on each electrode) for the instruments and notes shown in the top panel. Electrodograms were generated using the default ACE strategy of the Nucleus CI device (i.e., 22 channels, 8 maxima, and 900 pulses per second per channel). Note that basal electrodes 1–10 were not stimulated for these musical notes.

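As a rough illustration of how the electrodograms in Fig. 1 arise, the Python sketch below implements a generic n-of-m ("ACE-like") channel-selection step: each analysis frame keeps only the 8 largest of 22 band envelopes. This is a simplified approximation under assumed band edges, filter design, and frame handling, not the clinical Nucleus signal path.

```python
# Minimal n-of-m ("ACE-like") channel-selection sketch, not the clinical Nucleus code.
# Assumptions: 22 log-spaced analysis bands between 188 and 7938 Hz, 8 maxima per
# frame, and a 900-Hz frame rate; filtering and envelope smoothing are simplified.
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

FS = 44100           # audio sampling rate (Hz)
N_BANDS = 22         # analysis channels
N_MAXIMA = 8         # channels kept per frame ("8 maxima")
FRAME_RATE = 900     # stimulation frames per second

def band_edges(f_lo=188.0, f_hi=7938.0, n=N_BANDS):
    """Log-spaced band edges spanning a typical CI analysis range (assumed values)."""
    return np.geomspace(f_lo, f_hi, n + 1)

def envelopes(signal):
    """Band-pass each channel and take the Hilbert envelope."""
    edges = band_edges()
    env = np.empty((N_BANDS, signal.size))
    for ch, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        sos = butter(4, [lo, hi], btype="bandpass", fs=FS, output="sos")
        env[ch] = np.abs(hilbert(sosfilt(sos, signal)))
    return env

def electrodogram(signal):
    """Frame the band envelopes and keep only the N_MAXIMA largest channels per frame."""
    env = envelopes(signal)
    hop = int(FS / FRAME_RATE)
    n_frames = env.shape[1] // hop
    frames = env[:, :n_frames * hop].reshape(N_BANDS, n_frames, hop).mean(axis=2)
    out = np.zeros_like(frames)
    top = np.argsort(frames, axis=0)[-N_MAXIMA:, :]           # indices of the 8 maxima
    out[top, np.arange(n_frames)] = frames[top, np.arange(n_frames)]
    return out  # rows = channels; apical/basal ordering is a plotting choice

if __name__ == "__main__":
    t = np.arange(0, 0.5, 1 / FS)
    note = np.sin(2 * np.pi * 330 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)  # toy E4 note
    print(electrodogram(note).shape)   # (22, number_of_frames)
```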

The computer-generated melodies avoided human performance variables and provided precise control of the mode and tempo cues. To remove the tempo cues for musical emotion recognition, the melodies were also generated with a fixed intermediate tempo of 130 BPM. The original and tempo-normalized melodies were tested in separate conditions in counterbalanced order. Each instrument was tested in a separate block, and the order of the four instrument blocks was randomized within and across subjects. In each trial of a block, a melody was randomly selected without replacement from the 20 melodies (two emotions × ten melodies) of that instrument. The melody was presented via a loudspeaker in a sound booth at a root-mean-square level of 65 dBA. After listening to each melody, the subject chose its emotion by clicking on one of two buttons labeled "happy" and "sad" on a monitor; no feedback was provided.
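The within-block randomization just described amounts to shuffling the 20 melodies and presenting each one once, as in the minimal Python sketch below; the stimulus file names and the response-collection callback are hypothetical placeholders, not the actual experiment software.

```python
# Schematic of one instrument block: 20 melodies (2 emotions x 10 pairs) presented once
# each in random order, i.e., sampling without replacement. File names and the
# present_and_collect callback are hypothetical placeholders.
import random

def run_block(instrument, present_and_collect):
    stimuli = [f"{instrument}_{emotion}_{i:02d}.wav"
               for emotion in ("happy", "sad") for i in range(1, 11)]
    random.shuffle(stimuli)                          # random order, each melody once
    return {wav: present_and_collect(wav) for wav in stimuli}  # responses: "happy"/"sad"
```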

Figure 2 shows the percent correct scores of musical emotion recognition for CI and NH subjects listening to the original and tempo-normalized melodies as a function of instrument. To reduce ceiling effects, the percent correct scores were transformed into rationalized arcsine units (RAUs; Studebaker, 1985) before statistical analyses. The data were then analyzed using a mixed-design analysis of variance (ANOVA) with tempo condition (original, normalized) and instrument (piano, trumpet, organ, violin) as within-subject factors and listener group (CI, NH) as the between-subject factor. Significant main effects were observed for tempo condition (F1,15 = 15.59, p = 0.001), instrument (F3,45 = 14.09, p < 0.001), and listener group (F1,15 = 41.77, p < 0.001). Significant interactions were found between tempo condition and instrument (F3,45 = 3.39, p = 0.03), between instrument and listener group (F3,45 = 3.24, p = 0.03), and among tempo condition, instrument, and listener group (F3,45 = 4.15, p = 0.01), but not between listener group and tempo condition (F1,15 = 2.39, p = 0.14).
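For readers who wish to reproduce the score transformation, a minimal Python sketch of the rationalized arcsine transform is given below, using the formula commonly attributed to Studebaker (1985); the function name and the 20-trial example are illustrative rather than taken from the analysis scripts.

```python
# Rationalized arcsine transform of a raw score (after Studebaker, 1985). The 20-trial
# count matches the present task (2 emotions x 10 melodies); the constants follow the
# commonly cited formula rather than a verified reimplementation of the original paper.
import math

def rau(n_correct, n_trials):
    """Convert n_correct out of n_trials to rationalized arcsine units (about -23 to 123)."""
    theta = (math.asin(math.sqrt(n_correct / (n_trials + 1)))
             + math.asin(math.sqrt((n_correct + 1) / (n_trials + 1))))
    return (146.0 / math.pi) * theta - 23.0

# Example: 18/20 correct (90%) maps to about 90 RAU, while a perfect 20/20 maps to
# about 113 RAU instead of saturating at 100, which linearizes scores near ceiling.
print(round(rau(18, 20), 1), round(rau(20, 20), 1))
```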

Fig. 2.

Musical emotion recognition scores of CI users (left panel) and NH listeners (right panel) listening to the original (black bars) and tempo-normalized melodies (white bars) with the piano, trumpet, organ, and violin. Vertical bars represent the mean while error bars represent the standard error across subjects. The lower end of the y axis indicates the chance level of 50% correct.


To better understand the interactions between tempo condition and instrument on musical emotion recognition, the RAU scores were separately analyzed in CI users and NH listeners using two-way repeated-measures ANOVAs. For CI users, there were significant effects of both tempo condition (F1,8 = 13.33, p = 0.006) and instrument (F3,24 = 11.62, p < 0.001), as well as a significant interaction (F3,24 = 5.75, p = 0.004). Post hoc Bonferroni t-tests showed that CI users performed significantly better when listening to the original melodies than the tempo-normalized melodies with the piano and trumpet (p < 0.004), but not with the violin and organ (p > 0.08). CI performance was significantly better with the piano and trumpet than with the violin and organ for the original melodies (p < 0.01). CI performance was not significantly different between the original piano and trumpet melodies (p = 0.14) or between the original violin and organ melodies (p = 1.0). With the tempo-normalized melodies, CI performance was not significantly different among the instruments (p > 0.26 for all pairwise comparisons). Binomial tests with a 50% chance level and 20 trials (i.e., two emotions × ten melodies) showed that CI users performed significantly above chance only with the original piano and trumpet melodies. CI performance was ∼50% correct with the tempo-normalized organ melodies and ∼60% correct in the other conditions. Mean confusion matrices of CI users showed that happy emotion was more often chosen and better recognized than sad emotion (59% vs 41% responses and 93% vs 76% correct) with the original piano melodies. The response bias toward happy emotion was greatly reduced with the original trumpet and organ melodies. In contrast, CI users more often chose and better recognized sad than happy emotion (68% vs 32% responses and 78% vs 42% correct) with the original violin melodies. Similar response patterns with the various instruments were observed for the tempo-normalized melodies, although the overall response accuracy decreased.
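As a point of reference for the above-chance criterion, the sketch below runs a one-sided exact binomial test against the 50% guess rate with 20 trials per condition; under these assumed settings (scipy's binomtest with a one-sided alternative, which may differ from the software actually used), at least 15 of 20 correct (75%) is needed to reach p < 0.05.

```python
# One-sided exact binomial test against 50% chance with 20 trials per condition
# (2 emotions x 10 melodies). Assumed settings for illustration; the original
# analysis may have used different software or a different sidedness.
from scipy.stats import binomtest

N_TRIALS = 20
for n_correct in range(10, N_TRIALS + 1):
    p = binomtest(n_correct, n=N_TRIALS, p=0.5, alternative="greater").pvalue
    tag = "above chance" if p < 0.05 else ""
    print(f"{n_correct:2d}/20 ({100 * n_correct / N_TRIALS:5.1f}% correct): p = {p:.3f} {tag}")
# With these settings, 15/20 (75% correct) is the smallest score with p < 0.05.
```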

For NH listeners, there was a significant effect of instrument (F3,21 = 5.69, p = 0.005), but not of tempo condition (F1,7 = 2.63, p = 0.15), and no significant interaction (F3,21 = 1.28, p = 0.31). Post hoc Bonferroni t-tests showed that NH listeners performed significantly better with the piano than with each of the other instruments (p < 0.03 for all pairwise comparisons). Binomial tests showed that NH listeners performed significantly above chance with every instrument in every tempo condition. On average, NH listeners showed symmetrical confusion matrices with similar recognition scores for the happy and sad emotions with the different instruments in each tempo condition.

Pearson correlations with the Bonferroni correction showed that, for the original melodies, CI performance with any instrument was not correlated with age at testing or CI experience (p > 0.30 in all cases). The difference between CI performance with the original and tempo-normalized piano melodies was calculated as an index of reliance on tempo cues and showed a borderline correlation with CI experience (r = 0.67, p = 0.049). The amount of musical training in NH listeners was significantly correlated with their performance when listening to the original melodies with the organ (r = 0.93, p = 0.0007), but not with the piano, violin, or trumpet (p > 0.05 in all cases). It is puzzling that the correlation was significant only for the organ, because the NH listeners with prior musical training had trained mostly on the piano.

NH listeners scored ∼90% correct when recognizing the happy and sad emotions conveyed by the original melodies, confirming the validity of the stimuli and task used in this study. Similar to previous findings (e.g., Caldwell et al., 2015; Hopyan et al., 2014), the high NH performance was likely based on mode rather than tempo cues, since tempo normalization had no significant effect for NH listeners. Instrument timbre similarly affected NH listeners' recognition of happy and sad emotions, with the piano producing better performance than the other instruments in this study. In contrast, Hailstone et al. (2009) found poorer recognition of happy emotion only with the violin, compared to the other instruments. The discrepancy may have arisen because the piano, violin, and trumpet sounds tested by Hailstone et al. (2009) were not exactly the same as ours, and because they also tested angry and fearful emotions. The effect of instrument timbre on NH performance in this study did not change with tempo normalization. Better NH performance with the piano thus did not result from the sharper temporal onsets of the piano that may enhance tempo cues. A few NH listeners had previous musical training on the piano; their familiarity with the instrument may have helped them better perceive the mode cues for musical emotion recognition in the piano melodies.

In general, musical emotion recognition was significantly poorer in CI users than in NH listeners, and the two groups exhibited different effects of instrument timbre and tempo normalization. CI performance was well above chance with the piano and trumpet, but near chance with the violin and organ for the original melodies. This pattern was unlikely to be due to mode cues, because with CI processing the piano and organ should have produced less robust pitch cues for mode perception than the trumpet (Fig. 1). Instead, the CI results with the original melodies agreed with our hypothesis that the sharper temporal onsets of the piano and trumpet may allow for better perception of tempo cues for musical emotion recognition than the violin and organ. This tempo-based explanation of the instrument timbre effect was further supported by the effect of tempo normalization. The fixed intermediate tempo created ambiguous tempo cues for musical emotion recognition, and CI performance with the piano and trumpet thus dropped to chance level, similar to that with the violin and organ. CI users also had a response bias toward happy and sad emotions with the piano and violin, respectively. NH listeners had much better overall performance and did not exhibit such a response bias. A faster attack, as in the piano, has been associated with happiness, while a slower attack, as in the violin, has been associated with sadness (Gabrielsson and Juslin, 1996; Hailstone et al., 2009). CI users, with poorer spectral resolution than NH listeners, may be more susceptible to these effects.

The novel finding of this study, i.e., the effect of instrument timbre on CI users' musical emotion recognition, has important implications for evaluating and enhancing CI performance in this task. Previous data with piano melodies may have underestimated the challenges of musical emotion recognition with CIs in real life, where music is played with various instruments. Future studies should confirm whether the better perception of tempo cues for musical emotion recognition with the piano and trumpet in this study was indeed due to the sharper temporal onsets of these instruments.

We are grateful to all subjects for their participation in this study. Research was supported by the Arizona State University Institute for Social Science Research.

1. Caldwell, M., Rankin, S. K., Jiradejvong, P., Carver, C., and Limb, C. J. (2015). "Cochlear implant users rely on tempo rather than on pitch information during perception of musical emotion," Cochlear Implants Int. 16, S114–S120.
2. Gabrielsson, A., and Juslin, P. N. (1996). "Emotional expression in music performance: Between the performer's intention and the listener's experience," Psychol. Music 24, 68–91.
3. Galvin, J. J., Fu, Q.-J., and Oba, S. (2008). "Effect of instrument timbre on melodic contour identification by cochlear implant users," J. Acoust. Soc. Am. 124, EL189–EL195.
4. Gfeller, K., Witt, S., Mehr, M., Woodworth, G., and Knutson, J. F. (2002). "Effects of frequency, instrumental family, and cochlear implant type on timbre recognition and appraisal," Ann. Otol. Rhinol. Laryngol. 111, 349–356.
5. Hailstone, J. C., Omar, R., Henley, S. M. D., Frost, C., Kenward, M. G., and Warren, J. D. (2009). "It's not what you play, it's how you play it: Timbre affects perception of emotion in music," Q. J. Exp. Psychol. 62, 2141–2155.
6. Hopyan, T., Manno, F. A. III, Papsin, B. C., and Gordon, K. A. (2014). "Sad and happy emotion discrimination in music by children with cochlear implants," Child Neuropsychol. 22, 366–380.
7. Kong, Y., Mullangi, A., Marozeau, J., and Epstein, M. (2011). "Temporal and spectral cues for musical timbre perception in electric hearing," J. Speech, Lang. Hear. Res. 54, 981–995.
8. Luo, X., Masterson, M. E., and Wu, C.-C. (2014). "Melodic interval perception by normal-hearing listeners and cochlear implant users," J. Acoust. Soc. Am. 136, 1831–1844.
9. Luo, X., Soslowsky, S., and Pulling, K. R. (2019). "Interaction between pitch and timbre perception in normal-hearing listeners and cochlear implant users," J. Assoc. Res. Otolaryngol. 20, 57–72.
10. McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., and Krimphoff, J. (1995). "Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes," Psychol. Res. 58, 177–192.
11. Studebaker, G. A. (1985). "A 'rationalized' arcsine transform," J. Speech Hear. Res. 28, 455–462.