Fear is a frequently studied emotion category in music and emotion research. However, research in music theory suggests that music can convey finer-grained subtypes of fear, such as terror and anxiety. Previous research on musically expressed emotions has neglected to investigate subtypes of fearful emotions. This study seeks to fill this gap in the literature. To that end, 99 participants rated the emotional impression of short excerpts of horror film music predicted to convey terror and anxiety, respectively. Then, the excerpts that most effectively conveyed these target emotions were analyzed descriptively and acoustically to demonstrate the sonic differences between musically conveyed terror and anxiety. The results support the hypothesis that music conveys terror and anxiety with markedly different musical structures and acoustic features. Terrifying music has a brighter, rougher, harsher timbre, is musically denser, and may be faster and louder than anxious music. Anxious music has a greater degree of loudness variability. Both types of fearful music tend towards minor modalities and are rhythmically unpredictable. These findings further support the application of emotional granularity in music and emotion research.
I. INTRODUCTION
Fear is a fundamental emotion that crucially influences our daily lives. It plays an important role in risk assessment in daily decision-making, is a critical motivator of behavior, and is a powerful manipulator of thought patterns and beliefs, such as those surrounding vaccinations and safety protocols during the COVID-19 pandemic (Harper et al., 2021). Fear-based mental disorders (e.g., anxiety, panic disorders) are overwhelmingly common, plaguing millions worldwide each day (Yang et al., 2021). Humans communicate fear through their voice, facial expressions, and body movements, but also through art forms such as music. Anyone who has seen a scary movie, played a scary video game, or walked through a haunted house understands the ability of music to convey fear. Consider, for example, the well-known shrieking violins in “The Murder,” the cue that Bernard Herrmann wrote for the famous shower murder scene in Alfred Hitchcock's film Psycho [Hitchcock (1960)]. In fact, in their review of 41 studies on emotional expression in music performance, Juslin and Laukka (2003) found evidence that professional musicians are generally able to communicate fear (as well as happiness, anger, tenderness, and sadness) about as effectively as vocal and facial expressions (Juslin, 2019). In music and emotion research, one of the most frequently studied emotional categories is “fear”1 (Juslin, 2019; Juslin and Laukka, 2003; Warrenburg, 2020a). For example, Warrenburg (2020a) reviewed the emotion terms used in 306 music and emotion studies and found that “fear” was the 6th-most popular term (following “sad,” “happy,” “anger,” “relaxed,” and “chills/pleasure”), with 332 instances.
How exactly does music convey fear? Juslin (2019) summarizes what is known to date in music and emotion research about musically conveyed fear. In his review, he found that music typically communicates fear with fast tempi, minor and dissonant tonalities, ascending and wide-ranging pitches, staccato articulations, soft attacks, jerky and unpredictable rhythms, soft timbres, narrow and fast vibrato, and a large amount of variation in tempo, sound level, articulation, and timing (see the second column of Table I for the full list) (Juslin, 2019). These musical characteristics might successfully convey fear in part by mimicking acoustic elements of frightening vocal or natural sounds (Huron, 2015; Juslin and Laukka, 2003). For example, through acoustic analyses, Trevor et al. (2020) found that “scream-like” music underscoring terrifying scenes in horror films contains an acoustic feature unique to human screams, called “roughness,” perhaps aiding the music in effectively communicating fear.
This table shows how the findings of Juslin (2019) regarding how music portrays fear compare to McClelland's (2012, 2014, 2017b) descriptions of ombra and tempesta. The first column lists categories of musical descriptors, the second contains musical characteristics that convey fear, and the third and fourth columns detail whether each feature is also characteristic of ombra or tempesta. Rows in which only one of the two topics is checked represent features that are characteristic of one topic but not the other. Overall, the table demonstrates that current findings on musically portrayed fear are over-generalized; research on how music expresses finer-grained subtypes of fear, such as anxiety and terror, is warranted.
Category | Fear | ombra (anxiety, dread) | tempesta (terror, panic) |
---|---|---|---|
Tempo | fast tempo | | ✓ |
 | large tempo variability | ✓ | ✓ |
Tonality | minor mode | ✓ | ✓ |
 | dissonance | ✓ | ✓ |
Dynamics | low sound level | ✓ | |
 | large sound level variability | ✓ | ✓ |
 | rapid changes in sound level | ✓ | ✓ |
Pitch | high pitch | | ✓ |
 | ascending pitch | | |
 | very wide pitch range | ✓ | ✓ |
 | large pitch contrasts | ✓ | ✓ |
 | micro-structural irregularity^a | ✓ | ✓ |
Articulation | staccato articulation | | |
 | large articulation variability | | |
 | soft tone attacks | | |
Rhythm | jerky rhythms | ✓ | ✓ |
 | very large timing variability | ✓ | ✓ |
 | pauses | ✓ | ✓ |
Timbre | soft timbre | ✓ | |
 | fast vibrato rate | | |
 | small vibrato extent | | |
^a Small variations in pitch and sound level (Cespedes-Guevara and Eerola, 2018).
However, research on musically expressed fear has been missing a crucial consideration: emotional granularity. Emotional granularity refers to an individual's capacity to recognize, in oneself and in others, finer-grained emotional states that may be similar to each other, and to communicate these distinctive emotional states with targeted terminology (Barrett, 2004, 2017; Warrenburg, 2019a,b, 2020b). Researchers have only just started to examine musically expressed subtypes of the basic emotions frequently studied. For example, Warrenburg (2020b) recently distinguished between two subtypes of sad emotions in music: melancholy and grief. She also called for more consideration of emotional granularity in designing future music and emotion studies in order to correct any previously formed misconceptions or inconsistencies about how music conveys emotions due to experimental designs that conflated finer-grained emotional states (Warrenburg, 2019a,b).2
There is considerable musical and psychological evidence that music may convey yet-unexplored subtypes of fear. Returning to the example of Psycho (1960), compare the previously discussed “The Murder” cue (YouTube, 2023a) to the suspenseful music Herrmann wrote for the scene where detective Arbogast creeps through Bates' house [i.e., “The Stairs” cue: (YouTube, 2023b)]. To date, most music and emotion researchers would likely classify both types of music as music that conveys fear even though there are pronounced musical and acoustic differences between them. In a branch of Music Theory called Topic Theory,3 however, these two types of music would indeed be classified separately based on differences between their collective musical features. McClelland (2012, 2014, 2017a,b) divides scary classical and contemporary film music into two distinct types, or topics: ombra and tempesta. Ombra refers to the common collection of musical features that appear in music written to underscore ghost and witch scenes, melodramas on supernatural subjects, or any scenes requiring a mysterious, suspenseful atmosphere (McClelland, 2012, 2014). McClelland (2012, 2014) describes ombra as somber and gloomy in style, having a slow or moderate pace, and containing melodies that are often exclamatory and fragmented with restless motion. The combination of a serious, dark mood with unnerving, unpredictable entrances and rhythms communicates anxiety and dread (McClelland, 2012, 2014). In contrast, the common collection of musical features used in music underscoring scenes involving storms, floods, earthquakes, or conflagrations is referred to as the topic tempesta (McClelland, 2014, 2017b). Such scenes often involve flight or pursuit, panic, or metaphorical depictions of rage or madness (McClelland, 2014, 2017b). Through the use of fast tempi, unusual modulations, fragmented melodies, and very wide melodic leaps, tempesta is agitated and stormy in style and generally communicates feelings of terror (McClelland, 2014, 2017b).4
Notably, several of the features that Juslin (2019) summarizes as communicative of fear are characteristic of tempesta, while others are characteristic of ombra. Table I demonstrates which features map onto which of the two topics. The rows with a single check mark depict features that differ distinctly between the two topics. For example, Juslin (2019) notes that music portrays fear through the use of fast tempi and high pitches, both of which are characteristic of tempesta (McClelland, 2014, 2017b). However, ombra consists of slow or moderate tempi and of lower pitches (McClelland, 2012, 2014). On the other hand, Juslin (2019) notes that music portrays fear with lower sound levels and soft timbres, similarly to ombra, which is generally quieter and employs darker or softer timbres (McClelland, 2012, 2014). Tempesta, by contrast, uses rougher and brighter timbres and is notably louder than ombra (McClelland, 2014, 2017b). Additionally, some of the characteristics Juslin (2019) lists as conveying fear are not mentioned by McClelland (2012, 2014, 2017b) (signified by blank cells in the rightmost columns of Table I), and some features of ombra and tempesta are not mentioned by Juslin (2019), such as the unusual tonal modulations, the bold, unpredictable, chromatic harmonic motions, and the fragmented, disjunct melodic motions that are characteristic of both topics. These different accounts of fearful music provide strong evidence that more research is needed on musically conveyed subtypes of fear. More specifically, we argue that McClelland's research suggests that scary music conveys at least two subtypes of fear: terror and anxiety.
Current psychological theories on fear support such a distinction between subtypes or varieties of fear (Adolphs, 2013; Adolphs et al., 2019; LeDoux, 2014; Mobbs et al., 2019; Perkins et al., 2012). Subtypes of fear can be functionally differentiated: panic or terror is associated with attention and reaction to an immediate threat, whereas anxiety implies a future threat to be dealt with through planning and prediction behaviors (Adolphs, 2013; Mobbs et al., 2019). The discovery that different neural networks process subtypes of fear further supports these functional differences (Adolphs, 2013; Gross and Canteras, 2012; Mobbs et al., 2019). For example, as described by Adolphs (2019) in a recent interview on the state of fear research, a neural circuit involving the periaqueductal gray and the superior colliculus has been found to mediate fear behaviors in rodents that spot aerial predators (Evans et al., 2018), while another region often active in fear states, the ventromedial hypothalamus, has not been found to respond to sightings of aerial predators (Kunwar et al., 2015; Lin et al., 2011). Using functional magnetic resonance imaging (fMRI) and a simulated maze paradigm in which predators pursued participants and threatened them with an unpleasant, yet harmless, electric shock, Mobbs et al. (2007, 2009) found activations in forebrain areas, the subgenual anterior cingulate cortex (sgACC), hippocampus, and amygdala in reaction to the detection of distant threats, while avoidance of threats that were closer in proximity activated areas in the midbrain and the mid-dorsal ACC. Given the functional differences between anxiety (a response to a possible future threat) and terror or panic (a response to an immediate, known threat), it is possible to interpret these results as reflective of different neural networks for processing these separate subtypes of fear (Perkins et al., 2012).
Humans also express subtypes of fear in markedly different ways (Kumar and Mohanty, 2016; Perkins et al., 2012). For example, we display a different facial expression for anxiety (characterized by environmental scanning behaviors such as head swivels and eye darts) than for terror [characterized by staring straight ahead (Perkins et al., 2012)]. We also express anxiety and terror differently with our voices by using a higher fundamental frequency for terror (Kumar and Mohanty, 2016). Despite this multidisciplinary evidence, subtypes of fear have yet to be investigated in music and emotion research. For example, in a review of emotion terms used in music and emotion research, Warrenburg (2020a) found no use of the term “terror” and only 16 uses of the term “anxiety.” Our study aims to fill this gap in music and emotion research by investigating musically expressed subtypes of fear.
While many databases of emotional musical excerpts exist [e.g., Eerola and Vuoskoski (2011), Paquette et al. (2013), Vieillard et al. (2008), and Warrenburg (2021)], none of them distinguish between terror and anxiety. Therefore, to answer our research questions, we elected to create our own large database of ecologically valid musical excerpts that communicate terror and anxiety, respectively. To create and validate our database, we used a method similar to that of Eerola and Vuoskoski (2011). We curated excerpts from horror film soundtracks using an expertise-based approach and then recruited participants to rate the excerpts along several emotion rating scales (both dimensional and discrete). Given the previously discussed evidence that music expresses subtypes of fear markedly differently (McClelland, 2012, 2014, 2017b), we predicted that participants would rate the musical excerpts curated from horror film soundtracks in accordance with their target emotions (terror or anxiety) on discrete emotion rating scales. Additionally, we predicted that with dimensional (valence and arousal) emotion rating scales, participants would rate the terrifying musical excerpts as communicating a more negative valence and a higher arousal than the anxious musical excerpts due to the predicted heightened intensity and dissonance of the musical features associated with terror as compared to anxiety [e.g., louder dynamics, higher pitch, etc. (McClelland, 2012, 2014, 2017b)].
After testing these hypotheses, we then used the data to filter out the most successful excerpts at portraying terror and anxiety to create FEARMUS: a new battery of fearful musical stimuli. Specifically, we ranked excerpts based on their typicality index (formula defined in Sec. II F) and retained the 50 most typical from each category for a final database of 100 excerpts portraying terror and anxiety. Once we created FEARMUS, we then used descriptive and acoustic analyses to further describe how music conveys these two subtypes of fear.
II. METHODS
A. Curation of the initial database of musical excerpts
We curated three terrifying and three anxious musical excerpts from the original soundtracks of thirty horror films (see Table II for the list of films), resulting in a total of 180 excerpts (30 films × 3 excerpts × 2 emotions). We opted to curate music from horror films that were contemporary (released in 2013 or later), highly rated by both critics [assessed using “Metascore” on Metacritic (2023)] and viewers [assessed via “User Ratings” on IMDb (Internet Movie Database) (2023)], and had scores that contained 20 min or more of originally composed music. To curate, one of the experimenters with expertise in topic theory and horror film soundtracks listened to each soundtrack in full and used the descriptive criteria for ombra (McClelland, 2012, 2014) to select anxious excerpts and the descriptive criteria for tempesta (McClelland, 2014, 2017b) to select terrifying excerpts.5 The excerpts were between 10 and 30 s in length, depending on the natural phrasing of the excerpt, similarly to Eerola and Vuoskoski (2011). For the experiment, we decided to also include music that conveyed positive emotions to balance the experience of the participants. Therefore, we included 30 excerpts that convey happiness and 30 excerpts that convey tenderness from Eerola and Vuoskoski's (2011) previously validated database of emotional film music excerpts.
These are the film soundtracks from which we curated excerpts to create FEARMUS. We selected films that (i) were successful among viewers and critics alike, as indicated by ratings from IMDb (2023) and Metacritic (2023), (ii) had originally composed scores that included at least 20 min of music, and (iii) were released in 2013 or later. The IMDb “User Ratings” and Metacritic “Metascores” reported here were gathered on 11 November 2022.
No. | Title | Year | Country(ies) | Director(s) | Composer(s) | User Ratings (out of 10) | Metascore (out of 100) |
---|---|---|---|---|---|---|---|
1 | Midsommar | 2019 | USA and Sweden | Ari Aster | Bobby Krlic | 7.1 | 72 |
2 | Us | 2019 | USA | Jordan Peele | Michael Abels | 6.8 | 81 |
3 | A Quiet Place | 2018 | USA | John Krasinski | Marco Beltrami | 7.5 | 82 |
4 | Annihilation | 2018 | UK and USA | Alex Garland | Ben Salisbury and Geoff Barrow | 6.8 | 79 |
5 | Hereditary | 2018 | USA | Ari Aster | Colin Stetson | 7.3 | 87 |
6 | Mandy | 2018 | USA and Canada | Panos Cosmatos | Jóhann Jóhannsson | 6.5 | 81 |
7 | 1922 | 2017 | USA | Zak Hilditch | Mike Patton | 6.2 | 70 |
8 | Annabelle: Creation | 2017 | USA | David F. Sandberg | Benjamin Wallfisch | 6.5 | 62 |
9 | Get Out | 2017 | USA | Jordan Peele | Michael Abels | 7.7 | 85 |
10 | Ghost Stories | 2017 | UK | Jeremy Dyson and Andy Nyman | Frank Ilfman | 6.4 | 68 |
11 | It | 2017 | USA | Andy Muschietti | Benjamin Wallfisch | 7.3 | 69 |
12 | It Comes At Night | 2017 | USA | Trey Edward Shults | Brian McOmber | 6.2 | 78 |
13 | Revenge | 2017 | France | Coralie Fargeat | ROB (Robin Coudert) | 6.4 | 81 |
14 | The Blackcoat's Daughter | 2017 | USA and Canada | Osgood Perkins | Elvis Perkins | 5.9 | 68 |
15 | Tigers Are Not Afraid^a | 2017 | Mexico | Issa López | Vince Pope | 6.9 | 76 |
16 | 10 Cloverfield Lane | 2016 | USA | Dan Trachtenberg | Bear McCreary | 7.2 | 76 |
17 | Before I Wake | 2016 | USA | Mike Flanagan | Danny Elfman and The Newton Brothers | 6.2 | 68 |
18 | Don't Breathe | 2016 | USA | Fede Álvarez | Roque Baños | 7.1 | 71 |
19 | Hush | 2016 | USA | Mike Flanagan | The Newton Brothers | 6.6 | 67 |
20 | Raw^b | 2016 | France and Belgium | Julia Ducournau | Jim Williams | 7.0 | 81 |
21 | Split | 2016 | USA | M. Night Shyamalan | West Dylan Thordson | 7.3 | 62 |
22 | Crimson Peak | 2015 | USA | Guillermo del Toro | Fernando Velázquez | 6.5 | 66 |
23 | The Devil's Candy | 2015 | USA | Sean Byrne | Michael Yezerski | 6.4 | 72 |
24 | The Invitation | 2015 | USA | Karyn Kusama | Theodore Shapiro | 6.6 | 74 |
25 | The Witch | 2015 | USA | Robert Eggers | Mark Korven | 6.9 | 83 |
26 | Goodnight Mommy^c | 2014 | Austria | Veronika Franz and Severin Fiala | Olga Neuwirth | 6.7 | 81 |
27 | It Follows | 2014 | USA | David Robert Mitchell | Disasterpeace (Richard Vreeland) | 6.8 | 83 |
28 | The Babadook | 2014 | Australia | Jennifer Kent | Jed Kurzel | 6.8 | 86 |
29 | The Conjuring | 2013 | USA | James Wan | Joseph Bishara | 7.5 | 68 |
30 | You're Next | 2013 | USA | Adam Wingard | Jasper Justice Lee, Kyle McKinnon, Mads Heldtberg, and Adam Wingard | 6.6 | 66 |
 | | | | | Mean: | 6.8 | 75 |
 | | | | | SD: | 0.4 | 7 |
^a Vuelven [Spanish: (they) Return].
^b Grave [French: Severe].
^c Ich seh, Ich seh [German: I see, I see].
Each of the excerpts we curated was sampled at 48 000 Hz and normalized at 1 dB. A half-second fade-in and fade-out were added to each excerpt. All recordings are single-channel, 16-bit WAV files. For the original 180 excerpts, the terrifying ones had a mean length of 16.73 s (SD = 5.70), and the anxious ones had a mean length of 17.88 s (SD = 4.18).
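This preprocessing can be sketched in a few lines of R. The following is a minimal illustration, not the authors' actual script: the file names are hypothetical, the source files are assumed to already be at 48 kHz, and simple peak normalization stands in for the exact level target reported above.

```r
# Minimal preprocessing sketch (not the authors' script): mono conversion,
# peak normalization, and half-second linear fades. Assumes the source
# file is already sampled at 48 kHz; file names are hypothetical.
library(tuneR)

preprocess_excerpt <- function(infile, outfile, fade_s = 0.5) {
  w  <- mono(readWave(infile), "both")   # collapse to a single channel
  sr <- w@samp.rate                      # expected: 48 000 Hz
  x  <- w@left / max(abs(w@left))        # peak-normalize the waveform
  n  <- round(fade_s * sr)
  ramp <- seq(0, 1, length.out = n)
  x[1:n] <- x[1:n] * ramp                # half-second fade-in
  tail_i <- (length(x) - n + 1):length(x)
  x[tail_i] <- x[tail_i] * rev(ramp)     # half-second fade-out
  writeWave(Wave(round(x * 32767), samp.rate = sr, bit = 16), outfile)
}

preprocess_excerpt("raw/terror_01.wav", "fearmus/terror_01.wav")
```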
B. Validation experiment to filter the database
We collected emotion ratings to filter the database down to the 50 most typical anxious musical excerpts and the 50 most typical terrifying musical excerpts. To accomplish this, we had participants listen to a randomly selected portion (one third) of the original collection of musical excerpts.6 Specifically, each participant listened to 30 of the terror excerpts, 30 of the anxiety excerpts, and all of the 30 happiness and 30 tenderness excerpts. We used custom-built matlab code to randomly select one third of the anxiety and terror excerpts per participant. The code ensured that all of the excerpts were evenly distributed across participants.7 During the experiment, the sound files were presented in a pseudorandomized order, with no two excerpts of the same target emotion presented consecutively. After listening to each excerpt, participants rated how well it portrayed terror, anxiety, happiness, and tenderness. Four analog-categorical scales were used for these ratings, each of which had visual reference points akin to a seven-point Likert scale but operated continuously (1 = portraying the target emotion very poorly, 7 = portraying the target emotion very successfully). Additionally, participants were asked how familiar they were with each excerpt (0 = unfamiliar, 1 = somewhat familiar, 2 = very familiar), what valence the excerpt conveyed (−3 = very negative, 3 = very positive), and what arousal level each excerpt conveyed (1 = very low, 7 = very high) with similar analog-categorical scales. Before they began the experiment, participants were given a list of definitions8 for each of the emotion terms used during the task (i.e., terror, anxiety, happiness, tenderness, arousal, and valence). This study was approved by the Cantonal Ethics Commission of Zürich, Switzerland.
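One way to implement the per-participant selection and pseudorandomization is sketched below in R. The authors used custom matlab code; the excerpt IDs, the reading of the ordering constraint (no two consecutive excerpts sharing a target emotion), and the retry scheme here are illustrative assumptions, and the cross-participant balancing is omitted.

```r
# Illustrative playlist builder: each participant hears a random third of
# the terror and anxiety excerpts plus all happiness and tenderness
# excerpts, ordered so that no two consecutive excerpts share a target
# emotion. Excerpt IDs encode the target emotion before the underscore.
set.seed(2023)

make_playlist <- function(terror, anxiety, happy, tender) {
  ids <- c(sample(terror, 30), sample(anxiety, 30), happy, tender)
  repeat {                               # retry on the rare dead end
    pool <- sample(ids); out <- character(0); prev <- ""; ok <- TRUE
    while (length(pool) > 0) {
      cand <- pool[sub("_.*", "", pool) != prev]  # exclude previous emotion
      if (length(cand) == 0) { ok <- FALSE; break }
      pick <- if (length(cand) == 1) cand else sample(cand, 1)
      out  <- c(out, pick)
      pool <- pool[pool != pick]
      prev <- sub("_.*", "", pick)
    }
    if (ok) return(out)
  }
}

playlist <- make_playlist(sprintf("terror_%02d", 1:90),
                          sprintf("anxiety_%02d", 1:90),
                          sprintf("happy_%02d", 1:30),
                          sprintf("tender_%02d", 1:30))
```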
C. Participants in the validation experiment
We determined a target number for recruitment using a power analysis in r with a large effect size (0.7), a significance level of 0.05, and a power of 0.8 (the returned sample size was 33 participants per third, or 99 in total). We then recruited 113 English-speaking non-musician participants from the University of Zürich (UZH) and the Zürich University of Applied Sciences. We elected to recruit non-musician participants to create a database that conveys the target emotions successfully to the general population rather than to only a musically trained subgroup. To qualify as a non-musician, participants were required to (i) not own any music certificates or diplomas and (ii) not have played a musical instrument daily within the last five years or during their childhood. We also used the Goldsmiths Musical Sophistication Index (GMSI) (Müllensiefen et al., 2014) to quantify the average level of musicianship of our participants.9 During data collection, the data from eight participants were lost due to a technical error. Therefore, our initial sample contained 105 participants (69 female), who were 18 to 49 years old (M = 25.77, SD = 5.75). All participants gave informed, written consent for their participation in accordance with the ethical and data security guidelines of the University of Zürich.
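The reported power analysis can be reproduced with the pwr package under the stated parameters; the two-sample test family is our assumption, but it returns the reported 33 participants per group.

```r
# Reproducing the reported power analysis: effect size d = 0.7,
# alpha = 0.05, power = 0.8. A two-sample t-test is our assumption
# about the test family.
library(pwr)
pwr.t.test(d = 0.7, sig.level = 0.05, power = 0.8, type = "two.sample")
# n = 33.02 per group -> 33 participants per third, 99 in total
```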
D. Procedure for the validation experiment
At the start of the experiment, participants had two practice trials to familiarize themselves with the interface and rating scales and to test the headphones. Participants listened to and rated 120 musical excerpts on the seven scales mentioned above: terror, anxiety, happiness, tenderness, arousal, valence, and familiarity. After the ratings experiment, they took an online questionnaire that consisted of several surveys,10 including the GMSI. Participants then chose between participation hours (course credit) or 40 Swiss francs (CHF) as compensation for their participation. The ratings task took 80–90 min, and the questionnaire took 20–30 min (total time: 100–120 min). Participants were invited to take a short break every 25% of the way through the ratings task (approximately every 20 min).
E. Data checks for the validation experiment
To check the data for outliers, we calculated a correlation coefficient between each participant's ratings and the mean ratings for each musical excerpt. Boxplots of the correlation coefficients showed that six participants were frequent outliers and therefore may not have understood the rating scales, so we eliminated their data from subsequent analyses. Our final sample contained 99 participants (66 female), who were 18–49 years old (M = 25.84, SD = 5.84). We also checked the familiarity rating of each musical excerpt and found that two excerpts were outliers, with mean ratings above 0.6. External associations that listeners have with familiar music can distort the emotions that they believe the music portrays (Eerola and Vuoskoski, 2011; Juslin and Västfjäll, 2008; Schellenberg et al., 2008; Vieillard et al., 2008). Therefore, we excluded these two excerpts from subsequent analyses for possibly being overly familiar to the participants.
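A sketch of this consistency screen in R follows; the participants × excerpts matrix `ratings` is a hypothetical stand-in for the experiment data.

```r
# Consistency screen: correlate each participant's ratings with the
# per-excerpt means and flag boxplot outliers. `ratings` is a
# hypothetical participants x excerpts matrix of emotion ratings.
item_means  <- colMeans(ratings, na.rm = TRUE)
consistency <- apply(ratings, 1, cor, y = item_means,
                     use = "pairwise.complete.obs")
boxplot(consistency, ylab = "Correlation with mean excerpt ratings")
flagged <- which(consistency %in% boxplot.stats(consistency)$out)
```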
F. Creating the final FEARMUS database
To obtain highly typical examples of music for terror and anxiety, we calculated a typicality index (T) of the target emotion for each excerpt using the same procedure and equations as Eerola and Vuoskoski (2011). Typicality was calculated by subtracting both the mean of the excerpt's non-target emotion rating (NE) and the standard deviation of its target emotion rating (SE) from the mean of its target emotion rating (E):

T = E − SE − NE (Eerola and Vuoskoski, 2011, p. 26).
Based on the typicality index, we ranked the 180 original musical excerpts (90 terror, 90 anxiety) to find the most typical excerpts for each corresponding emotion. We then used this ranking to select the 50 most typical anxious and 50 most typical terrifying excerpts resulting in a total of 100 excerpts (in other words, 55.56% of the originally curated collection of excerpts).
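In R, the typicality computation and ranking might look like the following sketch. The data frame `ratings_long` is hypothetical, and we assume NE is taken from the competing fear scale (terror ratings for anxious excerpts and vice versa).

```r
# Typicality index T = E - SE - NE per excerpt, keeping the 50 most
# typical excerpts per category. `ratings_long` is a hypothetical data
# frame with columns excerpt, target, target_rating, nontarget_rating.
library(dplyr)

fearmus <- ratings_long %>%
  group_by(excerpt, target) %>%
  summarise(typicality = mean(target_rating) - sd(target_rating) -
                         mean(nontarget_rating),
            .groups = "drop") %>%
  group_by(target) %>%
  slice_max(typicality, n = 50) %>%   # 50 most typical per emotion
  ungroup()
```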
G. Descriptive analysis procedure
To uncover the sonic differences between the anxious and terrifying musical excerpts in FEARMUS, we conducted an exploratory descriptive analysis. The analysis consisted of two of the investigators listening to randomly sampled selections of the FEARMUS database and noting their observations for various musical features. For our analysis, we used the same musical features that McClelland (2014, p. 282) used to describe ombra and tempesta (see the first column of Table VI for the full list). To do this analysis, we used Apple Music to randomly sample (using “shuffle” mode) 10 musical excerpts from the FEARMUS database per musical descriptor per target emotion (10 terror and then 10 anxiety, per each musical feature). Sitting together and listening to the excerpts on a speaker, the researchers individually noted their observations for each musical feature. For example, for the feature “tempo,” the researchers listened to ten randomly selected excerpts from FEARMUS that portrayed anxiety, and then ten that portrayed terror, simultaneously writing down their observations about “tempo” for each target emotion before moving on to another musical feature. Upon completing their notes for all of the features, the two investigators pooled and summarized their combined observations to produce the final results (shown in Table VI).
H. Acoustic analyses procedure
The results of our exploratory descriptive analyses produced several hypotheses about the sonic similarities and differences between musically conveyed terror and anxiety. We elected to test these hypotheses using confirmatory acoustic analyses. For these analyses, we chose to measure 13 acoustic features that were suitable to test our hypotheses (see Table VII in Sec. III). To complete these analyses, we used the Music Information Retrieval (MIR) toolbox (version 1.8.1) (Lartillot et al., 2007) in matlab.11 For acoustic features related to timbre, we controlled for the different lengths of the excerpts in FEARMUS by randomly sampling five one-second segments of music from each excerpt to analyze. These segments were non-overlapping and did not contain the first and last faded half-seconds of each track. We analyzed the other non-timbral acoustic features across the full lengths of the excerpts. We then controlled for length differences with random slopes in our mixed effects models (see Sec. II I). We measured all spectral features with a sampling rate of 48 000 Hz, a Hamming window, and a frame length of 50 ms with a half-overlapping hop length.
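The random segment selection can be sketched as rejection sampling in R; this illustrates the constraint (five non-overlapping one-second windows avoiding the faded half-seconds), not the authors' implementation.

```r
# Rejection-sampling sketch: draw five non-overlapping 1-s analysis
# windows per excerpt, excluding the faded first and last half-seconds.
sample_segments <- function(dur, n = 5, seg = 1, fade = 0.5) {
  repeat {
    onsets <- sort(runif(n, min = fade, max = dur - fade - seg))
    if (all(diff(onsets) >= seg)) return(onsets)  # no overlap allowed
  }
}
sample_segments(dur = 16.73)  # onset times (s) for a mean-length excerpt
```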
I. Data analysis
All statistical analyses were done using r analysis software version 3.6.1 (R Core Team, 2019). We used both general linear regression models (for data without repeated measures) and mixed-effects linear regression models (for data with repeated measures) to test our hypotheses regarding the emotion ratings and the acoustic analyses. For these analyses, we used the lm function to fit general linear models and the lmer function to fit mixed-effects linear regression models. We used the “lme4” library (Bates et al., 2015) to fit the mixed-effects models and calculate t-values, and the “lmerTest” package (Kuznetsova et al., 2017) to estimate p-values and degrees of freedom. Before model fitting, all categorical variables were coded as 0 and 1 in alphabetical order (i.e., for target emotion, anxiety = 0 and terror = 1). The significance level for all analyses was set to FDR-adjusted p < 0.05.
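A minimal sketch of this setup is shown below; the data frame `trials`, the vector `pvals`, and the exact random-effects structure are our assumptions based on the description above.

```r
# Sketch of one model from the analysis pipeline. `trials` is a
# hypothetical long data frame (one row per rating); the random-effects
# structure is our reading of the description above.
library(lme4)      # model fitting (Bates et al., 2015)
library(lmerTest)  # p-values and degrees of freedom (Kuznetsova et al., 2017)

m <- lmer(terror_rating ~ target_emotion +        # anxiety = 0, terror = 1
            (1 + target_emotion | participant) + (1 | excerpt),
          data = trials)
summary(m)

# FDR adjustment across the family of tests (criterion: adj. p < 0.05);
# `pvals` is a vector of unadjusted p-values collected from the models.
p.adjust(pvals, method = "fdr")
```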
III. RESULTS
A. Results of the validation experiment
To test our main hypotheses, we used emotion ratings (valence, arousal, terror, and anxiety) as the predicted values and target emotions (terror and anxiety) as predictor values for four mixed-effects linear regression models. Additionally, we used target emotions (terror and anxiety) as the predicted values and emotion ratings (terror and anxiety) as the predictor values for two additional mixed effects linear regression models. Participants and excerpt track numbers were included as random slopes. We report the results of the regression analyses in Table III, and the means and standard deviations of the ratings, grouped by target emotion, in Table IV.
This table shows the results of our linear regression analyses testing how the emotion ratings (anxiety, terror, valence, arousal) are predicted by the target emotions (anxiety, terror), and how the target emotions (anxiety, terror) are predicted by the discrete emotion ratings (anxiety, terror). All results shown are statistically significant at adjusted p < 0.01. The results demonstrate more negative valence ratings, higher arousal ratings, and higher terror ratings for terrifying musical stimuli as compared to anxious musical stimuli, consistent with our hypotheses. Furthermore, anxious musical stimuli were rated as conveying a higher level of anxiety than terror, consistent with our hypotheses. However, terrifying musical stimuli were rated as conveying a higher level of anxiety than terror, and a greater degree of anxiety than anxious musical excerpts, inconsistent with our hypotheses. n = 99.
Predictor | Est. | SE | t | Unadj. p | Adj. p | df |
---|---|---|---|---|---|---|
Anxiety rating | | | | | | |
Intercept | 5.136 | 0.085 | 60.164 | < 2E−16 | < 2.6E−16 | 137.274 |
Target emotion (1 = terror) | 0.378 | 0.072 | 5.222 | 4.69E−07 | 5.12E−07 | 186.602 |
Terror rating | | | | | | |
Intercept | 4.260 | 0.127 | 33.600 | < 2E−16 | < 2.6E−16 | 138.223 |
Target emotion (1 = terror) | 1.014 | 0.104 | 9.800 | < 2E−16 | < 2.6E−16 | 217.542 |
Valence rating | | | | | | |
Intercept | −1.418 | 0.062 | −22.800 | < 2E−16 | < 2.6E−16 | 148.998 |
Target emotion (1 = terror) | −0.481 | 0.050 | −9.720 | < 2E−16 | < 2.6E−16 | 187.872 |
Arousal rating | | | | | | |
Intercept | 4.230 | 0.091 | 46.850 | < 2E−16 | < 2.6E−16 | 133.467 |
Target emotion (1 = terror) | 0.890 | 0.081 | 11.000 | < 2E−16 | < 2.6E−16 | 204.545 |
Conveyed anxiety | | | | | | |
Intercept | 5.136 | 0.085 | 60.421 | < 2E−16 | < 2.6E−16 | 133.188 |
Rating scale (1 = terror rating) | −0.875 | 0.096 | −9.141 | 4.52E−15 | 5.42E−15 | 106.842 |
Conveyed terror | | | | | | |
Intercept | 5.513 | 0.071 | 78.191 | < 2E−16 | < 2.6E−16 | 121.292 |
Rating scale (1 = terror rating) | −0.239 | 0.074 | −3.221 | 1.64E−03 | 1.64E−03 | 121.538 |
This table shows the means and SDs of the seven emotion ratings (anxiety, terror, tenderness, happiness, valence, arousal, familiarity) that participants gave the musical excerpts as grouped by the four target emotions (anxiety, terror, tenderness, happiness). Analog-categorical scales were used for these ratings, each of which had visual reference points akin to a 7-point Likert scale but operated continuously. The four discrete emotions were rated from 1 to 7 (1 = portraying the target emotion very poorly, 7 = portraying the target emotion very successfully), valence was rated −3 to 3 (−3 = very negative, 3 = very positive), arousal 1 to 7 (1 = very low, 7 = very high), and familiarity 0 to 2 (0 = unfamiliar, 1 = somewhat familiar, 2 = very familiar). Cells show M (SD). n = 99.
Target emotion | anxiety | terror | tenderness | happiness | valence | arousal | familiarity |
---|---|---|---|---|---|---|---|
anxiety | 5.14 (1.36) | 4.26 (1.76) | 1.54 (1.16) | 1.29 (0.71) | −1.42 (0.98) | 4.26 (1.46) | 0.32 (0.5) |
terror | 5.51 (1.24) | 5.28 (1.60) | 1.26 (0.90) | 1.16 (0.5) | −1.9 (0.95) | 5.15 (1.39) | 0.38 (0.55) |
tenderness | 1.43 (0.91) | 1.11 (0.4) | 5.64 (1.28) | 4.43 (1.51) | 1.2 (1.09) | 4.32 (1.37) | 0.59 (0.63) |
happiness | 1.36 (0.81) | 1.11 (0.41) | 4.61 (1.67) | 5.4 (1.36) | 1.67 (0.99) | 4.59 (1.39) | 0.67 (0.68) |
Consistent with our hypotheses, the results demonstrate a significant main effect of target emotion (1 = terror) on terror ratings, driven by higher terror ratings for terrifying musical stimuli (M = 5.28, SD = 1.60) as compared to anxious musical stimuli (M = 4.26, SD = 1.76, p < 0.0001). Also consistent with our hypotheses, the results demonstrate a significant main effect of target emotion on valence ratings, driven by more negative valence ratings for terrifying musical stimuli (M = −1.90, SD = 0.95) as compared to anxious musical stimuli (M = −1.42, SD = 0.98, p < 0.0001), and a significant main effect of target emotion on arousal ratings, driven by higher arousal ratings for terrifying musical stimuli (M = 5.15, SD = 1.39) as compared to anxious musical stimuli (M = 4.26, SD = 1.46, p < 0.0001). Furthermore, the results demonstrate a significant main effect of rating scale (terror = 1) on ratings of anxious music, driven by higher anxiety ratings (M = 5.14, SD = 1.36) than terror ratings (M = 4.26, SD = 1.76, p < 0.0001) for anxious music, also consistent with our hypotheses. However, inconsistent with our hypotheses, the results demonstrate a significant main effect of target emotion on anxiety ratings, driven by higher anxiety ratings for terrifying musical stimuli (M = 5.51, SD = 1.24) as compared to anxious musical stimuli (M = 5.14, SD = 1.36, p < 0.0001). Additionally, the results demonstrate a significant main effect of rating scale (terror rating = 1) on ratings of terrifying music, driven by higher anxiety ratings (M = 5.51, SD = 1.24) than terror ratings (M = 5.28, SD = 1.60, p < 0.0001) for terrifying music, inconsistent with our hypotheses. The rating results are also displayed as boxplots in Figs. 1 and 2.
The boxplots show the results of the ratings experiment during which 99 participants rated 240 musical excerpts selected to convey one of four target emotions (90 conveying terror, 90 conveying anxiety, 30 conveying happiness, and 30 conveying tenderness). Participants rated the conveyed emotion of the musical excerpts using discrete emotion scales and dimensional emotion scales (valence and arousal). Here, we show the average discrete emotional ratings per target emotion. In line with our predictions, participants rated the terrifying musical excerpts as conveying greater terror than the anxious musical excerpts. Furthermore, they also rated the anxious musical excerpts as conveying greater anxiety than terror, consistent with our hypotheses. However, inconsistent with our hypothesis, participants rated the terrifying excerpts as conveying greater anxiety than terror, and a greater degree of anxiety than the anxious excerpts.
The boxplots show additional results from the ratings experiment described in the Fig. 1 caption. Specifically, these boxplots show the average dimensional emotion ratings per target emotion of the musical excerpts. The top graph shows the valence ratings and the bottom graph shows the arousal ratings. Consistent with our hypotheses, participants rated the terrifying musical excerpts as conveying a more negative average valence and a higher average arousal than the anxious musical excerpts.
B. Results of creating the final FEARMUS database using the typicality index
To create the FEARMUS database, we calculated the typicality index (T) for each of the 180 excerpts and used it to rank them to find the most typical excerpts for each target emotion. The final database consists of the 50 most typical terrifying musical excerpts and the 50 most typical anxious musical excerpts for a total of 100 excerpts (see supplementary Table S1 for information about each excerpt12). Table V provides information on the distribution of the typicality indices per each emotion.
Descriptive statistics characterizing the resulting typicality indices per target emotion in FEARMUS. n = 100.
Target emotion | Mean | SD | Min | Max |
---|---|---|---|---|
Anxiety | 1.85 | 0.31 | 1.44 | 2.57 |
Terror | 1.65 | 0.41 | 1.02 | 2.77 |
The final selections of excerpts have very similar length distributions: terrifying excerpts have a mean length of 18.2 s (SD = 6.02), and anxious excerpts have a mean length of 17.8 s (SD = 4.10). FEARMUS is available for download from the Open Science Framework (2023).
C. Results of the descriptive analysis
We used McClelland's (2014) list of musical features that he used to describe ombra and tempesta as a model for our descriptive analysis of the FEARMUS database. Table VI contains the results. In the second column, we define the musical terms that we used with the Merriam-Webster online dictionary (Merriam-Webster, 2023). Several crucial differences between anxious and terrifying music are apparent in these results concerning tempo, harmony, melody and figuration, dynamics, timbre, and the sounds referenced by these types of music. Anxious music contains slow, heavy tempi while terrifying music contains fast tempi or sustained walls of noise. The harmonies in anxious music are vast and spacious while the harmonies in terrifying music are often densely packed. The melodies in anxious music contain stuttering, sighing figurations while terrifying music contains more chaotic figurations including shrill, randomly stepping, fast-moving lines or noisy clusters. Anxious music typically features open, hollow textures that lean on both extremes of the pitch spectrum (very low and very high) while terrifying music usually has a very active, dense, massive texture. Concerning dynamics, anxious music is typically quieter overall compared to the screaming intensity of terrifying music. The timbre of anxious music is more muted, distant, and dark while the timbre of terrifying music is harsh, noisy, and jarring. The sounds referenced by anxious music are suspenseful or generally creepy (e.g., rats skittering and squeaking, ticking clocks, creaking doors, whispering) while the sounds referenced by terrifying music are evocative of more urgent threats or reactions to such threats (e.g., fire alarms, explosions, thunder, banging on a door). There are also many shared features between anxious and terrifying music, especially concerning tonality, rhythm, bass, and instrumentation. Both are typically in a minor key, if a key is in fact discernable amidst the large amount of chromaticism present in both. They both often contain unpredictable dynamics, pulsing or sustained basses (although at different tempi), and hugely unpredictable rhythms (e.g., sudden entrances, fluctuations in tempo, random silences). Since these types of music are often featured side by side in the same score, it is perhaps unsurprising that they share similar instrumentations as well: classic full string orchestra with some electronic sounds and sometimes added voices, piano, or percussion.
This table shows the results of the descriptive analysis of the FEARMUS database. The anxious and terrifying excerpts in the FEARMUS database are compared along features used by McClelland (2014) to describe the ombra and tempesta topics.
Feature | Definition | Anxious music | Terrifying music |
---|---|---|---|
General features | summary observations about the overall sound of these musical excerpts | impending doom communicated by insistent rhythms and held tones, suspenseful, atmospheric, uneasy | frantic, chaotic, noisy, distressing, inescapable, adrenaline |
Tempo | the rate of speed of a musical piece or passage indicated by one of a series of directions (such as largo, presto, or allegro) and often by an exact metronome marking^a | ponderous, pacing, heavy march, funereal, slowly meandering or held chords or tones without a clear tempo, stasis | frenetic, throbbing, far-apart beats barely held in succession, frantic, spasmodic, walls of noise |
Tonality | the organization of all the tones and harmonies of a piece of music in relation to a tonic^a | minor, chromatic, lacking a discernable tonic or tonality | monophonic drone tones with ambiguous tonalities, highly chromatic, percussive without tonality, minor |
Harmony | the structure of music with respect to the composition and progression of chords^a | wide ranging, hollow, clustered in the bass and soprano, minor chords, wandering, unmoored, chromatic, repetitive short progressions | dense, compact, unchanging wall of sound; slowly rising chromatically, building held chord that becomes increasingly dissonant |
Melody | a rhythmic succession of single tones organized as an aesthetic whole^a | high-pitched, drifting tones; sliding fragments, rising overall directionality, falling figures, held or repeated tones, minor mode, narrow pitch range | shrill, noisy clusters, randomly stepping lines, drunken, lost, repetitive fragments, held trills, lack of a distinct melody, rising figures |
Bass | of low pitch; relating to or having the range or part of a bass^a | percussive, unpredictable, absent from the texture, slow beats, sustained, narrow pitch range, voluminous drones | throbbing, rumbling presence, lack of bass entirely, insistent pulse, distant drones, shifting voices, swirling, dense chords |
Figuration | ornamentation of a musical passage by using decorative and usually repetitive figures^a | sighing, stuttering fragments; space between utterances, drifting held tones, held chord walls, blips of noise, slow trills, throbbing drones, narrow motions | short bursts of noise, trills, tremolo, held clusters of tones; chattering, sliding, rising figures |
Rhythm | the aspect of music comprising all the elements (such as accent, meter, and tempo) that relate to forward movement^a | stasis punctuated by unpredictable entrances, consistent underlying pulse, fluctuations in the tempo of fragments, uneven silences, shifting repetitive figures, slower motions | sustained chaos, fast-running pulse, unpredictable shifts in the overall rhythmic texture, slow chord changes, fast-running upper lines |
Texture | a pattern of musical sound created by tones or lines played or sung together^a | hollow polyphony, wide range between parts, substantial leaning in bass and soprano | dissonant polyphony, dense walls of sound, voices slowly added to build intensity, full and active texture, growing, massive |
Dynamics | variation and contrast in force or intensity^a | static, quieter, slowly increasing in volume, sudden swells | screaming, loud, startling entrances, very sudden shifts from total silence to extreme loudness, sharp accents, increasing loudness |
Instrumentation | the arrangement or composition of music for instruments especially for a band or orchestra^a | classical string orchestra, percussion, electronic sounds (more tonal than noisy), voices, piano | classical string orchestra, electronic sounds (more noisy than tonal), percussion, choirs, brass |
Sound references | any non-musical sounds evoked or mimicked by the music | record playing, car running, alarms, clock ticking, rats, door creaking, rattle of a snake, wind chimes, large creatures bellowing, footsteps, machinery, children's voices, shouting, whispering, whistling | screams, fire alarms, banging on a door, tea kettle whistles, earthquake, thunder, bees buzzing, weather siren, bats, birds, elephant bellow, gunshots, explosions, car crash, monkey shrieks |
Timbre | the quality given to a sound by its overtones^a | buzzy, grating, ethereal, other-worldly, raspy, slithering, plucked, dark, muted, distant, reverberant, muddy, discordant, moving in space from far away to close to the ear | shrieking, shrill, painful, noisy, harsh, close to the ear, blurred, jarring, active, unpleasant |
Dictionary definitions by Merriam-Webster (2023).
D. Results of the acoustic analyses
We selected thirteen acoustic features for a confirmatory analysis of the observations from our descriptive analysis. The full list of features, including definitions and our hypotheses, is in Table VII. With these features, we gathered data about the tempo and rhythm (pulse clarity, event density), loudness (RMS, low energy), mode, and timbre (brightness, roughness, noisiness, and spectral distribution descriptors) of the excerpts. Given our descriptive results, we predicted that, compared to anxious music, terrifying music would be significantly faster, louder, and rhythmically denser, would have more consistent loudness levels, and would exhibit timbres that are generally brighter, rougher, and noisier. We also predicted that both terrifying and anxious music would be in the minor mode and have similarly irregular rhythmic structures (low pulse clarity). We measured the non-timbral acoustic features (i.e., event density, loudness, loudness variability, mode, pulse clarity, and tempo) across the entire length of each excerpt. To measure the timbral features (i.e., brightness, roughness, zero crossing rate, and the spectral distribution descriptors), we analyzed five randomly selected one-second-long segments from each musical excerpt and then averaged those values, yielding one mean value per excerpt (100 in total). Table VIII presents the resulting unnormalized mean values.
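To make the two measurement strategies concrete, here is a minimal Python sketch of an analogous extraction pipeline. The paper used the MATLAB MIR Toolbox (Lartillot et al., 2007); librosa is substituted purely for illustration, so the specific estimators, file handling, and default parameters below are assumptions rather than the authors' actual settings. Mode and pulse clarity have no direct librosa analogue (MIR Toolbox provides mirmode and mirpulseclarity), so they are omitted here.

```python
# Minimal sketch (assumed librosa analogue of the MIR Toolbox pipeline).
import numpy as np
import librosa

def whole_excerpt_features(path):
    """Non-timbral features measured over the full excerpt."""
    y, sr = librosa.load(path, sr=None, mono=True)
    rms = librosa.feature.rms(y=y)[0]                    # frame-wise energy
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)       # global tempo (bpm)
    onsets = librosa.onset.onset_detect(y=y, sr=sr)      # detected sonic events
    return {
        "loudness": float(rms.mean()),                   # RMS ~ global energy
        "low_energy": float((rms < rms.mean()).mean()),  # share of quiet frames
        "event_density": len(onsets) / (len(y) / sr),    # events per second
        "tempo": float(tempo),
    }

def timbral_features(path, n_segments=5, seg_dur=1.0, seed=0):
    """Timbral features averaged over randomly drawn 1-s segments."""
    y, sr = librosa.load(path, sr=None, mono=True)
    seg_len = int(seg_dur * sr)
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, len(y) - seg_len, size=n_segments)
    segs = [y[s:s + seg_len] for s in starts]
    per_seg = [{
        "spectral_centroid": librosa.feature.spectral_centroid(y=seg, sr=sr).mean(),
        "spectral_rolloff": librosa.feature.spectral_rolloff(y=seg, sr=sr, roll_percent=0.85).mean(),
        "spectral_flatness": librosa.feature.spectral_flatness(y=seg).mean(),
        "spectral_spread": librosa.feature.spectral_bandwidth(y=seg, sr=sr).mean(),
        "zero_crossing_rate": librosa.feature.zero_crossing_rate(seg).mean(),
    } for seg in segs]
    # average the five segment values into one mean value per excerpt
    return {k: float(np.mean([d[k] for d in per_seg])) for k in per_seg[0]}
```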
This table shows the thirteen acoustic features we selected to compare the anxious and terrifying musical excerpts in FEARMUS. In the left-most columns, we list the acoustic features of interest along with their definitions and interpretations of those definitions (see footnotes for citations). In the right-most columns, we indicate how we predicted each feature would behave in anxious compared to terrifying music.
| Feature | Definition | Interpretation | Anxiety | Terror |
|---|---|---|---|---|
| Event density | The average frequency of events, i.e., the number of events detected per second^a | Density of musical activity | ↓ | ↑ |
| Root-mean square (RMS) | The global energy of the signal computed by taking the root average of the square of the amplitude^a | Loudness | ↓ | ↑ |
| Low energy | The percentage of frames showing less-than-average energy^a,b | Loudness variability | ↑ | ↓ |
| Mode | An arrangement of the eight diatonic notes or tones of an octave according to one of several fixed schemes of their intervals^c | Major (closer to +1) or minor (closer to −1) | equally minor (negative) | equally minor (negative) |
| Pulse clarity | Estimates the rhythmic clarity, indicating the strength of the beats^a,d | Strength of a perceived rhythm or beat | equally low | equally low |
| Tempo | The rate of speed of a musical piece or passage indicated by one of a series of directions (such as largo, presto, or allegro) and often by an exact metronome marking^c | The perceived pace or speed of the music | ↓ | ↑ |
| Brightness | The amount of energy above a cut-off frequency (1500 Hz)^a,e | Estimate of the high-frequency energy content of a spectral distribution | ↓ | ↑ |
| Roughness | The average of all the dissonance between all possible pairs of peaks of a spectrum^a,f | Estimation of the sensory dissonance^a | ↓ | ↑ |
| Spectral centroid | Geometric center of the amplitude spectrum^a | Spectral distribution descriptor^g | ↓ | ↑ |
| Spectral flatness | Ratio between the geometric and the arithmetic mean of the spectrum^a | Discriminates noise from harmonic content^g | ↓ | ↑ |
| Spectral roll-off | The frequency below which 85%^d of the total spectral energy is contained^a | Estimation of the amount of high-frequency energy content^g | ↓ | ↑ |
| Spectral spread | The standard deviation of the spectral distribution^a | Spectral distribution descriptor^g | ↓ | ↑ |
| Zero crossing rate | A simple indicator of noisiness: counting the number of times the signal crosses the x-axis^a | A simple indicator of noisiness^h | ↓ | ↑ |
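Because roughness figures prominently in the results below, the following is a hedged sketch of the idea behind its definition in Table VII: sensory dissonance accumulated over all pairs of spectral peaks. The parameter values follow the Plomp-Levelt curve as popularized by Sethares (2005), one of the sources cited for this feature; the MIR Toolbox implementation differs in detail, so the numbers are illustrative only.

```python
# Hedged sketch of roughness as pairwise sensory dissonance over spectral
# peaks, using one common parameterization of the Plomp-Levelt curve
# (Sethares, 2005). Illustrative only; not the MIR Toolbox algorithm.
import numpy as np
from itertools import combinations

def pair_dissonance(f1, a1, f2, a2):
    """Dissonance contribution of one pair of partials."""
    f_min = min(f1, f2)
    s = 0.24 / (0.0207 * f_min + 18.96)  # critical-bandwidth scaling
    d = abs(f2 - f1)
    return a1 * a2 * (np.exp(-3.51 * s * d) - np.exp(-5.75 * s * d))

def roughness(freqs, amps):
    """Sum of dissonance over all pairs of spectral peaks."""
    return sum(pair_dissonance(f1, a1, f2, a2)
               for (f1, a1), (f2, a2) in combinations(zip(freqs, amps), 2))

# A minor-second cluster is far rougher than a consonant octave:
print(roughness([440.0, 466.2], [1.0, 1.0]))  # ~0.18 (rough)
print(roughness([440.0, 880.0], [1.0, 1.0]))  # ~0.00 (smooth)
```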
This table shows the unnormalized means and standard deviations of the 13 acoustic features (grouped by expressed emotion: anxiety or terror) that we measured to analyze the FEARMUS database. The upper half of the table reports the acoustic features we measured across entire excerpts, and the lower half reports the acoustic features we measured and averaged across five 1-s-long fragments randomly sampled per excerpt. The right column provides context for interpreting the resulting values for each acoustic feature [see Lartillot (2021) for more detailed information; top half, n = 100; bottom half, n = 500 (Chudy, 2016; Juslin, 2000; Lartillot et al., 2021, 2008; Peeters et al., 2011; Sethares, 2005; Tzanetakis and Cook, 2002)].
| Acoustic feature | Anxiety: M (SD) | Terror: M (SD) | Measurement context |
|---|---|---|---|
| *Measured using whole excerpts* | | | |
| Event density | 2.176 (1.307) | 3.190 (1.758) | Number of sonic events per second |
| Loudness | 0.172 (0.045) | 0.194 (0.058) | Global energy of a signal; higher values indicate greater energy |
| Loudness variability | 0.561 (0.068) | 0.513 (0.099) | Range of 0–1; closer to 1 indicates greater loudness variability |
| Mode | −0.012 (0.094) | −0.024 (0.093) | Between −1 and +1; closer to −1 = minor, closer to +1 = major |
| Pulse clarity | 0.205 (0.155) | 0.192 (0.159) | Higher values indicate a clearer, stronger beat |
| Tempo | 123.6 (35.45) | 133.8 (31.82) | Beats per minute (bpm) |
| *Measured across averaged fragments* | | | |
| Brightness | 0.208 (0.152) | 0.484 (0.148) | Energy in upper frequencies |
| Roughness | 687.0 (570.9) | 2146 (1561) | Higher values indicate greater sensory dissonance |
| Spectral centroid | 1425 (985.8) | 2659 (1054) | Mean of the spectral distribution; reported in frequency (Hz) |
| Spectral flatness | 0.070 (0.049) | 0.106 (0.049) | Ratio indicating noisiness; closer to 1 indicates greater noisiness |
| Spectral roll-off | 2497 (2329) | 5153 (2128) | Frequency (Hz); higher values indicate more energy in upper frequencies |
| Spectral spread | 2647 (833.4) | 3107 (756.9) | SD of spectral distribution; reported in frequency (Hz) |
| Zero crossing rate | 408.7 (346.5) | 1295 (856.7) | Higher values indicate noisier signals |
To test our hypotheses related to the non-timbral acoustic features, we fit six mixed-effects linear regression models, one per acoustic feature, with the feature values as the outcome. The predictor was the target emotion: terror or anxiety. The lengths of the excerpts (in seconds) were included as random slopes. Before running the analyses, we normalized the values of all six acoustic features. We report the regression results in the upper half of Table IX and in the upper panel of Fig. 3. Consistent with our hypotheses, the results demonstrate a significant main effect of target emotion (1 = terror) on event density, driven by a higher average event density for terrifying musical stimuli (M = 3.19, SD = 1.76) than for anxious musical stimuli (M = 2.18, SD = 1.31, p = 0.011). The results also demonstrate a significant main effect of target emotion on loudness variability, driven by a lower average loudness variability for terrifying musical stimuli (M = 0.513, SD = 0.099) than for anxious musical stimuli (M = 0.561, SD = 0.068, p = 0.032), although this difference of only 0.048 (i.e., 4.8% more frames in which the energy of the signal is lower than average) may not be perceptually meaningful. There were no significant main effects of target emotion on loudness (p = 0.087), mode (p = 0.648), pulse clarity (p = 0.788), or tempo (p = 0.247). Although in the predicted direction, the average tempo (in beats per minute, bpm) of the terrifying musical stimuli (M = 133.8, SD = 31.82) was not significantly faster than that of the anxious stimuli (M = 123.6, SD = 35.45), inconsistent with our hypothesis. Similarly, terrifying music was not significantly louder (M = 0.194, SD = 0.058) than anxious music (M = 0.172, SD = 0.045), also inconsistent with our hypothesis. Both terrifying and anxious music exhibited modal ambiguity leaning towards minor modalities (anxious: M = −0.012, SD = 0.094; terror: M = −0.024, SD = 0.093), consistent with our hypothesis that both would tend towards minor modalities. They also had similarly irregular rhythmic structures, signified by low pulse clarity (anxious: M = 0.205, SD = 0.155; terror: M = 0.192, SD = 0.159), also consistent with our hypotheses.
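As a concrete illustration of this modeling setup, the following Python sketch fits one mixed-effects model per normalized feature. The paper does not specify the normalization method, the grouping structure behind the random slopes, or the p-value correction, so the min-max scaling, the hypothetical "film" grouping column, and the Benjamini-Hochberg adjustment below are all assumptions made purely for illustration.

```python
# Hedged sketch of the non-timbral analysis: one mixed-effects model per
# normalized acoustic feature, with target emotion (1 = terror) as the
# fixed effect and excerpt length entering the random-effects part.
# ASSUMPTIONS (not stated in the text): min-max normalization, a
# hypothetical "film" grouping column, and a Benjamini-Hochberg
# correction for the reported adjusted p-values.
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

def fit_nontimbral_models(df, features):
    """df: one row per excerpt with columns 'emotion' (0 = anxiety,
    1 = terror), 'length' (s), 'film', and one column per feature."""
    fits = {}
    for feat in features:
        # normalize the feature to [0, 1] before modeling (assumed method)
        y = (df[feat] - df[feat].min()) / (df[feat].max() - df[feat].min())
        d = df.assign(y=y)
        fits[feat] = smf.mixedlm("y ~ emotion", d, groups=d["film"],
                                 re_formula="~length").fit(reml=False)
    return fits

def adjust_pvalues(pvals):
    # one plausible choice for producing the "Adj. p" column
    return multipletests(pvals, method="fdr_bh")[1]
```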
This table shows the results of our linear regression analyses testing how 13 acoustic features are predicted by the target emotions (anxiety, terror) of FEARMUS. Results in bold are statistically significant (adjusted p < 0.05). The acoustic features in the top half of the table were measured across the full length of each excerpt. For the acoustic features in the lower half of the table, we randomly selected five 1-s-long segments from each excerpt and averaged across them. The results demonstrate that terrifying music has a significantly greater event density, more consistent loudness (lower loudness variability), and a noisier, brighter, and harsher timbre than anxious music, consistent with our hypotheses. However, terrifying music is not significantly faster in tempo or louder, inconsistent with our hypotheses. Both anxious and terrifying music use minor modes and have similarly inconsistent rhythmic structures, consistent with our hypotheses. n = 100.
Mixed-effects models (acoustic features measured using whole excerpts):

| | Est. | SE | t | Unadj. p | Adj. p | df |
|---|---|---|---|---|---|---|
| *Event density* | | | | | | |
| Intercept | 0.207 | 0.028 | 7.333 | 2.37E−07 | **4.40E−07** | 22.081 |
| Expressed emotion (1 = terror) | 0.142 | 0.048 | 2.985 | 8.74E−03 | **0.011** | 16.043 |
| *Loudness* | | | | | | |
| Intercept | 0.375 | 0.030 | 12.480 | 6.17E−08 | **1.23E−07** | 11.246 |
| Expressed emotion (1 = terror) | 0.098 | 0.052 | 1.880 | 0.077 | 0.087 | 17.654 |
| *Loudness variability* | | | | | | |
| Intercept | 0.618 | 0.027 | 23.318 | 1.43E−14 | **9.30E−14** | 17.403 |
| Expressed emotion (1 = terror) | −0.095 | 0.040 | −2.393 | 0.027 | **0.032** | 18.619 |
| *Mode* | | | | | | |
| Intercept | 0.533 | 0.033 | 15.912 | 7.88E−10 | **1.71E−09** | 12.842 |
| Expressed emotion (1 = terror) | −0.021 | 0.042 | −0.498 | 0.623 | 0.648 | 26.694 |
| *Pulse clarity* | | | | | | |
| Intercept | 0.205 | 0.031 | 6.538 | 9.42E−05 | **1.29E−04** | 9.250 |
| Expressed emotion (1 = terror) | −0.012 | 0.045 | −0.275 | 0.788 | 0.788 | 13.643 |
| *Tempo* | | | | | | |
| Intercept | 0.483 | 0.041 | 11.667 | 2.28E−06 | **3.95E−06** | 8.148 |
| Expressed emotion (1 = terror) | 0.074 | 0.058 | 1.263 | 0.228 | 0.247 | 13.631 |

General linear models (acoustic features measured across averaged 1-s fragments):

| | Est. | SE | t | Unadj. p | Adj. p | R² | Adj. R² |
|---|---|---|---|---|---|---|---|
| *Brightness* | | | | | | | |
| Intercept | 0.240 | 0.024 | 9.926 | <2E−16 | **<1.73E−15** | | |
| Expressed emotion (1 = terror) | 0.379 | 0.034 | 11.071 | <2E−16 | **<1.73E−15** | 0.556 | 0.551 |
| *Roughness* | | | | | | | |
| Intercept | 0.086 | 0.020 | 4.308 | 3.91E−05 | **5.98E−05** | | |
| Expressed emotion (1 = terror) | 0.213 | 0.028 | 7.523 | 2.61E−11 | **7.54E−11** | 0.366 | 0.360 |
| *Spectral centroid* | | | | | | | |
| Intercept | 0.200 | 0.025 | 8.110 | 1.47E−12 | **5.46E−12** | | |
| Expressed emotion (1 = terror) | 0.249 | 0.035 | 7.153 | 1.55E−10 | **4.03E−10** | 0.343 | 0.336 |
| *Spectral flatness* | | | | | | | |
| Intercept | 0.200 | 0.025 | 8.175 | 1.07E−12 | **4.64E−12** | | |
| Expressed emotion (1 = terror) | 0.148 | 0.035 | 4.253 | 4.82E−05 | **6.96E−05** | 0.156 | 0.147 |
| *Spectral roll-off* | | | | | | | |
| Intercept | 0.202 | 0.026 | 7.896 | 4.22E−12 | **1.37E−11** | | |
| Expressed emotion (1 = terror) | 0.252 | 0.036 | 6.977 | 3.58E−10 | **8.46E−10** | 0.332 | 0.325 |
| *Spectral spread* | | | | | | | |
| Intercept | 0.329 | 0.027 | 12.012 | <2E−16 | **<1.73E−15** | | |
| Expressed emotion (1 = terror) | 0.132 | 0.039 | 3.405 | 9.59E−04 | **1.25E−03** | 0.106 | 0.097 |
| *Zero crossing rate* | | | | | | | |
| Intercept | 0.079 | 0.017 | 4.643 | 1.07E−05 | **1.74E−05** | | |
| Expressed emotion (1 = terror) | 0.202 | 0.024 | 8.342 | 4.70E−13 | **2.44E−12** | 0.415 | 0.409 |
Here, we show the results of the acoustic analyses comparing musically conveyed anxiety and terror, as measured using the FEARMUS database excerpts. The upper panel focuses on the non-timbral acoustic features, which we analyzed across the entire length of each excerpt. The lower panel displays the results for the timbral acoustic features; for these, we pseudo-randomly selected five 1-s-long segments from each of the 100 excerpts in FEARMUS and then averaged across them per excerpt. Consistent with our hypotheses, the results indicate that terrifying music has a brighter, harsher, noisier timbre, has more musical activity per second, and has less-variable loudness than anxious music. Furthermore, both subtypes of fearful music have relatively unclear rhythmic structures (low pulse clarity) and are mostly in minor modalities (see Table VIII for the unnormalized mean values). Inconsistent with our hypotheses, tempo and loudness are not statistically different between the two subtypes of fear. We had predicted that terrifying music would be louder and faster than anxious music, and while our results are in the predicted direction, they are not statistically significant.
To test our hypotheses related to timbre, we used standard general linear regression analyses. For each model, the outcome was a timbral acoustic feature and the predictor was the target emotion (terror = 1). We report the results of these analyses in the lower half of Table IX and in the lower panel of Fig. 3. We found a main effect of target emotion on every timbral acoustic feature that we measured, driven by higher values for terrifying music than for anxious music. Consistent with our hypothesis, terrifying music exhibited a brighter average timbre than anxious music, indicated by a higher average brightness (p < 0.001; R² = 0.556; adjusted R² = 0.551), spectral centroid (p < 0.001; R² = 0.343; adjusted R² = 0.336), and spectral roll-off (p < 0.001; R² = 0.332; adjusted R² = 0.325). Also in line with our predictions, terrifying music exhibited a noisier and rougher average timbre than anxious music, indicated by a higher roughness (p < 0.001; R² = 0.366; adjusted R² = 0.360), spectral flatness (p < 0.001; R² = 0.156; adjusted R² = 0.147), spectral spread (p < 0.002; R² = 0.106; adjusted R² = 0.097), and zero crossing rate (p < 0.001; R² = 0.415; adjusted R² = 0.409).
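For completeness, here is a matching sketch for the timbral models, again in Python with statsmodels (an assumption; the original analysis toolchain for this step is not specified in this section). With a binary predictor, the OLS slope is simply the terror-minus-anxiety difference in group means, and R² is the share of the feature's variance accounted for by the emotion category.

```python
# Hedged sketch of the timbral analysis: one ordinary least-squares model
# per timbral feature with target emotion (1 = terror) as the sole
# predictor. Column names (snake_case) are assumptions for illustration.
import statsmodels.formula.api as smf

def fit_timbre_model(df, feat):
    """df: one row per excerpt; 'emotion' coded 0 = anxiety, 1 = terror."""
    res = smf.ols(f"{feat} ~ emotion", data=df).fit()
    return {
        "estimate": res.params["emotion"],  # terror-minus-anxiety mean gap
        "p_unadjusted": res.pvalues["emotion"],
        "r2": res.rsquared,
        "adj_r2": res.rsquared_adj,
    }
```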
IV. DISCUSSION AND CONCLUSIONS
Here, we report on how music differentially conveys two subtypes of fear: anxiety and terror. To research musically conveyed subtypes of fear, we created a custom database of musical excerpts called FEARMUS. To create the database, we first curated 180 excerpts from contemporary horror film soundtracks (90 portraying terror and 90 portraying anxiety) using an expertise-based approach. We then validated the efficacy of the music at portraying the target emotions through an experiment in which participants rated the conveyed emotions of the excerpts on discrete and dimensional emotion scales. Next, we used the results of the rating experiment to filter the database down to the 100 excerpts that most typically portray terror and anxiety. We then applied descriptive and acoustic analyses to FEARMUS to outline the differences between these musically portrayed subtypes of fear.
The results of our behavioral ratings demonstrated that while terrifying and anxious music are quite differentiable on dimensional emotion scales, they are less clearly differentiable on discrete emotion scales. Consistent with our hypotheses, terrifying music was rated as conveying lower valence and higher arousal than anxious music. Furthermore, also consistent with our hypotheses, anxious music was rated as conveying a greater degree of anxiety than terror, and terrifying music was rated as conveying a greater degree of terror than the anxious music did. However, inconsistent with our predictions, terrifying music was rated as conveying a greater degree of anxiety than of terror, and a greater degree of anxiety than the anxious musical excerpts themselves. Overall, the evidence provides an inconclusive picture of the degree to which these subtypes of fear are perceptually differentiable in music.
It is worthwhile to consider what factors might be driving these conflicting results. For one, perhaps there was some confusion between portrayed and felt emotion during the ratings task (Schubert, 2013), despite our instructions to rate conveyed emotions. Notably, we did not instruct participants on the difference between felt and portrayed emotion. The terrifying musical excerpts may have induced anxious feelings in participants, leading them to give those excerpts higher anxiety ratings. It also could be that subtypes of emotion are layered or overlapping: when music conveys terror, that emotion may be layered with a high degree of anxiety. A similar future ratings experiment might employ a forced-choice design to account for these possibilities.
Furthermore, it is also worth considering a couple of potential confounds in our design that may have affected the results of the behavioral rating experiment. First, while we recruited participants who could speak English well, we neither tested their level of English comprehension nor documented their native language. This may have affected the results of the rating experiment, since language and culture strongly shape emotion perception and labelling (Barrett et al., 2011; Engelmann and Pogosyan, 2013; Ogarkova, 2016). Different languages contain vastly different numbers of emotion terms (Ogarkova, 2016). For example, Dutch has been found to contain 1501 emotion terms (Hoekstra, 1986; Ogarkova, 2016) while Czech contains only 404 (Ogarkova, 2016; Slaměník et al., 2008). While our inclusion of emotion definitions may have helped to control for different perceptions of emotion across participants, there still may have been differences in their approach to the task linked to diverse native languages and cultural backgrounds. Second, participants may have varied in their emotional granularity, both in perception and communication, which may have confounded our results. Participants with lower emotional granularity may have struggled more with the emotion rating task and provided less accurate ratings than participants with higher emotional granularity. Our study did not attempt to measure the emotional granularity of our participants.
The results of the descriptive and acoustic analyses provide more substantial evidence for the existence of at least two differentiable subtypes of musically conveyed fear. While some of their musical and acoustic features overlap, anxious and terrifying music display some striking sonic differences. Generally, terrifying music has a brighter, harsher, and rougher timbre and is musically denser than anxious music. Anxious music has a greater degree of loudness variability than terrifying music. In terms of similarities, both anxious and terrifying music tend towards minor modalities and are rhythmically unpredictable. Finally, while the descriptive analysis indicated that terrifying music is typically faster and louder than anxious music, the results of the acoustic analyses were inconsistent with this observation. Recall that terrifying music sometimes has no tempo at all, instead consisting of static walls of noise. Perhaps such instances resulted in an average tempo that was not much faster than that of the anxious excerpts. Regarding loudness, terrifying excerpts occasionally feature long crescendos toward intense climaxes. Perhaps those quieter beginnings likewise lowered the average loudness to a level comparable to that of the anxious excerpts. Possible explanations aside, it is interesting to compare these findings to Juslin's (2019) summary of previous findings on musically conveyed fear and to McClelland's (2012, 2014, 2017b) descriptions of ombra and tempesta. As highlighted in Table I, while Juslin (2019) describes fearful music as having fast tempi and low sound levels, McClelland (2012, 2014, 2017b) describes ombra as having slow-to-moderate tempi and tempesta as generally very loud. However, although we expected anxious and terrifying music to mirror ombra and tempesta in these critical differences, those expectations were not borne out in our results.
Overall, our results raise the question of whether previous descriptions of musically conveyed fear [e.g., as summarized in Juslin (2019)] are adequate. Researching fear as a broad category in music cognition research may have produced overgeneralized accounts that conflate and underdescribe musically conveyed subtypes of fear. By contrast, our results align well with McClelland's (2012, 2014, 2017a,b) descriptions of ombra and tempesta. This finding demonstrates the benefit of integrating traditional music theories with psychological approaches in music cognition research. Such interdisciplinary approaches can uncover richer, better-informed portraits of how music functions.
In conclusion, there does indeed appear to be a strong sonic difference between at least two subtypes of musically conveyed fear. However, it remains inconclusive whether these subtypes of fear are clearly perceptually distinguishable from one another. To better uncover how distinguishable subtypes of musically conveyed emotions (including terror and anxiety) are, it is vital that researchers incorporate emotional granularity into future experimental designs. Considering subtypes of emotion is essential for avoiding overgeneralizations and incorrect conclusions about how music conveys emotion, although accounting for different languages and emotion constructs across cultures will be a challenging aspect of such future work.
ACKNOWLEDGMENTS
C.T. received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement (Grant No. 835682). S.F. received funding from Swiss National Science Foundation (Grant Nos. SNSF PP00P1_157409/1 and PP00P1_183711/1). Thanks to Arkady Konovalov and to the members of the Cognitive and Affective Neuroscience Laboratory at UZH for their feedback and support.
The other four of the five most researched emotions are sadness, happiness, anger, and tenderness/love (Juslin, 2019).
For a more extensive discussion on the potential impact of incorporating emotional granularity into music and emotion research, see Chap. 10 of Warrenburg, 2019b.
Topic Theory is used to define and catalogue instances in which certain combinations of musical features consistently and reliably communicate clear cultural associations and related emotions (Mirka, 2014; Ratner, 1980). For instance, one topic is the “march” topic which consists of brass instruments and drums, an upbeat even tempo, a major key, and exciting melodies (Ratner, 1980). The cultural associations with such a combination of musical features include celebrations, holidays, parades, military, and various emotions that are relevant to these associations such as joy, happiness, excitement, or pride.
For a table describing the musical features that convey ombra and tempesta, see McClelland's (2014) chapter “Ombra and Tempesta” in The Oxford Handbook of Topic Theory, pp. 279–300.
The experimenter also relied on the plot of the films to find these excerpts. Moments when antagonizing forces attacked the protagonists typically had music that matched the tempesta criteria, and moments of suspense typically had music matching the ombra criteria.
We elected to have each participant only rate one third of the excerpts due to time constraints.
The code first generated 35 lists containing the numbers 1–90 in a random order (e.g., 14, 32, 6, 87, 10, etc.). Then, each of those lists was split into thirds to create 105 shorter lists of 30 numbers. In doing so, each group of three lists encompassed all 90 excerpts in a random order with no overlap between those three lists. Those shorter lists then were used to index the list of audio files during the experiment. This procedure ensured that each participant heard a random third of the 90 excerpts and that each excerpt was rated by a third of the participants, evenly but randomly distributing the excerpts across the participants.
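A minimal sketch of this assignment procedure follows; the original code and its language are not shown in the paper, so this Python version is purely an illustration of the logic described above.

```python
# Hedged sketch of the stimulus-assignment procedure: 35 shuffled
# orderings of the 90 excerpt indices, each split into three disjoint
# lists of 30, yielding 105 presentation lists in total.
import random

def make_presentation_lists(n_excerpts=90, n_orderings=35, seed=0):
    rng = random.Random(seed)
    lists = []
    for _ in range(n_orderings):
        order = list(range(1, n_excerpts + 1))  # excerpt indices 1-90
        rng.shuffle(order)
        third = n_excerpts // 3
        # each trio of lists covers all 90 excerpts with no overlap
        lists.extend([order[:third], order[third:2 * third], order[2 * third:]])
    return lists

lists = make_presentation_lists()
assert len(lists) == 105 and all(len(x) == 30 for x in lists)
```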
We based our definitions of the discrete emotions on entries in the online Merriam-Webster dictionary (Merriam-Webster, 2023). Sometimes we altered a word or two to make the definitions more accessible (e.g., “misfortune” instead of “ill”). The definitions given to participants were as follows: Happiness—a state of well-being and contentment, joy; Tenderness—a tender quality or condition, such as gentleness and affection; Anxiety—apprehensive uneasiness or nervousness usually over an impending or anticipated misfortune; and Terror—a state of intense or overwhelming fear. Additionally, we used the Self-Assessment Manikin (Bradley and Lang, 1994), flipped horizontally to match the directionality of the rating scales (negative-positive/low-high), to demonstrate the meaning of valence and arousal.
Forty-two participants took the GMSI in English, 63 in German. The results showed a mean score of 63.79 (SD = 18.63) with no outliers and a normal distribution. This score is well below the average norm (M = 81.58; SD = 20.64) validated by the authors of the GMSI (Müllensiefen et al., 2014), supporting our claim that our participants were non-musicians.
The other surveys that participants filled out consisted of the Autism Spectrum Quotient, Beck Depression Inventory, Big Five Inventory, Positive and Negative Affect Schedule, Levenson Self-Report Psychopathy Scale, and the State-Trait Anxiety Inventory. These surveys were part of a larger study. Therefore, we do not comment on the results of these surveys in this paper.
For detailed information on how each acoustic feature is extracted by the MIR Toolbox (Lartillot et al., 2007), see the User's Manual (version 1.8.1) (University of Jyväskylä, 2023).
See supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0016857 for Table S1 containing metadata, typicality indices, and ranking information for all 100 excerpts in the FEARMUS database.