Sarcasm detection presents unique challenges in speech technology, particularly for individuals with disorders that affect pitch perception or those lacking contextual auditory cues. While previous research has established the significance of integrating textual, audio, and visual data in sarcasm detection, these studies overlook the interactions between modalities. We propose an approach that synergizes audio, textual, sentiment, and emotion data to enhance sarcasm detection. This involves augmenting sarcastic audio with corresponding text using Automatic Speech Recognition (ASR), supplemented with information from emotion recognition and sentiment analysis. Our methodology leverages the strengths of each modality: emotion recognition algorithms analyze the audio data for affective cues, while sentiment analysis processes the text generated by ASR. The integration of these modalities aims to compensate for limitations in current multimodal approaches by providing complementary cues essential for accurate sarcasm interpretation. Evaluated on only the audio data of the MUStARD++ dataset, our approach surpasses the state-of-the-art model by 4.79% F1-score. Our approach improves sarcasm detection in the audio domain, which is especially beneficial to those with auditory processing challenges. This research highlights the potential of multimodal data fusion in capturing the subtleties of speech perception and understanding, thus contributing to the advancement of speech technology applications.
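The abstract describes attention-based fusion of an audio/emotion stream with a text/sentiment stream. The paper's actual architecture is not reproduced here; the following is only a minimal sketch of the general idea, assuming pre-extracted fixed-dimensional embeddings for each stream (all array shapes and the mean-pool-then-concatenate fusion head are illustrative assumptions, not the authors' design):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, keys, values):
    """Scaled dot-product attention: `query` rows attend over `keys`/`values`."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)   # (n_query, n_key)
    weights = softmax(scores, axis=-1)     # rows sum to 1
    return weights @ values                # (n_query, d_value)

rng = np.random.default_rng(0)
# Hypothetical pre-extracted features (dimensions chosen for the example):
audio_emotion = rng.standard_normal((4, 16))   # 4 audio frames, 16-dim emotion features
text_sentiment = rng.standard_normal((6, 16))  # 6 ASR tokens, 16-dim sentiment features

# Let the audio stream attend over the text stream, then pool and concatenate
# both views into one fused vector for a downstream sarcasm classifier.
attended = cross_attention(audio_emotion, text_sentiment, text_sentiment)
fused = np.concatenate([audio_emotion.mean(axis=0), attended.mean(axis=0)])
print(fused.shape)  # (32,)
```

In a real system the attention weights would be learned jointly with the classifier (e.g. as a multi-head attention layer); the point of the sketch is only how cross-modal attention lets one modality's cues re-weight the other's before fusion.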
13 May 2024
186th Meeting of the Acoustical Society of America and the Canadian Acoustical Association
13–17 May 2024
Ottawa, Ontario, Canada
Speech Communication: Communication Paper 4aPP8
July 31 2024
Improving sarcasm detection from speech and text through attention-based fusion exploiting the interplay of emotions and sentiments
Xiyuan Gao
Department of Language Technology and Culture, Campus Fryslân, Rijksuniversiteit Groningen, Leeuwarden, Fryslân, 8911 CE, NETHERLANDS; xiyuan.gao@rug.nl

Shekhar Nayak
Department of Language Technology and Culture, Campus Fryslân, Rijksuniversiteit Groningen, Leeuwarden, Fryslân, 8911 CE, NETHERLANDS; s.nayak@rug.nl

Matt Coler
Department of Language Technology and Culture, Campus Fryslân, Rijksuniversiteit Groningen, Leeuwarden, Fryslân, 8911 CE, NETHERLANDS; m.coler@rug.nl
Proc. Mtgs. Acoust. 54, 060002 (2024)
Article history
Received: June 17, 2024
Accepted: July 05, 2024
Citation
Xiyuan Gao, Shekhar Nayak, Matt Coler; Improving sarcasm detection from speech and text through attention-based fusion exploiting the interplay of emotions and sentiments. Proc. Mtgs. Acoust. 13 May 2024; 54 (1): 060002. https://doi.org/10.1121/2.0001918