Enhancing sarcasm detection through multimodal data integration: A proposal for augmenting audio with text and emoticon

Sarcasm detection presents unique challenges in speech technology, particularly for individuals with disorders that affect pitch perception or those lacking contextual auditory cues. While previous research [1, 2] has established the significance of pitch variation in sarcasm detection, these studies have primarily focused on a single modality, often overlooking the potential synergies of integrating multimodal data. We propose an approach that combines auditory, textual, and emoticon data to enhance sarcasm detection. This involves augmenting sarcastic audio data with corresponding text obtained through Automatic Speech Recognition (ASR), supplemented with emoticons derived from emotion recognition and sentiment analysis; emotional cues from the multimodal data are mapped to emoticons. Our methodology leverages the strengths of each modality: emotion recognition algorithms analyze the audio for affective cues, while sentiment analysis processes the text generated by ASR. The integration of these modalities aims to compensate for limitations in pitch perception by providing complementary cues essential for accurate sarcasm interpretation. Our approach is expected to significantly improve sarcasm detection, especially for those with auditory processing challenges. This research highlights the potential of multimodal data fusion for capturing the subtleties of speech perception and understanding, thus contributing to the advancement of speech technology applications.
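As a concrete illustration, the augmentation step described in the abstract might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes OpenAI's Whisper for ASR and a Hugging Face Transformers sentiment pipeline, while recognize_emotion and EMOTICON_MAP are hypothetical placeholders for an audio emotion recognizer and an emotion-to-emoticon mapping.

```python
# Minimal sketch of the proposed audio -> (text, emoticon) augmentation.
# Assumptions (not from the abstract): openai-whisper for ASR, a Hugging
# Face sentiment pipeline for text polarity, and a stubbed audio emotion
# recognizer. EMOTICON_MAP and recognize_emotion are hypothetical names.
import whisper
from transformers import pipeline

EMOTICON_MAP = {  # illustrative emotion-to-emoticon mapping
    "joy": ":)", "anger": ">:(", "sadness": ":(",
    "surprise": ":o", "neutral": ":|",
}

asr_model = whisper.load_model("base")            # speech -> text
sentiment_model = pipeline("sentiment-analysis")  # text -> polarity

def recognize_emotion(audio_path: str) -> str:
    """Stub for an audio emotion recognizer (e.g., a wav2vec 2.0
    classifier fine-tuned on emotional speech); returns a coarse label."""
    return "neutral"

def augment_utterance(audio_path: str) -> dict:
    """Augment one sarcastic audio clip with ASR text and an emoticon."""
    text = asr_model.transcribe(audio_path)["text"].strip()
    emotion = recognize_emotion(audio_path)  # affective cue from audio
    polarity = sentiment_model(text)[0]      # affective cue from text
    emoticon = EMOTICON_MAP.get(emotion, ":|")
    # A positive text paired with a negative vocal emotion (or the
    # reverse) is the kind of incongruence a sarcasm classifier can use.
    return {
        "text": text,
        "audio_emotion": emotion,
        "text_sentiment": polarity["label"],
        "augmented_text": f"{text} {emoticon}",
    }
```

A downstream sarcasm classifier would then consume the augmented text alongside the audio features, fusing the modalities, for example via the attention-based fusion explored in the companion article.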
Shekhar Nayak
Campus Fryslân (Lang., Technol. and Culture), Univ. of Groningen, Leeuwarden, Netherlands

Matt Coler
Campus Fryslân (Lang., Technol. and Culture), Univ. of Groningen, Wirdumerdijk 34, Leeuwarden 8911CE, Netherlands, m.coler@rug.nl
J. Acoust. Soc. Am. 155, A264 (2024)
Connected Content
A companion article has been published: "Improving sarcasm detection from speech and text through attention-based fusion exploiting the interplay of emotions and sentiments."
Citation
Xiyuan Gao, Shekhar Nayak, Matt Coler; Enhancing sarcasm detection through multimodal data integration: A proposal for augmenting audio with text and emoticon. J. Acoust. Soc. Am. 1 March 2024; 155 (3_Supplement): A264. https://doi.org/10.1121/10.0027441