Mental health disorders such as Major Depressive Disorder and Schizophrenia affect the coordination between articulatory gestures in speech production. Coordination features derived from vocal tract variables (TVs) predicted by a speech inversion system can quantify these changes in articulatory gestures and have proven effective in the classification of mental health disorders. In this study, we use data from the IEMOCAP (acted emotions) and MSP Podcast (natural emotions) datasets to investigate, for the first time, how coordination features extracted from TVs can capture changes between different emotions. We compared the eigenspectra extracted from channel-delay correlation matrices for the "Angry," "Sad," and "Happy" emotions with respect to the "Neutral" emotion. Across both datasets, the "Sad" emotion follows a pattern suggesting simpler articulatory coordination, while the "Angry" emotion follows the opposite pattern, showing signs of more complex articulatory coordination. For the majority of subjects, the "Happy" emotion also follows a complex articulatory coordination pattern but shows significant confusion with the "Neutral" emotion. We trained a Convolutional Neural Network with the coordination features as inputs to perform emotion classification. A detailed interpretation of the differences in eigenspectra and the results of the classification experiments will be discussed.
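For readers interested in how such coordination features are typically computed, the following is a minimal sketch, assuming TV trajectories sampled at a fixed frame rate. The function name, the default set of delays, and the per-channel z-scoring are illustrative assumptions rather than the authors' exact implementation: pairwise correlations between TV channels at several relative delays are stacked into a channel-delay correlation matrix, and its eigenvalue spectrum (eigenspectrum) summarizes how many effective dimensions the articulatory channels co-vary in.

```python
import numpy as np

def channel_delay_correlation_eigenspectrum(tvs, delays=range(0, 70, 7)):
    """Illustrative sketch (assumed details): build a channel-delay correlation
    matrix from vocal tract variable (TV) trajectories and return its eigenspectrum.

    tvs    : array of shape (num_channels, num_frames), e.g. six TVs over time
    delays : frame delays at which cross-correlations are sampled (assumed values)
    """
    C = tvs.shape[0]
    # z-score each TV channel so correlations are scale invariant
    z = (tvs - tvs.mean(axis=1, keepdims=True)) / (tvs.std(axis=1, keepdims=True) + 1e-8)
    delays = list(delays)
    D = len(delays)

    def corr_at(x, y, tau):
        # normalized correlation of x[t] with y[t + tau]; tau may be negative
        if tau < 0:
            x, y, tau = y, x, -tau
        n = len(x) - tau
        return float(np.dot(x[:n], y[tau:tau + n]) / n)

    # R[i*D + a, j*D + b] = correlation of TV i and TV j at relative delay
    # delays[b] - delays[a]; the matrix is symmetric by construction
    R = np.zeros((C * D, C * D))
    for i in range(C):
        for j in range(C):
            for a in range(D):
                for b in range(D):
                    R[i * D + a, j * D + b] = corr_at(z[i], z[j], delays[b] - delays[a])

    # Eigenspectrum: eigenvalues of the symmetric correlation matrix, largest first
    eigvals = np.linalg.eigvalsh(R)[::-1]
    return R, eigvals

# Example with random data standing in for six TV trajectories of 500 frames:
# R, spectrum = channel_delay_correlation_eigenspectrum(np.random.randn(6, 500))
```

Roughly speaking, an eigenspectrum whose mass concentrates in a few large eigenvalues points to lower-dimensional, simpler coordination (the pattern reported here for "Sad"), whereas a flatter spectrum points to more complex coordination (the pattern reported for "Angry" and, for most subjects, "Happy"). In the study, coordination features of this kind are then fed to a Convolutional Neural Network for emotion classification.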
Meeting abstract. No PDF available.
October 01 2021
Emotion recognition with speech articulatory coordination features
Yashish M. Siriwardena
Elec. and Comput. Eng., Univ. of Maryland College Park, 8223 Paint Branch Dr., College Park, MD 20742, yashish@terpmail.umd.edu
Carol Espy-Wilson
Elec. and Comput. Eng., Univ. of Maryland College Park, College Park, MD
J. Acoust. Soc. Am. 150, A358 (2021)
Citation
Yashish M. Siriwardena, Nadee Seneviratne, Carol Espy-Wilson; Emotion recognition with speech articulatory coordination features. J. Acoust. Soc. Am. 1 October 2021; 150 (4_Supplement): A358. https://doi.org/10.1121/10.0008586