Mental health disorders such as Major Depressive Disorder and schizophrenia affect the coordination between articulatory gestures in speech production. Coordination features derived from vocal tract variables (TVs) predicted by a speech inversion system can quantify changes in articulatory gestures and have proven effective in classifying mental health disorders. In this study, we use data from the IEMOCAP (acted emotions) and MSP-Podcast (natural emotions) datasets to investigate, for the first time, how coordination features extracted from TVs can capture changes across different emotions. We compared the eigenspectra extracted from channel-delay correlation matrices for the Angry, Sad, and Happy emotions against the Neutral emotion. Across both datasets, the Sad emotion follows a pattern suggesting simpler articulatory coordination, while the Angry emotion shows the opposite trend, with signs of more complex articulatory coordination. For the majority of subjects, the Happy emotion also follows a complex articulatory coordination pattern but shows considerable confusion with the Neutral emotion. We trained a convolutional neural network with the coordination features as inputs to perform emotion classification. A detailed interpretation of the differences in eigenspectra and the results of the classification experiments will be discussed.
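For readers unfamiliar with how such coordination features are commonly derived, the minimal Python sketch below shows one way to build a channel-delay correlation matrix over TV trajectories and extract its eigenspectrum. It illustrates the general technique only and is not the authors' implementation; the number of delays, the delay spacing, and all variable names are assumptions.

# Minimal sketch (not the authors' implementation) of deriving an eigenspectrum
# from a channel-delay correlation matrix built over vocal tract variable (TV)
# trajectories. Delay count, delay spacing, and names are illustrative assumptions.
import numpy as np

def channel_delay_eigenspectrum(tvs, num_delays=15, delay_step=1):
    """tvs: array of shape (num_channels, num_frames) holding TV time series.

    Stacks time-delayed copies of every channel, forms the correlation matrix
    across all (channel, delay) pairs, and returns its eigenvalues sorted in
    descending order (the eigenspectrum).
    """
    num_channels, num_frames = tvs.shape
    usable = num_frames - (num_delays - 1) * delay_step

    # One row per (channel, delay) pair, all truncated to a common length.
    delayed = np.vstack([
        tvs[c, d * delay_step : d * delay_step + usable]
        for c in range(num_channels)
        for d in range(num_delays)
    ])

    corr = np.corrcoef(delayed)            # symmetric correlation matrix
    return np.linalg.eigvalsh(corr)[::-1]  # largest eigenvalue first

# Example with synthetic data: 6 TV channels, 500 analysis frames.
rng = np.random.default_rng(0)
spectrum = channel_delay_eigenspectrum(rng.standard_normal((6, 500)))
print(spectrum[:5])  # a flatter spectrum is typically read as more complex coordination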
Meeting abstract. No PDF available.
October 01 2021
Emotion recognition with speech articulatory coordination features
Yashish M. Siriwardena
Elec. and Comput. Eng., Univ. of Maryland College Park, 8223 Paint Branch Dr., College Park, MD 20742, yashish@terpmail.umd.edu

Nadee Seneviratne

Carol Espy-Wilson
Elec. and Comput. Eng., Univ. of Maryland College Park, College Park, MD
J. Acoust. Soc. Am. 150, A358 (2021)
Citation
Yashish M. Siriwardena, Nadee Seneviratne, Carol Espy-Wilson; Emotion recognition with speech articulatory coordination features. J. Acoust. Soc. Am. 1 October 2021; 150 (4_Supplement): A358. https://doi.org/10.1121/10.0008586