Mental health disorders such as major depressive disorder and schizophrenia affect the coordination between articulatory gestures in speech production. Coordination features derived from vocal tract variables (TVs) predicted by a speech inversion system can quantify these changes in articulatory gestures and have proven effective in classifying mental health disorders. In this study, we examine for the first time how coordination features extracted from TVs capture differences between emotions, using data from the IEMOCAP (acted emotions) and MSP-Podcast (natural emotions) datasets. We compared the eigenspectra extracted from channel-delay correlation matrices for the “Angry”, “Sad” and “Happy” emotions against those for the “Neutral” emotion. Across both datasets, the “Sad” emotion follows a pattern suggesting simpler articulatory coordination, while the “Angry” emotion follows the opposite pattern, showing signs of more complex articulatory coordination. For the majority of subjects, the “Happy” emotion also follows a complex articulatory coordination pattern, but shows significant confusion with the “Neutral” emotion. We trained a convolutional neural network with the coordination features as inputs to perform emotion classification. A detailed interpretation of the differences in eigenspectra and the results of the classification experiments will be discussed.
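As a rough illustration of the kind of feature the abstract describes, the sketch below builds a channel-delay correlation matrix from multichannel time series and returns its eigenspectrum. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names (`delay_embed`, `eigenspectrum`), the number of delays, the delay step, and the random signals standing in for TVs are all hypothetical choices made for illustration.

```python
import numpy as np


def delay_embed(signals, n_delays, delay_step):
    """Stack time-delayed copies of every channel.

    signals: array of shape (n_channels, n_samples)
    returns: array of shape (n_channels * n_delays, n_valid)
    """
    signals = np.asarray(signals, dtype=float)
    n_channels, n_samples = signals.shape
    span = (n_delays - 1) * delay_step      # samples consumed by the largest delay
    n_valid = n_samples - span              # length shared by all delayed copies
    if n_valid < 2:
        raise ValueError("signal too short for the requested delays")
    rows = [
        signals[c, d * delay_step : d * delay_step + n_valid]
        for c in range(n_channels)
        for d in range(n_delays)
    ]
    return np.stack(rows)


def eigenspectrum(signals, n_delays=4, delay_step=2):
    """Eigenvalues of the channel-delay correlation matrix, largest first.

    The default delay settings are illustrative assumptions only.
    """
    embedded = delay_embed(signals, n_delays, delay_step)
    corr = np.corrcoef(embedded)            # rows are treated as variables
    eigvals = np.linalg.eigvalsh(corr)      # symmetric input, ascending output
    return eigvals[::-1]                    # rank-ordered, descending


# Synthetic stand-in for six TV channels; real TVs would come from a
# speech inversion system, which is not reproduced here.
rng = np.random.default_rng(0)
tvs = rng.standard_normal((6, 500))
spectrum = eigenspectrum(tvs, n_delays=4, delay_step=2)
```

In this line of work, an eigenspectrum whose mass is concentrated in the first few eigenvalues is commonly read as simpler coordination, and a flatter spectrum as more complex coordination; the comparison against “Neutral” would then be made on such rank-ordered eigenvalues.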
Meeting abstract.
October 01 2021
Emotion recognition with speech articulatory coordination features
Yashish M. Siriwardena; Nadee Seneviratne; Carol Espy-Wilson
J. Acoust. Soc. Am. 150, A358 (2021)
Yashish M. Siriwardena, Nadee Seneviratne, Carol Espy-Wilson; Emotion recognition with speech articulatory coordination features. J. Acoust. Soc. Am. 1 October 2021; 150 (4_Supplement): A358. https://doi.org/10.1121/10.0008586