Emotional information in speech is commonly described in terms of prosodic features such as F0, duration, and energy. This paper focuses on how F0 characteristics can be used to parametrize emotional quality in speech signals effectively. Using an analysis-by-synthesis approach, the F0 mean, range, and shape properties of emotional utterances are systematically modified. The results identify which aspects of F0 can be modified without significantly changing the perception of emotion. To model this behavior, the concept of emotional regions is introduced. Emotional regions represent the variability present in emotional speech and provide a new procedure for studying the speech cues used in judgments of emotion. The method is applied to F0 but can also be used on other aspects of prosody, such as duration or loudness. A statistical analysis of the factors affecting the emotional regions and a discussion of the effects of F0 modifications on the perception of emotion and speech quality are also presented. The results show that F0 range is more important than F0 mean for emotion expression.
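As an illustration of what modifying F0 mean and range can look like in practice, the following is a minimal sketch and not the procedure used in the paper. It assumes an F0 contour given as a list of per-frame values in Hz, with 0 marking unvoiced frames; the function name `modify_f0` and its parameters `mean_shift_hz` and `range_scale` are hypothetical.

```python
from statistics import mean


def modify_f0(contour, mean_shift_hz=0.0, range_scale=1.0):
    """Shift the mean and scale the range of an F0 contour (illustrative only).

    contour: per-frame F0 values in Hz; 0 is assumed to mark unvoiced frames.
    mean_shift_hz: amount added to every voiced frame (changes the F0 mean).
    range_scale: factor applied to each voiced frame's deviation from the
        mean (changes the F0 range while keeping the contour shape).
    """
    voiced = [f for f in contour if f > 0]
    if not voiced:
        # Nothing to modify when the utterance has no voiced frames.
        return list(contour)

    f0_mean = mean(voiced)
    modified = []
    for f in contour:
        if f > 0:
            # Scale the deviation around the original mean, then shift.
            modified.append(f0_mean + range_scale * (f - f0_mean) + mean_shift_hz)
        else:
            # Leave unvoiced frames untouched.
            modified.append(f)
    return modified


if __name__ == "__main__":
    original = [0.0, 100.0, 200.0, 0.0]
    print(modify_f0(original, mean_shift_hz=10.0, range_scale=2.0))
```

Scaling deviations around the mean keeps the two manipulations independent: `range_scale` alone leaves the mean unchanged, and `mean_shift_hz` alone leaves the range unchanged. Resynthesizing speech from a modified contour requires a separate analysis-synthesis tool, which this sketch does not cover.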
June 01 2008
On the robustness of overall F0-only modifications to the perception of emotions in speech
Murtaza Bulut, Shrikanth Narayanan; On the robustness of overall F0-only modifications to the perception of emotions in speech. J. Acoust. Soc. Am. 1 June 2008; 123 (6): 4547–4558. https://doi.org/10.1121/1.2909562