Emotional information in speech is commonly described in terms of prosodic features such as F0, duration, and energy. This paper focuses on how F0 characteristics can be used to effectively parametrize emotional quality in speech signals. Using an analysis-by-synthesis approach, the F0 mean, range, and shape properties of emotional utterances are systematically modified. The results show which aspects of F0 can be modified without significantly changing the perception of emotion. To model this behavior, the concept of emotional regions is introduced. Emotional regions represent the variability present in emotional speech and provide a new procedure for studying the speech cues used in judgments of emotion. The method is applied to F0 but can also be applied to other aspects of prosody, such as duration or loudness. A statistical analysis of the factors affecting the emotional regions and a discussion of how F0 modifications affect the perception of emotion and speech quality are also presented. The results show that F0 range is more important than F0 mean for emotion expression.
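
The abstract does not say how the F0 mean and range modifications were implemented, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes an F0 contour given as a list of per-frame values in Hz, with unvoiced frames marked as 0; the function names `shift_f0_mean` and `scale_f0_range` are hypothetical. The mean is changed by an additive offset (which leaves the range untouched), and the range is changed by scaling excursions about the voiced mean (which leaves the mean untouched), so the two properties can be varied independently.

```python
from statistics import mean


def shift_f0_mean(f0_hz, delta_hz):
    """Add a constant offset to every voiced frame (assumed convention: 0 = unvoiced).

    The voiced mean moves by delta_hz; the range (max - min) is unchanged.
    """
    return [f + delta_hz if f > 0 else 0.0 for f in f0_hz]


def scale_f0_range(f0_hz, factor):
    """Stretch or compress voiced F0 excursions about the voiced mean.

    The voiced mean is unchanged; the range is multiplied by factor.
    No clamping is applied, so large factors can yield implausible values.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return list(f0_hz)
    centre = mean(voiced)
    return [centre + factor * (f - centre) if f > 0 else 0.0 for f in f0_hz]


# Toy contour: two unvoiced frames around three voiced frames.
contour = [0.0, 100.0, 150.0, 200.0, 0.0]
raised = shift_f0_mean(contour, 20.0)      # mean 150 -> 170 Hz, range stays 100 Hz
compressed = scale_f0_range(contour, 0.5)  # mean stays 150 Hz, range 100 -> 50 Hz
```

In an analysis-by-synthesis setting, a modified contour of this kind would then be imposed on the original utterance by a resynthesis tool before listeners judge the result; that step is outside this sketch.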
