Vocal emotions are signaled by specific patterns of prosodic parameters, most notably pitch, phone duration, intensity, and phonation type. Phonation type has so far been the least accessible of these parameters in emotion research, because it has been difficult both to extract from speech signals and to manipulate in natural or synthetic speech. The present study built on recent advances in articulatory speech synthesis to control phonation type exclusively in re-synthesized German sentences spoken with seven different emotions. The goal was to determine to what extent a change of phonation type alone affects the perception of these emotions. To this end, portrayed emotional utterances were re-synthesized with their original phonation type, as well as with purely breathy, purely modal, and purely pressed phonation, and were then rated by listeners with respect to the perceived emotions. Phonation type had highly significant effects on the recognition rates of the original emotions, except for disgust. While fear, anger, and the neutral emotion require specific phonation types for correct perception, sadness, happiness, boredom, and disgust rely primarily on other prosodic parameters. These results can help to improve the expression of emotions in synthesized speech and to facilitate robust automatic recognition of vocal emotions.
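The design summarized above crosses seven intended emotions with four phonation conditions (original, breathy, modal, pressed). The following Python sketch enumerates these conditions and computes a per-condition recognition rate. Here the recognition rate is taken to be the share of listener responses whose perceived emotion matches the intended one. This is an illustration under assumptions, not the authors' analysis code. The tuple-based rating format and the function names `stimulus_conditions` and `recognition_rates` are hypothetical, and the study's actual rating procedure and statistical tests are not reproduced here.

```python
from collections import Counter
from itertools import product

# The seven emotions named in the abstract.
EMOTIONS = ["fear", "anger", "neutral", "sadness", "happiness", "boredom", "disgust"]
# The original phonation type plus the three uniform phonation types.
PHONATIONS = ["original", "breathy", "modal", "pressed"]


def stimulus_conditions():
    """Enumerate every (intended emotion, phonation condition) pair: 7 x 4 = 28."""
    return list(product(EMOTIONS, PHONATIONS))


def recognition_rates(ratings):
    """Compute the share of correct identifications per condition.

    `ratings` is an iterable of (intended_emotion, phonation, perceived_emotion)
    tuples, a hypothetical format chosen for illustration. Returns a dict that
    maps (intended_emotion, phonation) to the fraction of responses in which
    the perceived emotion equals the intended emotion.
    """
    hits = Counter()
    totals = Counter()
    for intended, phonation, perceived in ratings:
        totals[(intended, phonation)] += 1
        if perceived == intended:
            hits[(intended, phonation)] += 1
    return {cond: hits[cond] / n for cond, n in totals.items()}


if __name__ == "__main__":
    # Invented toy responses, for demonstration only; these are not study data.
    toy = [
        ("fear", "breathy", "fear"),
        ("fear", "modal", "neutral"),
        ("fear", "breathy", "fear"),
        ("fear", "modal", "fear"),
    ]
    print(len(stimulus_conditions()))
    print(recognition_rates(toy))
```

Comparing these rates across the four phonation conditions for each emotion is one way to express how much the recognition of that emotion depends on phonation type. Whether such a difference is significant would require a separate statistical test.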
