When pitch is explicitly modelled for parametric speech synthesis, microprosodic variations of the fundamental frequency f0 are usually disregarded by current intonation models. While there are numerous studies dealing with the nature and the origin of microprosody, little research has been done on its audibility and its effect on the naturalness of synthetic speech. In this work, the influence of obstruent-related microprosodic variations on the perceived naturalness of articulatory speech synthesis was studied. A small corpus of 20 German words and sentences was re-synthesized using the state-of-the-art articulatory synthesizer VocalTractLab. The pitch contours of the real utterances were extracted and fitted with the Target-Approximation-Model. After the real microprosodic variations were removed from the obtained pitch contours, synthetic variations were applied based on a microprosody model. Subsequently, multiple stimuli with different microprosody amplitudes were synthesized and evaluated in a listening experiment. The results indicate that microprosodic variations are barely audible, but can lead to a greater perceived naturalness of the synthesized speech in certain cases.
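The amplitude manipulation described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the perturbation shape (an exponential decay after an obstruent release), its size in semitones, the toy base contour, and all function and parameter names. It is not the microprosody model, the Target-Approximation-Model fit, or the VocalTractLab interface used in the study.

```python
import math

STEP_S = 0.005  # hypothetical frame step of the f0 contour, in seconds


def smooth_contour(n_frames, start_st=0.0, target_st=4.0, rate=20.0):
    """Toy stand-in for a fitted, microprosody-free contour (in semitones):
    an exponential approach from start_st towards target_st."""
    return [
        target_st + (start_st - target_st) * math.exp(-rate * i * STEP_S)
        for i in range(n_frames)
    ]


def add_microprosody(contour_st, releases, amplitude=1.0,
                     peak_st=1.0, decay_s=0.05):
    """Add a decaying f0 offset after each obstruent release.

    releases:  list of (release_frame, sign) pairs; sign=+1 raises f0,
               sign=-1 lowers it (hypothetical labels, not from the paper).
    amplitude: global scale factor; 0.0 leaves the contour unchanged.
    """
    out = list(contour_st)
    for release_frame, sign in releases:
        for i in range(release_frame, len(out)):
            elapsed = (i - release_frame) * STEP_S
            out[i] += amplitude * sign * peak_st * math.exp(-elapsed / decay_s)
    return out


# One base contour, rendered with several microprosody amplitudes.
base = smooth_contour(101)                # 0 to 0.5 s in 5 ms steps
releases = [(20, +1), (60, -1)]           # assumed release frames and signs
stimuli = {a: add_microprosody(base, releases, amplitude=a)
           for a in (0.0, 1.0, 2.0)}
```

Each entry of `stimuli` is one f0 contour that could be passed to a synthesizer; an amplitude of 0.0 corresponds to a stimulus without microprosody, and larger values exaggerate the assumed perturbation.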
August 2021
August 17 2021
Modelling microprosodic effects can lead to an audible improvement in articulatory synthesis
Paul Konstantin Krug (1, a)
Branislav Gerazov (2)
Daniel R. van Niekerk (3)
Anqi Xu (3, b)
Yi Xu (3)
Peter Birkholz (1)
1 Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
2 Faculty of Electrical Engineering and Information Technologies, Ss. Cyril and Methodius University in Skopje, Republic of North Macedonia
3 Department of Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
a) Electronic mail: [email protected]
b) ORCID: 0000-0002-4331-6676.
J. Acoust. Soc. Am. 150, 1209–1217 (2021)
Article history
Received: February 19 2021
Accepted: July 22 2021
Citation
Paul Konstantin Krug, Branislav Gerazov, Daniel R. van Niekerk, Anqi Xu, Yi Xu, Peter Birkholz; Modelling microprosodic effects can lead to an audible improvement in articulatory synthesis. J. Acoust. Soc. Am. 1 August 2021; 150 (2): 1209–1217. https://doi.org/10.1121/10.0005876