Prosody is an important aspect of speech. In spoken language processing, effective prosody modeling helps to identify information beyond the words themselves (i.e., “how it is said” rather than “what is said”) and thus to understand speech better. In this talk, we will discuss how prosodic information is utilized in various speech processing tasks. Prosodic features are extracted to represent duration, pitch, and energy, using different normalization methods, and are modeled using machine learning techniques. Research has shown that prosody provides valuable information for tasks such as the automatic identification of important events in spoken language (e.g., sentence boundaries or punctuation, disfluencies, discourse markers, topics, and emotions in dialog). Detecting these phenomena is important for enriching speech recognition output and helping downstream language processing modules. Modeling prosodic variation across speakers is also useful for these tasks, as well as for developing speaker recognition systems. Additionally, some issues in applying machine learning techniques to prosody modeling will be discussed. Understanding how prosody is used to signal interesting events in speech will help to build better synthesis models for generating more natural and expressive speech.
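The abstract mentions normalizing duration, pitch, and energy features but does not say which schemes are used. As an illustration only, the sketch below shows one common choice: per-speaker z-score normalization, with pitch converted to the log domain first. The record fields (`speaker`, `f0_hz`, `energy`, `duration_s`) and the function name are hypothetical and are not taken from the talk.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical feature names; the abstract does not specify a feature set.
FEATURES = ("f0_hz", "energy", "duration_s")


def _transform(key, value):
    # Assumption: pitch is compared in the log domain, other features as-is.
    return math.log(value) if key == "f0_hz" else value


def normalize_by_speaker(records):
    """Return z-scored features, computed separately for each speaker.

    records: list of dicts with keys 'speaker', 'f0_hz', 'energy', 'duration_s'.
    The output preserves the input order.
    """
    # Collect transformed values per speaker and feature.
    values = defaultdict(lambda: defaultdict(list))
    for r in records:
        for key in FEATURES:
            values[r["speaker"]][key].append(_transform(key, r[key]))

    # Per-speaker mean and standard deviation (guard against zero spread).
    stats = {}
    for speaker, feats in values.items():
        stats[speaker] = {}
        for key, vals in feats.items():
            sd = pstdev(vals)
            stats[speaker][key] = (mean(vals), sd if sd > 0 else 1.0)

    normalized = []
    for r in records:
        out = {"speaker": r["speaker"]}
        for key in FEATURES:
            mu, sd = stats[r["speaker"]][key]
            out[key + "_z"] = (_transform(key, r[key]) - mu) / sd
        normalized.append(out)
    return normalized
```

Normalizing within each speaker removes differences in baseline pitch range or loudness, so a downstream classifier sees relative rather than absolute prosodic values. Whether this is appropriate depends on the task: for speaker recognition, for instance, the speaker-specific baseline is itself informative.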
November 2006
Meeting abstract. No PDF available.
November 01 2006
Modeling prosody in speech processing
Yang Liu
Computer Sci. Dept., Univ. of Texas at Dallas, MS EC‐31, Box 83068, Richardson, TX 75083‐0688
J. Acoust. Soc. Am. 120, 3006 (2006)
Citation
Yang Liu; Modeling prosody in speech processing. J. Acoust. Soc. Am. 1 November 2006; 120 (5_Supplement): 3006. https://doi.org/10.1121/1.4787018