Automatic inference of paralinguistic information from speech, such as age, is an important area of research with many technological applications. Speaker age estimation can help with age-appropriate curation of information content and personalized interactive experiences. However, automatic speaker age estimation in children is challenging due to the paucity of speech data representing the developmental spectrum and the large signal variability, even within a given age group. Most prior approaches to child speaker age estimation adopt methods drawn directly from research on adult speech. In this paper, we propose a novel technique that exploits the temporal variability present in children's speech to estimate children's age. We focus on phone durations as a biomarker of children's age. Phone duration distributions are derived by forced-aligning children's speech with transcripts. Regression models are trained to predict speaker age among children from kindergarten up to grade 10. Experiments on two children's speech datasets demonstrate the robustness and portability of the proposed features across multiple domains with varying signal conditions. The phonemes contributing most to children's age estimation are analyzed and presented. Experimental results suggest that phone durations contain important development-related information about children. The proposed features are also suited to low-data scenarios.
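The step of turning a forced alignment into phone duration features can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the aligner output has already been parsed into (phone, start, end) tuples in seconds and that phone labels follow ARPAbet conventions with stress digits. The silence labels in `NON_SPEECH`, the per-phone [mean, standard deviation, count] summary, and all function names are hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

# Hypothetical alignment format: (phone_label, start_seconds, end_seconds),
# assumed to be parsed beforehand from a forced aligner's phone-level output.
Segment = Tuple[str, float, float]

# Labels treated as non-speech and excluded from duration statistics (assumption).
NON_SPEECH = {"sil", "sp", "spn", ""}


def normalize_phone(label: str) -> str:
    """Strip ARPAbet stress digits (e.g. 'AH0' -> 'AH') and upper-case the label."""
    return label.rstrip("012").upper()


def phone_durations(segments: Sequence[Segment]) -> Dict[str, List[float]]:
    """Group segment durations (in seconds) by normalized phone label."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for label, start, end in segments:
        if label.lower() in NON_SPEECH:
            continue  # skip silences and other non-speech markers
        if end <= start:
            raise ValueError(f"non-positive duration for {label!r}: {start}-{end}")
        durations[normalize_phone(label)].append(end - start)
    return dict(durations)


def duration_features(segments: Sequence[Segment], inventory: Sequence[str]) -> List[float]:
    """Build a fixed-length vector of [mean, std, count] per phone in `inventory`.

    Phones that never occur get zeros, so every speaker maps to a vector of
    the same length (3 * len(inventory)) that a regressor can consume.
    """
    durations = phone_durations(segments)
    features: List[float] = []
    for phone in inventory:
        values = durations.get(phone, [])
        if values:
            features.extend([mean(values), pstdev(values), float(len(values))])
        else:
            features.extend([0.0, 0.0, 0.0])
    return features


if __name__ == "__main__":
    # Toy alignment for a single utterance (made-up timings, not real data).
    segments = [("sil", 0.0, 0.3), ("HH", 0.3, 0.38), ("AH0", 0.38, 0.45),
                ("L", 0.45, 0.52), ("OW1", 0.52, 0.70), ("AH1", 0.70, 0.80)]
    print(duration_features(segments, ["AH", "OW", "T"]))
```

Summarizing each phone with mean, spread, and count is only one way to describe a duration distribution; histogram bins or percentiles would fit the same interface. The resulting per-speaker vectors could then be passed to any standard regression model, which this sketch does not attempt to reproduce.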
November 2022
November 15 2022
Phone duration modeling for speaker age estimation in children
Prashanth Gurunath Shivakumar 1,a)
1 Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
Somer Bishop 2
2 Department of Psychiatry, University of California, San Francisco, California 94143, USA
Catherine Lord 3
3 Semel Institute of Neuroscience and Human Behavior, University of California, Los Angeles, California 90095, USA
Shrikanth Narayanan 1
1 Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, California 90089, USA
a) Electronic mail: pgurunat@usc.edu
J. Acoust. Soc. Am. 152, 3000–3009 (2022)
Article history: Received February 27 2022; accepted October 29 2022
Citation
Prashanth Gurunath Shivakumar, Somer Bishop, Catherine Lord, Shrikanth Narayanan; Phone duration modeling for speaker age estimation in children. J. Acoust. Soc. Am. 1 November 2022; 152 (5): 3000–3009. https://doi.org/10.1121/10.0015198