Speech technology has advanced significantly in the last decade, yet major research challenges continue to limit progress in diarization for naturalistic environments. Traditional diarization efforts have focused on single audio streams from telephone communications, broadcast news, and scripted speeches or lectures. Limited effort has focused on extended naturalistic data. Here, algorithm advancements are established for an extensive daily audio corpus called Prof-Life-Log, consisting of 80+ days of 8-16 hr recordings from an individual’s daily life. Advancements include the formulation of (i) an improved threshold-optimized multiple-feature speech activity detector (TO-Combo-SAD), (ii) advanced primary vs. secondary speaker detection, (iii) an advanced word-count system using part-of-speech tagging and bag-of-words construction, (iv) environmental “sniffing” advancements to identify location based on properties of the acoustic space, and (v) diarization interaction analysis, which highlights the amount of speech each individual produces along with recipient direction. The diarization advancements are evaluated on the CRSS-UTDallas Prof-Life-Log corpus. Results show improvements in speech activity detection, word count estimation, and environmental sniffing for naturalistic audio streams. The advancements suggest that improved speech/language processing algorithms can address the increased diversity in daily audio for speaker knowledge detection/estimation and summarization.
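As an illustration of item (v), the amount of speech each individual produces can be tallied directly from diarization output. The sketch below is not the authors' system; it assumes a hypothetical segment format of `(speaker_label, start_seconds, end_seconds)` tuples, and the function names `speech_time_per_speaker` and `speech_share` are invented for this example.

```python
from collections import defaultdict


def speech_time_per_speaker(segments):
    """Sum speaking time in seconds for each speaker label.

    `segments` is an assumed (hypothetical) diarization output:
    an iterable of (speaker_label, start_seconds, end_seconds) tuples.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {speaker!r}")
        totals[speaker] += end - start
    return dict(totals)


def speech_share(totals):
    """Convert per-speaker totals into fractions of all detected speech."""
    overall = sum(totals.values())
    if overall == 0:
        return {speaker: 0.0 for speaker in totals}
    return {speaker: t / overall for speaker, t in totals.items()}


# Toy example with made-up labels and times (not data from the corpus).
segments = [
    ("primary", 0.0, 4.0),
    ("secondary_1", 4.0, 6.0),
    ("primary", 6.0, 8.0),
]
totals = speech_time_per_speaker(segments)
shares = speech_share(totals)
```

In this toy input the primary speaker accounts for 6 of the 8 seconds of speech. Recipient direction, which the abstract also mentions, would require additional information that this sketch does not model.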
October 2016
Meeting abstract. No PDF available.
October 01 2016
Prof-Life-Log: Monitoring and assessment of human speech and acoustics using daily naturalistic audio streams
John H. L. Hansen
Jonsson School of Eng. & Comput. Sci., CRSS: Ctr. for Robust Speech Systems; UTDallas, 800 W Campbell Rd., The Univ. of Texas at Dallas, Richardson, TX 75080-3021, [email protected]
Abhijeet Sangwan
Jonsson School of Eng. & Comput. Sci., CRSS: Ctr. for Robust Speech Systems; UTDallas, 800 W Campbell Rd., The Univ. of Texas at Dallas, Richardson, TX 75080-3021, [email protected]
Ali Ziaei
Jonsson School of Eng. & Comput. Sci., CRSS: Ctr. for Robust Speech Systems; UTDallas, 800 W Campbell Rd., The Univ. of Texas at Dallas, Richardson, TX 75080-3021, [email protected]
Harishchandra Dubey
Jonsson School of Eng. & Comput. Sci., CRSS: Ctr. for Robust Speech Systems; UTDallas, 800 W Campbell Rd., The Univ. of Texas at Dallas, Richardson, TX 75080-3021, [email protected]
Lakshmish Kaushik
Jonsson School of Eng. & Comput. Sci., CRSS: Ctr. for Robust Speech Systems; UTDallas, 800 W Campbell Rd., The Univ. of Texas at Dallas, Richardson, TX 75080-3021, [email protected]
Chengzhu Yu
Jonsson School of Eng. & Comput. Sci., CRSS: Ctr. for Robust Speech Systems; UTDallas, 800 W Campbell Rd., The Univ. of Texas at Dallas, Richardson, TX 75080-3021, [email protected]
J. Acoust. Soc. Am. 140, 3010 (2016)
Citation
John H. L. Hansen, Abhijeet Sangwan, Ali Ziaei, Harishchandra Dubey, Lakshmish Kaushik, Chengzhu Yu; Prof-Life-Log: Monitoring and assessment of human speech and acoustics using daily naturalistic audio streams. J. Acoust. Soc. Am. 1 October 2016; 140 (4_Supplement): 3010. https://doi.org/10.1121/1.4969337