This paper reports the results of our experiments on speaker identification in the SCOTUS corpus, which includes oral arguments from the Supreme Court of the United States. Our main findings are as follows: (1) a combination of Gaussian mixture models and monophone HMMs attains near-100% text-independent identification accuracy on utterances longer than one second; (2) a sampling rate of 11025 Hz achieves the best performance (higher sampling rates are harmful), and a sampling rate as low as 2000 Hz still achieves more than 90% accuracy; (3) a distance score based on likelihood values was used to measure the variability of phones among speakers; we found that the most variable phone is UH (as in “good”), and that the velar nasal NG is more variable than the other two nasal sounds, M and N; (4) our models achieved “perfect” forced alignment on very long speech segments (one hour). These findings and their significance are discussed.
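As a rough illustration of the Gaussian mixture model side of finding (1), the sketch below scores an utterance against one GMM per speaker and picks the speaker with the highest total log-likelihood. It is not the authors' system: it omits the monophone HMMs, model training, and acoustic feature extraction, and every name, model parameter, and the two-dimensional toy "frames" are hypothetical values chosen only for the example.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def utterance_loglik(frames, gmm):
    """Sum of frame log-likelihoods (frames treated as independent)."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames)

def identify(frames, speaker_models):
    """Return the best-scoring speaker name and the per-speaker scores."""
    scores = {name: utterance_loglik(frames, gmm)
              for name, gmm in speaker_models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy models: two speakers, two mixture components each.
SPEAKER_MODELS = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}

# Hypothetical feature frames for one short utterance.
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best, scores = identify(frames, SPEAKER_MODELS)
print(best, scores)
```

Because the frame log-likelihoods are summed, longer utterances accumulate more evidence for the correct speaker; this is one plausible reading of why the abstract restricts its accuracy claim to utterances longer than one second.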
May 2008
Meeting abstract. No PDF available.
May 01 2008
Speaker identification on the SCOTUS corpus
Jiahong Yuan
University of Pennsylvania, 609 Williams Hall, Philadelphia, PA 19104, USA, [email protected]
Mark Liberman
University of Pennsylvania, 609 Williams Hall, Philadelphia, PA 19104, USA, [email protected]
J. Acoust. Soc. Am. 123, 3878 (2008)
Citation
Jiahong Yuan, Mark Liberman; Speaker identification on the SCOTUS corpus. J. Acoust. Soc. Am. 1 May 2008; 123 (5_Supplement): 3878. https://doi.org/10.1121/1.2935783