The automatic analysis of conversational audio remains difficult, in part because multiple talkers speak in turns, often with significant intonation variations and overlapping speech. The majority of prior work on psychoacoustic speech analysis and system design has focused on single-talker speech or multi-talker speech with overlapping talkers (for example, the cocktail party effect). There has been much less focus on how listeners detect a change in talker or on probing the acoustic features significant in characterizing a talker's voice in conversational speech. This study examines human talker change detection (TCD) in multi-party speech utterances using a behavioral paradigm in which listeners indicate the moment of perceived talker change. Human reaction times in this task can be well estimated by a model of the acoustic feature distance among speech segments before and after a change in talker, with estimation improving for models incorporating longer durations of speech prior to a talker change. Further, human performance is superior to several online and offline state-of-the-art machine TCD systems.
January 2019
January 07 2019
Talker change detection: A comparison of human and machine performance
Neeraj Kumar Sharma a)
1 Department of Psychology, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
Shobhana Ganesh
2 Department of Electrical Engineering, CV Raman Road, Indian Institute of Science, Bangalore 560012, India
Sriram Ganapathy
2 Department of Electrical Engineering, CV Raman Road, Indian Institute of Science, Bangalore 560012, India
Lori L. Holt
1 Department of Psychology, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
a) Electronic mail: [email protected]
J. Acoust. Soc. Am. 145, 131–142 (2019)
Article history
Received: August 15 2018
Accepted: December 01 2018
Citation
Neeraj Kumar Sharma, Shobhana Ganesh, Sriram Ganapathy, Lori L. Holt; Talker change detection: A comparison of human and machine performance. J. Acoust. Soc. Am. 1 January 2019; 145 (1): 131–142. https://doi.org/10.1121/1.5084044