A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions

Speaker separation is a special case of speech separation in which the mixture signal comprises two or more speakers. Many talker-independent speaker separation methods have been introduced in recent years to address this problem in anechoic conditions. To consider more realistic environments, this paper investigates talker-independent speaker separation in reverberant conditions. To deal effectively with both speaker separation and speech dereverberation, this paper proposes extending the deep computational auditory scene analysis (CASA) approach to a two-stage system, in which reverberant utterances are first separated and the separated utterances are then dereverberated. The proposed two-stage deep CASA system significantly outperforms a baseline one-stage deep CASA method in real reverberant conditions, with superior separation performance at the frame level and higher accuracy in assigning separated frames to individual speakers. The proposed system also generalizes successfully to an unseen speech corpus and performs comparably to a talker-dependent system.
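To make the pipeline concrete, the following is a minimal sketch of the two-stage data flow described above: a separation network estimates one stream per speaker from the reverberant mixture, and a dereverberation network then maps each separated stream to an anechoic estimate. This is not the authors' deep CASA implementation; the module names (SeparationStage, DereverbStage), layer choices, and dimensions are illustrative placeholders, and the sketch omits training losses and the frame-to-speaker assignment step.

```python
# Minimal sketch of the two-stage pipeline (illustrative only, not the
# authors' architecture): stage 1 separates the reverberant mixture into
# one stream per speaker; stage 2 dereverberates each separated stream.
import torch
import torch.nn as nn


class SeparationStage(nn.Module):
    """Stage 1 (hypothetical): estimate num_speakers reverberant sources."""

    def __init__(self, num_speakers=2, feat_dim=161, hidden=256):
        super().__init__()
        self.num_speakers = num_speakers
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, num_speakers * feat_dim)

    def forward(self, mixture):                 # (batch, frames, feat_dim)
        h, _ = self.rnn(mixture)
        out = self.proj(h)                      # (batch, frames, num_speakers * feat_dim)
        b, t, _ = out.shape
        # One spectral estimate per speaker, still reverberant.
        return out.view(b, t, self.num_speakers, -1).unbind(dim=2)


class DereverbStage(nn.Module):
    """Stage 2 (hypothetical): map a separated stream to an anechoic estimate."""

    def __init__(self, feat_dim=161, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, feat_dim)

    def forward(self, separated):               # (batch, frames, feat_dim)
        h, _ = self.rnn(separated)
        return self.proj(h)


def two_stage_separate(mixture, sep_net, derev_net):
    """Separate first, then dereverberate each separated stream."""
    return [derev_net(s) for s in sep_net(mixture)]


if __name__ == "__main__":
    sep_net, derev_net = SeparationStage(), DereverbStage()
    mix = torch.randn(1, 100, 161)              # fake mixture: 100 frames, 161 bins
    estimates = two_stage_separate(mix, sep_net, derev_net)
    print([tuple(e.shape) for e in estimates])  # two anechoic speaker estimates
```

In the full system, the separation stage must also resolve which output stream belongs to which speaker at each frame (the assignment accuracy the abstract reports); any such mechanism would slot into stage 1 before the dereverberation stage is applied.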
Masood Delfarah,a) Yuzhou Liu,b) and DeLiang Wangc)
Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
a) Electronic mail: [email protected]; ORCID: 0000-0002-8354-0832.
b) ORCID: 0000-0002-7030-9121.
c) Also at: Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210, USA; ORCID: 0000-0001-8195-6319.
J. Acoust. Soc. Am. 148, 1157–1168 (2020)
Article history
Received: February 27 2020
Accepted: August 04 2020
Published: September 03 2020
Citation
Masood Delfarah, Yuzhou Liu, DeLiang Wang; A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions. J. Acoust. Soc. Am. 1 September 2020; 148 (3): 1157–1168. https://doi.org/10.1121/10.0001779