This study proposes an approach to improve the perceptual quality of speech separated by binary masking through the use of reconstruction in the time-frequency domain. Non-negative matrix factorization and sparse reconstruction approaches are investigated, both using a linear combination of basis vectors to represent a signal. In this approach, the short-time Fourier transform (STFT) of separated speech is represented as a linear combination of STFTs from a clean speech dictionary. Binary masking for separation is performed using deep neural networks or Bayesian classifiers. The perceptual evaluation of speech quality, which is a standard objective speech quality measure, is used to evaluate the performance of the proposed approach. The results show that the proposed techniques improve the perceptual quality of binary masked speech, and outperform traditional time-frequency reconstruction approaches.
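As a rough illustration of the reconstruction idea described above, the sketch below approximates a masked magnitude spectrogram as a non-negative linear combination of columns from a fixed clean-speech dictionary. It is not the authors' implementation: the function name `nmf_reconstruct`, the Euclidean cost, the multiplicative update rule, the use of magnitudes only, and the random stand-in data are all assumptions made for this example.

```python
import numpy as np

def nmf_reconstruct(V, W, n_iter=200, eps=1e-12, seed=0):
    """Approximate non-negative V (freq x frames) as W @ H with W held fixed.

    V : masked magnitude spectrogram (hypothetical input)
    W : dictionary whose columns are clean-speech magnitude spectra (hypothetical)
    Returns the reconstruction W @ H and the non-negative activations H.
    """
    rng = np.random.default_rng(seed)
    # Random positive initialization of the activations.
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # Multiplicative update for the Euclidean cost; keeps H non-negative.
        H *= WtV / (WtW @ H + eps)
    return W @ H, H

# Synthetic stand-ins for real data (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
dictionary = rng.random((129, 40))           # 129 frequency bins, 40 basis vectors
clean = dictionary @ rng.random((40, 50))    # 50 frames of "clean" magnitudes
mask = rng.random(clean.shape) > 0.5         # random binary mask
masked = clean * mask                        # binary-masked spectrogram

estimate, activations = nmf_reconstruct(masked, dictionary)
```

In practice the dictionary would be built from STFT magnitudes of clean speech and the mask would come from a trained classifier; a sparse-reconstruction variant would add a sparsity penalty on the activations.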
August 2014
August 01 2014
Reconstruction techniques for improving the perceptual quality of binary masked speech
Donald S. Williamson a)
Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210
Yuxuan Wang
Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210
DeLiang Wang
Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210
a) Author to whom correspondence should be addressed. Electronic mail: [email protected]
J. Acoust. Soc. Am. 136, 892–902 (2014)
Article history
Received: August 20 2013
Accepted: June 05 2014
Citation
Donald S. Williamson, Yuxuan Wang, DeLiang Wang; Reconstruction techniques for improving the perceptual quality of binary masked speech. J. Acoust. Soc. Am. 1 August 2014; 136 (2): 892–902. https://doi.org/10.1121/1.4884759