This paper presents the Modified Phase-Opponency (MPO) model for single-channel enhancement of speech corrupted by additive noise. The MPO model is based on the auditory phase-opponency (PO) model, which was proposed for the detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times: it exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery, using coincidence detection across auditory-nerve fibers to extract temporal cues. The MPO model alters the components of the PO model so that its basic functionality is maintained while the properties of the model can be analyzed and modified independently. The MPO-based speech enhancement scheme does not need to estimate the noise characteristics, nor does it assume that the noise satisfies any statistical model. When the speech signals are corrupted by fluctuating noise, the MPO technique yields the lowest values of the LPC-based objective measures and the highest value of the perceptual evaluation of speech quality (PESQ) measure among the methods compared. Combining the MPO speech enhancement technique with our aperiodicity, periodicity, and pitch detector further improves its performance.
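
The phase-opponency idea summarized above can be illustrated with a toy sketch. This is not the MPO implementation: the two-path layout (a band-pass filter and an all-pass filter tuned to the same frequency), the biquad coefficient formulas, the Q values, the sampling rate, and the function names are all assumptions chosen for illustration. At the center frequency the two paths are about 180 degrees out of phase, so a tone there should give a negative correlation between the path outputs, while broadband noise should give a positive one.

```python
import math
import random


def biquad(x, b, a):
    """Direct-form I second-order IIR filter; a[0] is assumed to be 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def bandpass_coeffs(f0, q, fs):
    """Band-pass biquad with unity gain and zero phase at f0 (assumed design)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def allpass_coeffs(f0, q, fs):
    """All-pass biquad whose phase is -180 degrees at f0 (assumed design)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [(1.0 - alpha) / a0, -2.0 * math.cos(w0) / a0, 1.0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def po_score(x, f0, fs, q_bp=2.0, q_ap=20.0, skip=800):
    """Normalized zero-lag correlation between the two filter paths.

    q_ap is much larger than q_bp (an assumption) so that the all-pass
    phase swings steeply inside the wider pass band of the band-pass path.
    The first `skip` samples are dropped to ignore filter transients.
    """
    yb = biquad(x, *bandpass_coeffs(f0, q_bp, fs))[skip:]
    ya = biquad(x, *allpass_coeffs(f0, q_ap, fs))[skip:]
    num = sum(p * r for p, r in zip(yb, ya))
    den = math.sqrt(sum(p * p for p in yb) * sum(r * r for r in ya))
    return num / den if den > 0.0 else 0.0


fs = 8000      # sampling rate in Hz (illustrative)
f0 = 1000.0    # center frequency of both paths in Hz (illustrative)
n = 16000      # two seconds of signal

rng = random.Random(0)
tone = [math.sin(2.0 * math.pi * f0 * i / fs) for i in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

tone_score = po_score(tone, f0, fs)
noise_score = po_score(noise, f0, fs)
print("tone score:", round(tone_score, 3))
print("noise score:", round(noise_score, 3))
```

In this sketch, a negative score suggests a narrowband component near `f0` and a positive score suggests broadband noise. An enhancement scheme built on this idea would presumably repeat the test per channel and per frame, but those details are not given in the abstract.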
