Technology barriers currently prevent speech processing systems that operate in extremely noisy conditions from meeting the demands of modern applications. Such systems often require noise reduction working in combination with a precise voice activity detector (VAD). This paper presents statistical likelihood ratio tests formulated in terms of the integrated bispectrum of the noisy signal. The integrated bispectrum is defined as a cross spectrum between the signal and its square and is therefore a function of a single frequency variable. It inherits the ability of higher order statistics to detect signals in noise and offers two additional advantages: (i) its computation as a cross spectrum leads to significant computational savings, and (ii) the variance of the estimator is of the same order as that of the power spectrum estimator. The proposed approach incorporates contextual information into the decision rule, a strategy that has been reported to yield significant benefits for robust speech recognition applications. The proposed VAD is compared with the G.729, adaptive multirate, and advanced front-end standards, as well as with recently reported algorithms, and shows a sustained advantage in speech/nonspeech detection accuracy and speech recognition performance.
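
The definition of the integrated bispectrum as a cross spectrum between the signal and its square can be illustrated with a minimal sketch. Everything beyond that definition is an assumption made for illustration: the function names `dft` and `integrated_bispectrum`, the non-overlapping rectangular frames, the removal of the mean from both the signal and its square, the 1/N scaling, and the choice of which spectrum is conjugated. A naive DFT keeps the sketch dependency-free; a practical implementation would use an FFT. This is not the paper's estimator or its likelihood ratio test.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def integrated_bispectrum(signal, frame_len):
    """Estimate the cross spectrum between a signal and its square.

    The signal is split into non-overlapping frames (an assumption of this
    sketch); the per-frame cross-periodograms are averaged over frames.
    Returns one complex value per frequency bin.
    """
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")

    acc = [0j] * frame_len
    for b in range(n_frames):
        x = signal[b * frame_len:(b + 1) * frame_len]

        # Remove the frame mean so the DC term does not dominate (assumption).
        mean_x = sum(x) / frame_len
        x = [v - mean_x for v in x]

        # Squared signal with its mean removed: y(t) = x(t)^2 - mean(x^2).
        sq = [v * v for v in x]
        mean_sq = sum(sq) / frame_len
        y = [v - mean_sq for v in sq]

        spec_x = dft(x)
        spec_y = dft(y)

        # Cross-periodogram Y(k) * conj(X(k)) / N, a single-frequency function.
        for k in range(frame_len):
            acc[k] += spec_y[k] * spec_x[k].conjugate() / frame_len

    return [a / n_frames for a in acc]


if __name__ == "__main__":
    n = 16
    # Two harmonically related cosines: squaring couples them back onto
    # the original frequencies, so the cross spectrum is nonzero there.
    coupled = [
        math.cos(2 * math.pi * t / n) + math.cos(4 * math.pi * t / n)
        for t in range(2 * n)
    ]
    for k, value in enumerate(integrated_bispectrum(coupled, n)[: n // 2]):
        print(k, round(abs(value), 6))
```

Because the estimate depends on a single frequency index, its cost per frame is that of two transforms and one element-wise product, in contrast to the two-dimensional grid a full bispectrum estimate would need.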
