A method is presented in which conventional speech algorithms are applied, without modification, in a way that improves their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise ratio (SNR) levels and then performing SID with the appropriate SNR-dependent model mitigated noise at all SNR levels. In those tests, SID performance was best when the SNRs of the testing and training data were close or identical. In the current effort, multiple i-vector algorithms were used, greatly improving both processing throughput and equal-error-rate classification accuracy. Using identical approaches in the same noisy environment, the performance of SID, language identification, gender identification, and diarization was significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low SNR levels (−10 dB ≤ SNR < +10 dB) using only the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
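
The SNR-dependent selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the lattice values, the function names (`nearest_lattice_snr`, `select_sad_algorithm`, `plan_processing`), and the algorithm labels are hypothetical, and the SNR estimate is assumed to come from elsewhere. Only the +10 dB switching point between the two SAD algorithms and the idea of using the model trained closest to the test SNR are taken from the abstract.

```python
# Hypothetical lattice of training SNR levels in dB; the actual
# levels used in the work are not stated in the abstract.
SNR_LATTICE_DB = [-10, -5, 0, 5, 10, 15, 20]

# Switching point between the two SAD algorithms, as stated in the abstract.
SAD_SWITCH_DB = 10.0


def nearest_lattice_snr(snr_db, lattice=SNR_LATTICE_DB):
    """Return the lattice level closest to the estimated SNR.

    Ties are broken toward the lower (noisier) level; this tie rule
    is an assumption made for the sketch.
    """
    return min(lattice, key=lambda level: (abs(level - snr_db), level))


def select_sad_algorithm(snr_db):
    """Pick a SAD algorithm label from the estimated SNR.

    The abstract defines the envelope-based detector for
    -10 dB <= SNR < +10 dB. Reusing it below -10 dB is an
    assumption of this sketch, not a claim from the source.
    """
    if snr_db >= SAD_SWITCH_DB:
        return "feature_based"
    return "voiced_envelope"


def plan_processing(snr_db):
    """Combine both choices for one test recording."""
    return {
        "model_snr_db": nearest_lattice_snr(snr_db),
        "sad": select_sad_algorithm(snr_db),
    }


if __name__ == "__main__":
    for estimate in (-12.0, 3.2, 10.0, 18.7):
        print(estimate, plan_processing(estimate))
```

Under these assumptions, a recording estimated at 3.2 dB would be scored against the model trained at 5 dB and segmented with the envelope-based detector, while one at 18.7 dB would use the 20 dB model and the feature-based detector.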
