The experiments in this paper address speaker identification based on Mel Frequency Cepstral Coefficients (MFCCs) and Gaussian Mixture Models (GMMs). As a practical biometric technique for phone applications, speaker recognition is promising because it does not require specialized or complex hardware. To accomplish this objective, processing is usually done in two stages: training and testing. During the training phase, speaker-specific feature parameters are computed from the speech and used to build statistical models of the individual speakers. During the testing phase, speech samples from unidentified speakers are compared with these models and classified. This study presents the development of a voice recognition algorithm based on MFCCs and GMMs. The well-known MFCCs were chosen as features because they reflect the documented variation of the human ear's critical bandwidths with frequency. To make the system realistic, we built the system model using GMMs, whose parameters are estimated with the EM (Expectation Maximization) algorithm. MFCCs are computed in both the training and testing stages. Speakers repeated distinct sentences during a training session and a subsequent assessment session. Depending on the decision threshold, speech can be falsely rejected or falsely accepted; the threshold is placed at the point where the probabilities of these two errors are equal, i.e. the equal error rate. The system was implemented in the MATLAB environment.
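The MFCC front end described above can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' MATLAB code: the 8 kHz sampling rate, 25 ms Hamming-windowed frames with a 10 ms hop, 0.97 pre-emphasis, 26 mel filters and 13 cepstral coefficients are common textbook defaults chosen here for illustration, and all function names are hypothetical. A naive DFT stands in for an FFT to keep the block self-contained.

```python
import math

# Assumption: all defaults below are illustrative textbook values, not the paper's settings.

def hz_to_mel(f):
    """Map a frequency in Hz onto the mel scale (2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale over bins 0..n_fft//2."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def power_spectrum(frame, n_fft):
    """Periodogram of a zero-padded frame via a naive DFT (use an FFT in practice)."""
    padded = frame + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        im = -sum(x * math.sin(2 * math.pi * k * n / n_fft) for n, x in enumerate(padded))
        spec.append((re * re + im * im) / n_fft)
    return spec

def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n_in = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_in)) for n in range(n_in))
            for k in range(n_coeffs)]

def mfcc(signal, sample_rate=8000, frame_len=200, hop=80, n_fft=256,
         n_filters=26, n_ceps=13, pre_emph=0.97):
    """Return one list of n_ceps MFCCs per frame."""
    # Pre-emphasis boosts the high-frequency part of the spectrum
    emph = [signal[0]] + [signal[i] - pre_emph * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(emph[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)
        energies = [sum(f * p for f, p in zip(row, spec)) for row in bank]
        log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
        features.append(dct_ii(log_e, n_ceps))
    return features
```

The triangular filters are narrow at low frequencies and progressively wider at high frequencies, which is how the mel warping approximates the ear's critical bands.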
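The training and testing stages, with one GMM per speaker estimated by EM, can be illustrated with the sketch below. It is a minimal Python example, not the paper's implementation: diagonal covariance matrices, initialization of the means from randomly chosen frames, a fixed number of EM iterations and a variance floor are assumptions made for illustration, and the function names are hypothetical. A test utterance is assigned to the speaker whose model gives the highest average log-likelihood over its feature frames.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def train_gmm(data, n_components, n_iter=50, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to a list of feature vectors with EM."""
    rng = random.Random(seed)
    dim, n = len(data[0]), len(data)
    # Assumption: means start at randomly chosen frames, variances at the global variance
    means = [list(p) for p in rng.sample(data, n_components)]
    mu = [sum(x[d] for x in data) / n for d in range(dim)]
    gvar = [max(sum((x[d] - mu[d]) ** 2 for x in data) / n, var_floor) for d in range(dim)]
    variances = [list(gvar) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of component k for each frame
        resp = []
        for x in data:
            logp = [math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
                    for k in range(n_components)]
            total = logsumexp(logp)
            resp.append([math.exp(lp - total) for lp in logp])
        # M-step: re-estimate mixture weights, means and floored diagonal variances
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[d] for r, x in zip(resp, data)) / nk for d in range(dim)]
            variances[k] = [max(sum(r[k] * (x[d] - means[k][d]) ** 2
                                    for r, x in zip(resp, data)) / nk, var_floor)
                            for d in range(dim)]
    return weights, means, variances

def avg_log_likelihood(model, frames):
    """Average per-frame log-likelihood of frames under a trained GMM."""
    weights, means, variances = model
    return sum(logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                          for w, m, v in zip(weights, means, variances)])
               for x in frames) / len(frames)

def identify(models, frames):
    """Closed-set identification: return the speaker whose GMM scores highest."""
    return max(models, key=lambda name: avg_log_likelihood(models[name], frames))
```

In a full system, `data` and `frames` would be MFCC vectors from the training and testing sessions, respectively.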
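The equal-error-rate threshold described above can be located by sweeping candidate thresholds over verification scores. This Python sketch assumes that higher scores indicate a genuine speaker (for example, GMM log-likelihoods) and that a trial is accepted when its score is at or above the threshold. It only tests the observed score values, so it returns the point closest to FAR = FRR rather than an interpolated crossing; the function names are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """False acceptance and false rejection rates, accepting scores >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Return (threshold, eer) at the observed score where |FAR - FRR| is smallest."""
    best = None
    # Assumption: candidate thresholds are restricted to the observed scores
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    _, threshold, eer = best
    return threshold, eer
```

For example, with genuine scores `[0.9, 0.8, 0.7, 0.6, 0.4]` and impostor scores `[0.1, 0.2, 0.3, 0.5, 0.65]`, both error rates equal 0.2 at a threshold of 0.6.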
