This paper reviews the state of the art in automatic speech recognition (ASR)-based approaches to speech therapy for aphasic patients. Aphasia is a condition in which the affected person suffers from a speech and language disorder resulting from a stroke or brain injury. Since a growing body of evidence indicates that symptoms can be improved when treated at an early stage, ASR-based solutions are increasingly being researched for speech and language therapy. ASR is a technology that converts human speech into transcribed text by matching the input against the system's acoustic and language models. This is particularly useful in speech rehabilitation therapy because it provides accurate, real-time evaluation of speech input from an individual with a speech disorder. ASR-based approaches to speech therapy recognize the speech input from the aphasic patient and provide real-time feedback on their mistakes. However, the accuracy of ASR depends on many factors, such as phoneme recognition, speech continuity, speaker and environmental differences, as well as the depth of our knowledge of human language understanding. Hence, this review examines recent developments in ASR technologies and their performance for individuals with speech and language disorders.
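The feedback loop described above, where a patient's recognized utterance is compared against a target prompt and mismatches are reported, can be sketched in a few lines. This is a minimal illustration, not any system from the literature: `score_attempt` and its hard-coded inputs are hypothetical, and in a real therapy application the `recognized` string would come from an ASR engine rather than being supplied directly.

```python
import difflib

def score_attempt(target: str, recognized: str) -> dict:
    """Compare a recognized transcript against the target prompt.

    Returns a word-level similarity score and the list of
    mismatched word spans, which a therapy system could turn
    into corrective feedback for the patient.
    """
    t_words = target.lower().split()
    r_words = recognized.lower().split()
    matcher = difflib.SequenceMatcher(None, t_words, r_words)
    errors = []
    # get_opcodes() describes how to turn the target word sequence
    # into the recognized one; anything other than "equal" is a mistake.
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            errors.append({"expected": t_words[i1:i2],
                           "heard": r_words[j1:j2]})
    return {"similarity": round(matcher.ratio(), 2), "errors": errors}

# Example: the patient said "a" where the prompt expected "the".
feedback = score_attempt("the cat sat on the mat", "the cat sat on a mat")
```

A production system would of course operate at the phoneme rather than the word level and account for disfluencies, but the comparison-and-feedback structure is the same.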
