A novel method is proposed to recognize Arabic/Jawi and Roman digits. The method is based on features derived from triangle geometry, normalized into nine features. The features are applied with zoning, which yields five and 25 zones. The algorithm is validated on three standard, publicly available datasets that are widely used by researchers in this field. The first dataset is HODA, which contains 60,000 images for training and 20,000 images for testing. The second dataset is IFHCDB, which has 52,380 isolated characters and 17,740 digits; only the 17,740 digit images are used in this research. For Roman digits, MNIST is chosen. The MNIST dataset has 60,000 images for training and 10,000 images for testing. Supervised Machine Learning (SML) and Unsupervised Machine Learning (UML) are used to test the nine features. The SML methods used are Neural Network (NN) and Support Vector Machine (SVM). The UML uses the Euclidean distance method with data mining algorithms, namely Mean Average Precision (eMAP) and Frequency Based (eFB). For SML testing on the HODA dataset, accuracy is 98.07% for SVM and 96.73% for NN. For IFHCDB and MNIST, the accuracies are 91.75% and 93.095%, respectively. For the UML tests, the results are 93.91% for HODA, 85.94% for IFHCDB and 86.61% for MNIST. The training and test images are selected using both random selection and each dataset's original distribution. The results show that the accuracy of the proposed algorithm is over 90% for each SML-trained dataset, with the highest result obtained using the 25-zone features.
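The Euclidean distance matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' eMAP or eFB procedure, which the abstract does not specify; it only shows nearest-match labelling over nine-element feature vectors. The function names `euclidean` and `nearest_label` and the toy feature values are hypothetical and chosen purely for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(query, train):
    """Return the label of the training vector closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    _, label = min(train, key=lambda item: euclidean(query, item[0]))
    return label


# Toy nine-feature vectors (made-up values, not taken from any dataset).
train = [
    ([0.1] * 9, 0),
    ([0.9] * 9, 1),
]

predicted = nearest_label([0.2] * 9, train)
```

With 25 zones, one plausible arrangement (an assumption, not stated in the abstract) would be to concatenate the nine features of every zone into a single longer vector before computing the distance.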
Digit recognition for Arabic/Jawi and Roman using features from triangle geometry
Mohd Sanusi Azmi, Khairuddin Omar, Mohamad Faidzul Nasrudin, Bahari Idrus, Khadijah Wan Mohd Ghazali; Digit recognition for Arabic/Jawi and Roman using features from triangle geometry. AIP Conf. Proc. 22 April 2013; 1522 (1): 526–537. https://doi.org/10.1063/1.4801171