On appearance-based feature extraction methods for writer-independent handwritten text recognition

2021, Vol 3 (8)

Author(s): Fetulhak Abdurahman, Eyob Sisay, Kinde Anlay Fante

Keyword(s): Neural Network, Neural Networks, Feature Extraction, Word Recognition, Recurrent Neural Network, Text Recognition, Input Word, Handwritten Text, Handwritten Text Recognition, Word Images

Amharic is the official language of the Federal Government of Ethiopia, with more than 27 million speakers. It is written in the Ethiopic script, which has 238 core and 27 labialized characters. Amharic is a low-resourced language, and only a few attempts have been made so far at recognizing its handwritten text, which is challenging because many characters are highly similar to one another. This paper presents an offline handwritten Amharic word recognition system based on convolutional recurrent neural networks. The proposed framework uses convolutional neural networks (CNNs) to extract features from input word images, recurrent neural networks (RNNs) for sequence encoding, and connectionist temporal classification (CTC) as the loss function. For robust feature extraction from handwritten Amharic word images, we designed a custom CNN model and compared its performance with three state-of-the-art CNN models, DenseNet-121, ResNet-50 and VGG-19, after modifying their architectures to fit our problem domain. We conducted detailed experiments with different CNN and RNN architectures and input word image sizes, and applied data augmentation techniques to improve the performance of the proposed models. We also prepared a handwritten Amharic word dataset, HARD-I, which is publicly available to researchers. Across the recognition models evaluated on this dataset, our best-performing model achieved a word error rate (WER) of 5.24% and a character error rate (CER) of 1.15%. The proposed models achieve competitive performance compared with existing models for offline handwritten Amharic word recognition.
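The abstract names CTC as the training loss and reports WER and CER, but does not say how network outputs are decoded or how the error rates are scored. The following is a minimal Python sketch that fills those gaps with common choices: best-path (greedy) CTC decoding, which takes the most likely label per frame, collapses repeats and then drops blanks, plus Levenshtein-based CER and WER. The blank index 0, the function names and whitespace tokenization for WER are assumptions for illustration, not the authors' code; the paper may use beam-search decoding or a different scoring script.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding (assumes label index `blank` is the CTC blank)."""
    # Most likely label for each time step (frame).
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for label in best_path:
        # Keep a label only when it differs from the previous frame and is not blank,
        # so a blank between two identical labels preserves a genuine repeat.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance; substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Character error rate: total character edits over total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Word error rate: total word edits over total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)
```

Because each sample in a word-recognition dataset is a single word, this WER reduces to the fraction of misrecognized words, while CER still gives partial credit for near misses.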


Sindhi Handwritten Text Recognition Using SVM

International Journal of Advanced Trends in Computer Science and Engineering, 2021, Vol 10 (3), pp. 1627-1631, DOI: 10.30534/ijatcse/2021/201032021

Keyword(s): Feature Extraction, Complex Problem, Training Data, Text Recognition, Support Vector, Text Data, Handwritten Text, Handwritten Text Recognition, Text Feature, Language Text

In Sindhi, feature extraction from handwritten text is a challenging task because different people write in different styles, which makes analyzing each text a complex problem. The recognition process involves segmenting the text, extracting features, classifying each character, labelling training data so that text written in different hands can be recognized, and testing on the features extracted from handwritten text data. In this research, a support vector machine (SVM) is used to analyze and tokenize each character or word of Sindhi text and to transform it into suitable information efficiently and accurately. The research is useful not only for improving knowledge of Sindhi handwritten text recognition but can also benefit other recognition systems.
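The abstract does not specify the SVM formulation, kernel or features. As a hedged illustration of the classifier itself, here is a minimal linear SVM in plain Python, trained with Pegasos-style stochastic subgradient steps on the hinge loss plus an L2 penalty. The two-dimensional feature vectors, the binary labels, the function names and the hyperparameters are all hypothetical; a real Sindhi recognizer would classify many character classes (for example with one-vs-rest), which this sketch omits.

```python
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50) -> List[float]:
    """Train a linear SVM with Pegasos-style subgradient steps.

    X holds feature vectors and y holds labels in {-1, +1}. A constant 1.0 is
    appended to every vector so a bias term is learned (and regularized)
    together with the weights.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    n = len(data)
    for t in range(1, epochs * n + 1):
        i = (t - 1) % n                    # cyclic pass; original Pegasos samples at random
        eta = 1.0 / (lam * t)              # decreasing step size
        margin = y[i] * sum(wk * xk for wk, xk in zip(w, data[i]))
        w = [(1.0 - eta * lam) * wk for wk in w]   # step for the L2 penalty
        if margin < 1.0:                   # hinge loss is active: move toward y_i * x_i
            w = [wk + eta * y[i] * xk for wk, xk in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the learned hyperplane x falls."""
    score = sum(wk * xk for wk, xk in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical 2-D feature vectors for two character classes.
X = [(2, 2), (3, 1), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # expected: [1, 1, 1, -1, -1, -1]
```

Cycling through the samples in order instead of drawing them at random is a simplification that keeps the example deterministic; a library implementation with a non-linear kernel may suit character images better.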


Feature Extraction Comparison in Handwriting Recognition of Batak Toba Alphabet

IJITEE (International Journal of Information Technology and Electrical Engineering), 2018, Vol 1 (3), pp. 86, DOI: 10.22146/ijitee.31969

Author(s): Novie Theresia Br Pasaribu, M. Jimmy Hasugian

Keyword(s): Feature Extraction, Handwriting Recognition, Noise Removal, Text Recognition, Fourier Descriptor, Research Topics, Discriminative Feature, Handwritten Text, Handwritten Text Recognition, Offline Handwriting Recognition

Offline handwriting recognition is one of the most prominent research topics because of its wide range of applications and the high variability of handwriting. This paper covers offline Batak Toba handwritten text recognition, from noise removal and feature extraction through to recognition with several classifiers. Experiments show that the elliptic Fourier descriptor (EFD) is the most discriminative feature and that the Mahalanobis distance (MD) classifier outperforms the other two classifiers.
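Elliptic Fourier descriptors, reported above as the most discriminative feature, describe a closed contour by the Fourier series of its x and y coordinates taken as functions of arc length. The following sketch computes the per-harmonic coefficients (a_n, b_n, c_n, d_n) of a polygonal contour using the closed-form expressions of Kuhl and Giardina. The function name and the contour format (a list of (x, y) vertices, implicitly closed) are assumptions, and the size, rotation and starting-point normalization that a recognizer typically applies before classification is omitted; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def elliptic_fourier_descriptors(contour: Sequence[Tuple[float, float]],
                                 order: int) -> List[Tuple[float, float, float, float]]:
    """Return (a_n, b_n, c_n, d_n) for harmonics n = 1..order.

    `contour` lists the vertices of a closed polygon; the last vertex is
    joined back to the first. Zero-length segments are skipped.
    """
    segments = []
    for p in range(len(contour)):
        x0, y0 = contour[p]
        x1, y1 = contour[(p + 1) % len(contour)]
        dx, dy = x1 - x0, y1 - y0
        dt = math.hypot(dx, dy)            # segment length (arc-length increment)
        if dt > 0:
            segments.append((dx, dy, dt))
    total = sum(s[2] for s in segments)    # perimeter T
    if total == 0:
        raise ValueError("contour has zero perimeter")

    coeffs = []
    for n in range(1, order + 1):
        const = total / (2 * n * n * math.pi ** 2)
        a = b = c = d = 0.0
        t_prev = 0.0
        for dx, dy, dt in segments:
            t_cur = t_prev + dt
            phi_prev = 2 * n * math.pi * t_prev / total
            phi_cur = 2 * n * math.pi * t_cur / total
            dcos = math.cos(phi_cur) - math.cos(phi_prev)
            dsin = math.sin(phi_cur) - math.sin(phi_prev)
            # x and y are linear in t on each segment, so their slopes are constant.
            a += dx / dt * dcos
            b += dx / dt * dsin
            c += dy / dt * dcos
            d += dy / dt * dsin
            t_prev = t_cur
        coeffs.append((const * a, const * b, const * c, const * d))
    return coeffs
```

For a unit circle traced counterclockwise from (1, 0), the first harmonic comes out close to (1, 0, 0, 1) and higher harmonics close to zero. Flattening the tuples yields a fixed-length vector of four values per harmonic that a distance-based classifier, such as the Mahalanobis distance classifier mentioned above, could consume.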
