Transform based approach for Indic script identification from handwritten document images

Author(s):  
Sk Md Obaidullah ◽  
Rownaqul Karim ◽  
Sujal Shaikh ◽  
Chayan Halder ◽  
Nibaran Das ◽  
...  
2018 ◽  
Vol 27 (3) ◽  
pp. 465-488 ◽  
Author(s):  
Pawan Kumar Singh ◽  
Supratim Das ◽  
Ram Sarkar ◽  
Mita Nasipuri

Abstract: Feature selection can be treated as a global combinatorial optimization problem in machine learning: it removes irrelevant, noisy, and non-contributing features while yielding acceptable classification accuracy. The harmony search algorithm (HSA) is an evolutionary algorithm that has been applied to various optimization problems such as scheduling, text summarization, water distribution networks, and vehicle routing. This paper presents a hybrid approach based on a support vector machine and HSA for wrapper feature subset selection. The approach is used to select an optimized set of features from an initial set obtained by applying Modified log-Gabor filters to prepartitioned rectangular blocks of handwritten document images written in any of 12 official Indic scripts. The assessment justifies the need for feature selection in handwritten script identification, where local and global features are computed without knowing their exact importance. The proposed approach is also compared with four well-known evolutionary algorithms (genetic algorithm, particle swarm optimization, tabu search, and ant colony optimization) and two statistical feature dimensionality reduction techniques (greedy attribute search and principal component analysis). The results show that the optimal set of features selected using HSA gives better accuracy in handwritten script recognition.
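To make the wrapper idea in the abstract above concrete, the following is a minimal sketch of a binary harmony search over feature masks. It is not the authors' implementation: the function names, the parameter values (`hms`, `hmcr`, `par`, `iterations`), the bit-flip form of pitch adjustment, and the toy fitness function are all assumptions chosen for illustration. In a real wrapper, `fitness` would presumably return the cross-validated accuracy of a support vector machine trained on the selected features.

```python
import random

def harmony_search_select(n_features, fitness, hms=10, hmcr=0.9, par=0.3,
                          iterations=200, seed=0):
    """Binary harmony search over feature masks (illustrative sketch).

    fitness: callable taking a list of bools (True = feature kept) and
             returning a score to maximize. All defaults are assumptions.
    """
    rng = random.Random(seed)
    # Harmony memory: hms random feature masks and their scores.
    memory = [[rng.random() < 0.5 for _ in range(n_features)]
              for _ in range(hms)]
    scores = [fitness(m) for m in memory]

    for _ in range(iterations):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                # Memory consideration: copy bit j from a stored harmony.
                bit = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    bit = not bit  # pitch adjustment, here a bit flip
            else:
                bit = rng.random() < 0.5  # random selection
            new.append(bit)
        score = fitness(new)
        # Replace the worst stored harmony if the new one is better.
        worst = min(range(hms), key=scores.__getitem__)
        if score > scores[worst]:
            memory[worst], scores[worst] = new, score

    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

# Hypothetical stand-in for classifier accuracy: reward three
# "informative" features and penalize the size of the subset.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(1 for j in INFORMATIVE if mask[j])
    return hits - 0.1 * sum(mask)

if __name__ == "__main__":
    mask, score = harmony_search_select(8, toy_fitness)
    print([j for j, keep in enumerate(mask) if keep], score)
```

Because the worst stored harmony is replaced only by a strictly better one, the best score in memory never decreases as iterations proceed.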


Author(s):  
Sk. Md. Obaidullah ◽  
K. C. Santosh ◽  
Nibaran Das ◽  
Chayan Halder ◽  
Kaushik Roy

Script identification is crucial for automating optical character recognition (OCR) in multi-script documents, since OCRs are script-dependent. In this paper, we present a comprehensive survey of the techniques developed for handwritten Indic script identification. Different pre-processing and feature extraction techniques, as well as the classifiers used for script identification, are categorized, and their merits and demerits are discussed. We also provide information about some handwritten Indic script datasets. Finally, we highlight possible extensions and the future scope of work, together with the challenges.


Author(s):  
Sk Md Obaidullah ◽  
Chayan Halder ◽  
Nibaran Das ◽  
Kaushik Roy

In this paper, two popular eastern Indian scripts, Bangla and Oriya, are considered for line-level script identification using two tri-script groups in which Devnagari and Roman are common to both groups. A 27-dimensional feature vector is constructed using FD (Fractal Dimension) and IMT (Interpolated Morphological Transform). For each tri-script group, 600 line-level handwritten document images are used for experimentation. Promising results are obtained with multiple classifiers: MLP (Multi-Layer Perceptron) neural network and LMT (Logistic Model Tree) perform best for the BDR (Bangla-Devnagari-Roman) combination with 97% accuracy, and LMT outperforms the others for the ODR (Oriya-Devnagari-Roman) combination with 97.7% accuracy. Bi-script performance is also analyzed: for the first group, the BR (Bangla-Roman) and BD (Bangla-Devnagari) combinations yield accuracies of 98% and 97.5%, respectively, while for the second group, OD (Oriya-Devnagari) and OR (Oriya-Roman) yield accuracies of 98.25% and 98%, respectively.
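The abstract above does not say how the fractal dimension feature is computed. One common estimator is box counting, sketched below under that assumption: the binary image is covered with boxes of decreasing size, and the slope of log(count) against log(1/size) is taken as the dimension. The function names and the choice of box sizes are hypothetical and are not taken from the paper.

```python
import math

def box_count(image, size):
    """Count size-by-size boxes containing at least one foreground pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            if any(image[i][j]
                   for i in range(r, min(r + size, rows))
                   for j in range(c, min(c + size, cols))):
                count += 1
    return count

def fractal_dimension(image, sizes=(1, 2, 4, 8)):
    """Estimate the box-counting dimension of a binary image.

    image: list of equal-length rows; truthy entries are foreground.
    Returns the least-squares slope of log(count) versus log(1/size).
    """
    counts = [box_count(image, s) for s in sizes]
    if min(counts) == 0:
        raise ValueError("image has no foreground pixels")
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

if __name__ == "__main__":
    filled = [[1] * 16 for _ in range(16)]               # solid square
    line = [[1] * 16] + [[0] * 16 for _ in range(15)]    # single stroke
    print(fractal_dimension(filled), fractal_dimension(line))
```

A solid region should come out close to dimension 2 and a single straight stroke close to 1; handwritten text lines would be expected to fall in between.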

