Transform based approach for Indic script identification from handwritten document images

Abstract: Feature selection can be treated as a global combinatorial optimization problem in machine learning: it removes irrelevant, noisy, and non-contributing features while still yielding acceptable classification accuracy. The harmony search algorithm (HSA) is an evolutionary algorithm that has been applied to optimization problems such as scheduling, text summarization, water distribution networks, and vehicle routing. This paper presents a hybrid approach based on a support vector machine and HSA for wrapper feature subset selection. The approach selects an optimized subset from an initial feature set obtained by applying modified log-Gabor filters to pre-partitioned rectangular blocks of handwritten document images written in any one of 12 official Indic scripts. The assessment justifies the need for feature selection in handwritten script identification, where local and global features are computed without knowing their exact importance. The proposed approach is also compared with four well-known evolutionary algorithms (genetic algorithm, particle swarm optimization, tabu search, and ant colony optimization) and two statistical feature dimensionality reduction techniques (greedy attribute search and principal component analysis). The results show that the feature set selected using HSA gives better accuracy in handwritten script recognition.
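The wrapper idea described in the abstract can be illustrated with a minimal harmony search over binary feature masks. This is only a sketch under stated assumptions, not the authors' implementation: the function name `harmony_search_select`, the parameter values (`memory_size`, `hmcr`, `par`, `iterations`), and the toy fitness function are all hypothetical. In a real wrapper, the fitness would be something like the cross-validated accuracy of a support vector machine trained on the selected features.

```python
import random
from typing import Callable, List, Sequence, Tuple


def harmony_search_select(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    memory_size: int = 10,   # number of stored harmonies (assumed value)
    hmcr: float = 0.9,       # harmony memory considering rate (assumed value)
    par: float = 0.3,        # pitch adjusting rate (assumed value)
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Return the best binary feature mask found and its fitness."""
    rng = random.Random(seed)

    # Initialise the harmony memory with random masks, each scored once.
    memory = []
    for _ in range(memory_size):
        mask = [rng.randint(0, 1) for _ in range(n_features)]
        memory.append((fitness(mask), mask))

    for _ in range(iterations):
        # Improvise a new harmony one bit at a time.
        new_mask = []
        for j in range(n_features):
            if rng.random() < hmcr:
                # Reuse the j-th bit of a randomly chosen stored harmony.
                bit = rng.choice(memory)[1][j]
                if rng.random() < par:
                    bit = 1 - bit  # pitch adjustment: flip the bit
            else:
                bit = rng.randint(0, 1)  # random consideration
            new_mask.append(bit)

        # Replace the worst stored harmony if the new one is better.
        score = fitness(new_mask)
        worst = min(range(memory_size), key=lambda i: memory[i][0])
        if score > memory[worst][0]:
            memory[worst] = (score, new_mask)

    best_score, best_mask = max(memory, key=lambda item: item[0])
    return best_mask, best_score


def toy_fitness(mask: Sequence[int]) -> float:
    """Hypothetical stand-in for classifier accuracy.

    Rewards three 'informative' features and penalises subset size.
    """
    informative = (0, 3, 5)
    hits = sum(mask[i] for i in informative)
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = harmony_search_select(8, toy_fitness)
    print(mask, score)
```

Because the fitness function is passed in as a callable, the search loop stays independent of the classifier; swapping the toy function for a classifier-based score would not change the search code.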


AUTOMATIC LINE-LEVEL SCRIPT IDENTIFICATION FROM HANDWRITTEN DOCUMENT IMAGES - A REGION-WISE CLASSIFICATION FRAMEWORK FOR INDIAN SUBCONTINENT

Malaysian Journal of Computer Science, 2018, Vol 31 (1), pp. 63-84

DOI: 10.22452/mjcs.vol31no1.5

Cited By ~ 2

Author(s): Sk Md Obaidullah, Chayan Halder, K. C. Santosh, Nibaran Das, Kaushik Roy

Keyword(s): Indian Subcontinent, Automatic Line, Document Images, Script Identification, Classification Framework, Handwritten Document, Line Level


Handwritten Indic Script Identification in Multi-Script Document Images: A Survey

International Journal of Pattern Recognition and Artificial Intelligence, 2018, Vol 32 (10), pp. 1856012

DOI: 10.1142/s0218001418560128

Cited By ~ 6

Author(s): Sk. Md. Obaidullah, K. C. Santosh, Nibaran Das, Chayan Halder, Kaushik Roy

Keyword(s): Feature Extraction, Character Recognition, Optical Character Recognition, Document Images, Extraction Techniques, Script Identification, Optical Character, Comprehensive Survey, Indic Script

Script identification is crucial for automating optical character recognition (OCR) in multi-script documents since OCRs are script-dependent. In this paper, we present a comprehensive survey of the techniques developed for handwritten Indic script identification. Different pre-processing and feature extraction techniques, including classifiers used for script identification, are categorized and their merits and demerits are discussed. We also provide information about some handwritten Indic script datasets. Finally, we highlight the extensions and/or future scope of works together with challenges.


Handwritten Indic Script Identification from Document Images—A Statistical Comparison of Different Attribute Selection Techniques in Multi-classifier Environment

Advances in Intelligent Systems and Computing - Proceedings of the Second International Conference on Computer and Communication Technologies, 2015, pp. 491-500

DOI: 10.1007/978-81-322-2526-3_51

Author(s): Sk Md Obaidullah, Chayan Halder, Nibaran Das, Kaushik Roy

Keyword(s): Attribute Selection, Document Images, Script Identification, Statistical Comparison, Indic Script


Indic script identification from handwritten document images

International Journal of Intelligent Systems Technologies and Applications, 2019, Vol 18 (3), pp. 303

DOI: 10.1504/ijista.2019.099341

Author(s): Pawan Kumar Singh, Ram Sarkar, Mita Nasipuri

Keyword(s): Document Images, Script Identification, Handwritten Document


Bangla and Oriya Script Lines Identification from Handwritten Document Images in Tri-script Scenario

International Journal of Service Science Management Engineering and Technology, 2016, Vol 7 (1), pp. 43-60

DOI: 10.4018/ijssmet.2016010103

Author(s): Sk Md Obaidullah, Chayan Halder, Nibaran Das, Kaushik Roy

Keyword(s): Logistic Model, Feature Vector, Multi Layer Perceptron, Document Images, Script Identification, Multiple Classifiers, Group A, Handwritten Document, Logistic Model Tree, Line Level

In this paper, two popular eastern Indian scripts, Bangla and Oriya, are considered for line-level script identification in two tri-script groups, with Devnagari and Roman common to both groups. A 27-dimensional feature vector is constructed using FD (Fractal Dimension) and IMT (Interpolated Morphological Transform). For each tri-script group, 600 line-level handwritten document images are used for experimentation. Promising results are obtained using multiple classifiers: the MLP (Multi-Layer Perceptron) neural network and LMT (Logistic Model Tree) perform best for the BDR (Bangla-Devnagari-Roman) combination with 97% accuracy, and LMT outperforms the others for the ODR (Oriya-Devnagari-Roman) combination with 97.7% accuracy. A bi-script performance analysis is also reported: in the first group, the BR (Bangla-Roman) and BD (Bangla-Devnagari) combinations reach accuracies of 98% and 97.5% respectively, while in the second group OD (Oriya-Devnagari) and OR (Oriya-Roman) reach 98.25% and 98% respectively.
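To make the fractal dimension feature more concrete, here is a small box-counting estimate on a binary image. The abstract does not say which FD estimator the authors use, so box counting is an assumption chosen because it is a common one. The function names and the default box sizes are likewise hypothetical.

```python
import math
from typing import List, Sequence


def box_count(image: Sequence[Sequence[int]], box: int) -> int:
    """Count the box-by-box cells that contain at least one foreground pixel."""
    occupied = set()
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel:
                occupied.add((r // box, c // box))
    return len(occupied)


def fractal_dimension(
    image: Sequence[Sequence[int]],
    box_sizes: Sequence[int] = (1, 2, 4, 8),  # assumed scales
) -> float:
    """Estimate the box-counting dimension as the least-squares slope
    of log N(s) against log(1/s), where N(s) is the occupied box count."""
    xs: List[float] = []
    ys: List[float] = []
    for box in box_sizes:
        n = box_count(image, box)
        if n > 0:
            xs.append(math.log(1.0 / box))
            ys.append(math.log(n))
    if len(xs) < 2:
        raise ValueError("need foreground pixels and at least two box sizes")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    size = 16
    filled = [[1] * size for _ in range(size)]            # solid square
    line = [[1] * size] + [[0] * size for _ in range(size - 1)]  # one row
    print(fractal_dimension(filled), fractal_dimension(line))
```

A solid square and a single straight row are useful sanity checks, since their estimates should come out close to 2 and 1 respectively; handwritten strokes would be expected to fall in between.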



An Approach for Automatic Indic Script Identification from Handwritten Document Images

Convolution Based Technique for Indic Script Identification from Handwritten Document Images

Indic script identification from handwritten document images — An unconstrained block-level approach

Gabor Filter Based Technique for Offline Indic Script Identification from Handwritten Document Images

Feature Selection Using Harmony Search for Script Identification from Handwritten Document Images
