A Fuzzy Matching based Image Classification System for Printed and Handwritten Text Documents

2020 ◽  
Vol 13 (2) ◽  
pp. 155-194
Author(s):  
Shalini Puri ◽  
Satya Prakash Singh

This article proposes a bi-leveled image classification system to classify printed and handwritten English documents into mutually exclusive predefined categories. The proposed system follows the steps of preprocessing, segmentation, feature extraction, and SVM based character classification at level 1, and word association and fuzzy matching based document classification at level 2. The system architecture and its modular structure discuss various task stages and their functionalities. Further, a case study on document classification is discussed to show the internal score computations of words and keywords with fuzzy matching. The experiments on proposed system illustrate that the system achieves promising results in the time-efficient manner and achieves better accuracy with less computation time for printed documents than handwritten ones. Finally, the performance of the proposed system is compared with the existing systems and it is observed that proposed system performs better than many other systems.

2019 ◽  
Vol 12 (4) ◽  
pp. 107-131 ◽  
Author(s):  
Shalini Puri ◽  
Satya Prakash Singh

This article introduces a new advanced tri-layered segmentation and bi-leveled-classifier-based Hindi printed document classification system, which categorizes imaged documents into pre-defined mutually exclusive categories by using SVM and Fuzzy matching at character and document classifications, respectively. During training, the improved and noise-free image is segmented into lines and words by profiling. Then it obtains Shirorekha Less (SL) isolated characters along with upper, left and right modifier components from the SL words. These components use their locations and inter character-modifier component distance to get associate with their corresponding characters only. Further, confidence values of all characters are calculated with SVM training and all characters are mapped into Romanized labels to generate the words. Finally, documents are classified by Fuzzy based matching of Romanized detected words and predefined classes. The average execution times of SL characters are 0.22675 sec. and 0.20375 sec. and classification accuracy are 74.61% and 80.73% for training and testing, respectively.


2018 ◽  
Vol 5 (1) ◽  
pp. 8 ◽  
Author(s):  
Ajib Susanto ◽  
Daurat Sinaga ◽  
Christy Atika Sari ◽  
Eko Hari Rachmawanto ◽  
De Rosal Ignatius Moses Setiadi

The classification of Javanese character images is done with the aim of recognizing each character. The selected classification algorithm is K-Nearest Neighbor (KNN) at K = 1, 3, 5, 7, and 9. To improve KNN performance in Javanese character written by the author, and to prove that feature extraction is needed in the process image classification of Javanese character. In this study selected Local Binary Patter (LBP) as a feature extraction because there are research objects with a certain level of slope. The LBP parameters are used between [16 16], [32 32], [64 64], [128 128], and [256 256]. Experiments were performed on 80 training drawings and 40 test images. KNN values after combination with LBP characteristic extraction were 82.5% at K = 3 and LBP parameters [64 64].


Segmentation is division of something into smaller parts and one of the Component of character recognition system. Separation of characters, words and lines are done in Segmentation from text documents. character recognition is a process which allows computers to recognize written or printed characters such as numbers or letters and to change them into a form that the computer can use. the accuracy of OCR system is done by taking the output of an OCR run for an image and comparing it to the original version of the same text. The main aim of this paper is to find out the various text line segmentations are Projection profiles, Weighted Bucket Method. Proposed method is horizontal projection profile and connected component method on Handwritten Kannada language. These methods are used for experimentation and finally comparing their accuracy and results.


2018 ◽  
Vol 5 (4) ◽  
pp. 1-31 ◽  
Author(s):  
Shalini Puri ◽  
Satya Prakash Singh

In recent years, many information retrieval, character recognition, and feature extraction methodologies in Devanagari and especially in Hindi have been proposed for different domain areas. Due to enormous scanned data availability and to provide an advanced improvement of existing Hindi automated systems beyond optical character recognition, a new idea of Hindi printed and handwritten document classification system using support vector machine and fuzzy logic is introduced. This first pre-processes and then classifies textual imaged documents into predefined categories. With this concept, this article depicts a feasibility study of such systems with the relevance of Hindi, a survey report of statistical measurements of Hindi keywords obtained from different sources, and the inherent challenges found in printed and handwritten documents. The technical reviews are provided and graphically represented to compare many parameters and estimate contents, forms and classifiers used in various existing techniques.


Sign in / Sign up

Export Citation Format

Share Document