Automatic optimized document skew pre-processor for character segmentation algorithm

2017 ◽  
Vol 30 (4) ◽  
pp. 611-625 ◽  
Author(s):  
Vladan Vuckovic ◽  
Boban Arizanovic

In this paper, as part of a character segmentation algorithm, an automatic optimized document skew correction approach based on the Hough transform is presented. Skew correction is important in document image analysis because further processing is impossible if the document image is skewed. The proposed approach is based on a fast implementation of the standard Hough transform, followed by a highly optimized low-level machine code implementation of the image rotation. To achieve high computational performance, a linear image representation is used. The results of the proposed approach are analyzed in terms of time complexity and skew estimation accuracy and compared with existing skew correction approaches. The proposed approach gives better results than the analogous approach used in related work, but worse results than an optimized version that exploits a BAG algorithm. The provided results show a significant improvement on the standard Hough transform implementation.
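As a rough illustration of Hough-based skew estimation, the sketch below votes in a (rho, theta) accumulator over a small range of candidate angles and keeps the angle with the strongest peak. It is not the authors' optimized implementation: the function names, the angle range, the step size, and the reading of "linear image representation" as a flat row-major pixel list are all assumptions made for this example.

```python
import math


def estimate_skew_degrees(pixels, width, height, max_angle=10.0, step=0.5):
    """Estimate the skew of a binary image stored as a flat row-major list.

    pixels[y * width + x] is 1 for ink and 0 for background (assumed layout).
    The result is positive when text lines descend to the right, with y
    growing downward as in usual image coordinates.
    """
    # Collect ink coordinates once so the angle loop touches only ink pixels.
    ink = [(i % width, i // width)
           for i, v in enumerate(pixels[: width * height]) if v]
    if not ink:
        return 0.0

    best_angle, best_peak = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for k in range(n_steps + 1):
        angle = -max_angle + k * step
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        # One accumulator column of the Hough space: votes per rho bin.
        votes = {}
        for x, y in ink:
            rho = round(y * cos_t - x * sin_t)
            votes[rho] = votes.get(rho, 0) + 1
        peak = max(votes.values())
        # Pixels on a line tilted by this angle share one rho, so the peak
        # is highest when the candidate angle matches the skew.
        if peak > best_peak:
            best_angle, best_peak = angle, peak
    return best_angle


def synthetic_lines(width, height, angle_deg, baselines):
    """Draw one-pixel-thick lines at a known skew (test helper, hypothetical)."""
    pixels = [0] * (width * height)
    slope = math.tan(math.radians(angle_deg))
    for y0 in baselines:
        for x in range(width):
            y = round(y0 + x * slope)
            if 0 <= y < height:
                pixels[y * width + x] = 1
    return pixels
```

Calling `estimate_skew_degrees(synthetic_lines(200, 100, 3.0, [20, 50, 80]), 200, 100)` should recover an angle close to the 3 degrees used to draw the lines; the page would then be deskewed by rotating it through the negative of that angle. Keeping the image in a single flat list avoids per-row indirection, which is one plausible reason such a layout helps performance.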

1985 ◽  
Vol 16 (4) ◽  
pp. 66-75 ◽  
Author(s):  
Osamu Nakamura ◽  
Makoto Ujiie ◽  
Noriyoshi Okamoto ◽  
Toshi Minami

Author(s):  
YAN ZHANG ◽  
BIN YU ◽  
HAI-MING GU

Document image segmentation is an important research area of document image analysis that classifies the contents of a document image into a set of text and non-text classes. Existing methods are often designed to classify text and halftones, so they perform poorly when classifying graphics, tables, circuits, and similar content. In this paper, we present a robust multi-level classification method that uses a multi-layer perceptron (MLP) and a support vector machine (SVM) to segment text from non-text and then classify the non-text components as tables, graphics, and halftones. This method outperforms existing methods by overcoming various issues associated with the complexity of document images. Experimental results demonstrate the effectiveness of the proposed method. With this multi-level classification approach, the text, halftone, graphic, and table components are each classified accurately, which would considerably improve OCR accuracy by reducing garbage symbols and would also increase the compression ratio.


Author(s):  
Himadri Mukherjee ◽  
Payel Rakshit ◽  
Ankita Dhar ◽  
Sk Md Obaidullah ◽  
KC Santosh ◽  
...  

2018 ◽  
Vol 355 (16) ◽  
pp. 8225-8244 ◽  
Author(s):  
Fatim Zahra Ait Bella ◽  
Mohammed El Rhabi ◽  
Abdelilah Hakim ◽  
Amine Laghrib

2008 ◽  
Vol 41 (12) ◽  
pp. 3528-3546 ◽  
Author(s):  
Chandan Singh ◽  
Nitin Bhatia ◽  
Amandeep Kaur
