Combining Morphological and Histogram based Text Line Segmentation in the OCR Context

2021 ◽  
Vol 2021 (HistoInformatics) ◽  
Author(s):  
Pit Schneider

Text line segmentation is one of the preprocessing stages of modern optical character recognition (OCR) systems. The algorithmic approach proposed in this paper has been designed for this exact purpose. Its main characteristic is the combination of two different techniques: morphological image operations and horizontal histogram projections. The method was developed to be applied to a historic data collection that commonly features quality issues, such as degraded paper, blurred text, or the presence of noise. For that reason, the segmenter in question could be of particular interest to cultural institutions that want access to robust line bounding boxes for a given historic document. Because of the promising segmentation results, combined with low computational cost, the algorithm was incorporated into the OCR pipeline of the National Library of Luxembourg as part of the initiative to reprocess its historic newspaper collection. The general contribution of this paper is to outline the approach and to evaluate the gains in accuracy and speed by comparing it to the segmentation algorithm bundled with the open source OCR software in use.
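The general idea of pairing a morphological operation with a horizontal histogram projection can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function names, the choice of a horizontal dilation, and the `dilate_width` and `min_ink` parameters are all assumptions made for illustration, and the image is assumed to be a binary list of rows with 1 for ink.

```python
def dilate_horizontal(img, width):
    """Binary dilation with a 1 x (2*width+1) horizontal structuring element."""
    out = []
    for row in img:
        n = len(row)
        new_row = [0] * n
        for x, value in enumerate(row):
            if value:
                # Spread each ink pixel to its left and right neighbours.
                for k in range(max(0, x - width), min(n, x + width + 1)):
                    new_row[k] = 1
        out.append(new_row)
    return out


def horizontal_projection(img):
    """Count the ink pixels in every row (the horizontal histogram)."""
    return [sum(row) for row in img]


def segment_lines(img, dilate_width=2, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    Hypothetical pipeline: smear the ink horizontally, then cut the page
    wherever the row histogram drops below min_ink.
    """
    profile = horizontal_projection(dilate_horizontal(img, dilate_width))
    lines, start = [], None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y                      # a text line begins
        elif count < min_ink and start is not None:
            lines.append((start, y))       # a text line ends
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines
```

Raising `min_ink` above 1 makes the cut less sensitive to isolated noise pixels, while the dilation keeps sparsely inked rows of a real text line above that threshold.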

Author(s):  
P. Soujanya ◽  
Vijaya Kumar Koppula ◽  
Kishore Gaddam

Segmentation of text lines is one of the important steps in an optical character recognition system. Text line segmentation is a pre-processing step for word and character segmentation. It is simple for printed documents, which contain distinct spaces between the lines. It is more complex for documents where text lines overlap, touch, are curvilinear, or have varying spacing between them, as in Telugu scripts and skewed documents. The main objective of this project is to investigate different text line segmentation algorithms, namely projection profiles, run-length smearing, and adaptive run-length smearing, on low-quality documents. These methods are tested experimentally, and their accuracy and results are compared.
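Run-length smearing can be sketched in a few lines. The version below is the plain, fixed-threshold horizontal variant and is only an illustration under stated assumptions: the function names and the `threshold` parameter are hypothetical, the image is a binary list of rows with 1 for ink, and the adaptive variant mentioned in the abstract is not shown.

```python
def rlsa_horizontal(row, threshold):
    """Fill background runs of at most `threshold` pixels that lie
    between two ink pixels in a single binary row."""
    out = list(row)
    last_ink = None
    for x, value in enumerate(row):
        if value:
            if last_ink is not None:
                gap = x - last_ink - 1
                if 0 < gap <= threshold:
                    # Short gap: smear it so neighbouring glyphs merge.
                    for k in range(last_ink + 1, x):
                        out[k] = 1
            last_ink = x
    return out


def smear_image(img, threshold):
    """Apply horizontal run-length smearing to every row of the image."""
    return [rlsa_horizontal(row, threshold) for row in img]
```

With a threshold slightly larger than the typical gap between characters, the characters of one text line tend to merge into a single blob, which can then be separated from neighbouring lines.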


2013 ◽  
Vol 64 (4) ◽  
pp. 238-243 ◽  
Author(s):  
Darko Brodić ◽  
Zoran N. Milivojević

The paper presents an algorithm for text line segmentation based on the oriented anisotropic Gaussian kernel. Initially, the document image is split into connected components delimited by bounding boxes. Redundant fragments are removed from these connected components. Binary moments are then applied to each connected component to evaluate the local text skew. The orientation of the anisotropic Gaussian kernel is set according to this information. After the algorithm is applied, boundary growing areas are established around the connected components. These areas are of major importance for the evaluation of text line segmentation. For testing purposes, the algorithm is evaluated on different text samples. A comparative analysis is made between the algorithm with and without orientation of the anisotropic Gaussian kernel. The results show an improvement in text line segmentation.
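For readers unfamiliar with the term, an oriented anisotropic Gaussian kernel is a two-dimensional Gaussian with different spreads along its two axes, rotated by an angle. The sketch below uses the standard textbook form; the parameter names `sigma_x`, `sigma_y`, and `theta` are assumptions, and the abstract does not state which parameterization the paper uses.

```python
import math


def oriented_gaussian(x, y, sigma_x, sigma_y, theta):
    """Value of a 2D anisotropic Gaussian rotated by `theta` radians.

    sigma_x is the spread along the kernel's main axis and sigma_y the
    spread across it (assumed names, standard textbook formulation).
    """
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the point into the kernel's own coordinate frame.
    u = x * c + y * s
    v = -x * s + y * c
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    return norm * math.exp(-(u * u / (2.0 * sigma_x ** 2)
                             + v * v / (2.0 * sigma_y ** 2)))
```

Choosing `sigma_x` larger than `sigma_y` stretches the kernel along the direction given by `theta`, so a kernel aligned with the local skew extends further along the text line than across it.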


Segmentation is the division of something into smaller parts and is one of the components of a character recognition system. In segmentation, characters, words, and lines are separated from text documents. Character recognition is a process that allows computers to recognize written or printed characters, such as numbers or letters, and to change them into a form that the computer can use. The accuracy of an OCR system is measured by taking the output of an OCR run for an image and comparing it to the original version of the same text. The main aim of this paper is to examine various text line segmentation methods, such as projection profiles and the weighted bucket method. The proposed method uses the horizontal projection profile and connected component method on handwritten Kannada text. These methods are used for experimentation, and their accuracy and results are finally compared.
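Comparing OCR output against the original text is commonly done with a character-level edit distance. The sketch below assumes that metric; the abstract does not say which comparison the paper uses, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def character_accuracy(ocr_text, ground_truth):
    """Share of ground-truth characters recovered, clamped at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, ground_truth)
    return max(0.0, 1.0 - errors / len(ground_truth))
```

For example, `character_accuracy("text 1ine", "text line")` counts one substituted character out of nine.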


Author(s):  
Daniel M. Oliveira ◽  
Rafael D. Lins ◽  
Gabriel Torreão ◽  
Jian Fan ◽  
Marcelo Thielo
