Robust Text Line, Word And Character Extraction from Telugu Document Image

Abstract: A novel piecewise water flow technique for extracting text lines from multi-skewed document images of handwritten text in different scripts is presented here. The basic water flow technique assumes that hypothetical water flows from both the left and right sides of the image frame. The water fills the gaps between consecutive objects (text) but is obstructed wherever an object lies in its path. All unwetted regions in the document image are then labeled distinctly to extract the text lines. However, the technique fails when two neighboring text lines touch each other, because the water is obstructed by the touching segment(s). To overcome this difficulty, we have modified the basic water flow technique by applying it iteratively over vertical segments of the document image. The main purpose of this vertical segmentation is to localize the text line segment(s) where two text lines join. These segments are then horizontally fragmented, and each fragment is assigned to the text line to which it actually belongs. In this way, the probable data loss during isolation of the touching text line segment is minimized. Both techniques (the current and the basic one) have been tested on three databases: CMATERdb 1.1.1, CMATERdb 1.1.2, and the ICDAR2009 handwritten segmentation contest pages. The test results show that the present technique outperforms the basic one on all three databases.
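To make the basic water flow idea concrete, here is a minimal Python sketch under simplifying assumptions that are not taken from the paper: the page is a binary grid (1 = text pixel), water advances one column at a time and may shift by at most one row per column (a fixed 45 degree flow angle), and a cell counts as unwetted only if neither the left nor the right flow reaches it. The names `flow`, `label_unwetted` and `PAGE` are hypothetical. The piecewise refinement (vertical segmentation and reassignment of fragments) is not modelled.

```python
from collections import deque

def flow(image, from_left):
    """Mark background cells reached by water entering from one side of the frame."""
    h, w = len(image), len(image[0])
    wet = [[False] * w for _ in range(h)]
    columns = range(w) if from_left else range(w - 1, -1, -1)
    upstream = -1 if from_left else 1  # offset of the column the water comes from
    for x in columns:
        for y in range(h):
            if image[y][x]:            # text pixel: the water is obstructed
                continue
            ux = x + upstream
            if ux < 0 or ux >= w:      # frame edge: water enters here
                wet[y][x] = True
            else:                      # water arrives straight or diagonally
                wet[y][x] = any(wet[uy][ux]
                                for uy in (y - 1, y, y + 1) if 0 <= uy < h)
    return wet

def label_unwetted(image):
    """Label 4-connected unwetted regions; returns (label map, region count)."""
    h, w = len(image), len(image[0])
    left, right = flow(image, True), flow(image, False)
    dry = [[not (left[y][x] or right[y][x]) for x in range(w)] for y in range(h)]
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not dry[sy][sx] or labels[sy][sx]:
                continue
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:               # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and dry[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Toy page: two words on the first text line, one long word on the second.
PAGE = [[int(c) for c in row] for row in (
    "0000000",
    "0110110",
    "0110110",
    "0110110",
    "0110110",
    "0000000",
    "0111110",
    "0111110",
    "0111110",
    "0111110",
    "0000000",
)]
```

On `PAGE`, the narrow gap between the two words stays partly unwetted, so both words end up in one region, while the blank row between the text lines is fully wetted and keeps the lines apart. Adding text pixels that bridge the blank row merges everything into a single region, which is the failure case the piecewise technique is meant to address.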


A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images

International Journal of Image and Graphics ◽

10.1142/s021946781450003x ◽

2014 ◽

Vol 14 (01n02) ◽

pp. 1450003 ◽

Cited By ~ 2

Author(s):

S. A. Angadi ◽

M. M. Kodabagi

Keyword(s):

Extraction Process ◽

Word Segmentation ◽

Text Line ◽

Low Resolution ◽

Data Set ◽

Display Board ◽

Robust Segmentation ◽

Character Extraction ◽

Segmentation Accuracy ◽

Line Segmentation

Reliable segmentation of text lines, words and characters is an important step in developing automated systems for understanding text in low resolution display board images. In this paper, a new approach for segmenting text lines, words and characters from Kannada text in low resolution display board images is presented. The proposed method uses projection profile features and on-pixel distribution statistics to segment text lines. It also detects text lines containing consonant modifiers and merges them with the corresponding text lines, and it separates overlapped text lines efficiently. The character extraction process computes character boundaries using vertical profile features to extract character images from every text line. Further, the word segmentation process uses k-means clustering to group inter-character gaps into character and word cluster spaces, which are used to compute thresholds for extracting words. The method also accounts for variations in character and word gaps. The proposed methodology is evaluated on a data set of 1008 low resolution images of display boards containing Kannada text, captured with 2-megapixel mobile phone cameras at sizes of 240 × 320, 480 × 640 and 960 × 1280. The method achieves a text line segmentation accuracy of 97.17%, a word segmentation accuracy of 97.54% and a character extraction accuracy of 99.09%. The proposed method is tolerant to font variability, spacing variations between characters and words, the absence of a free segmentation path due to consonant and vowel modifiers, noise and other degradations. Experiments with images containing overlapped text lines have given promising results.
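The pipeline described above (horizontal profile for lines, vertical profile for characters, two-cluster k-means on gaps for words) can be sketched as follows. This is an illustrative simplification, not the authors' implementation: it assumes a clean binary image (1 = on pixel), splits only at empty rows or columns, and does not handle consonant modifiers or overlapped lines. All function names are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def text_lines(image):
    """Segment text lines from the horizontal projection profile (on pixels per row)."""
    return runs([sum(row) for row in image])

def characters(image, top, bottom):
    """Segment characters in rows [top, bottom) from the vertical projection profile."""
    width = len(image[0])
    return runs([sum(image[y][x] for y in range(top, bottom)) for x in range(width)])

def gap_threshold(gaps, iterations=20):
    """1-D k-means with k=2 on gap widths; returns the midpoint of the two centroids."""
    lo, hi = float(min(gaps)), float(max(gaps))
    for _ in range(iterations):
        small = [g for g in gaps if abs(g - lo) <= abs(g - hi)]
        large = [g for g in gaps if abs(g - lo) > abs(g - hi)]
        if not large:                  # all gaps alike: only one cluster exists
            break
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break                      # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

def words(char_spans):
    """Merge character spans into word spans, splitting at gaps above the threshold."""
    gaps = [b[0] - a[1] for a, b in zip(char_spans, char_spans[1:])]
    if not gaps:
        return list(char_spans)
    threshold = gap_threshold(gaps)
    result, start = [], char_spans[0][0]
    for (prev, nxt), gap in zip(zip(char_spans, char_spans[1:]), gaps):
        if gap > threshold:            # gap belongs to the word-space cluster
            result.append((start, prev[1]))
            start = nxt[0]
    result.append((start, char_spans[-1][1]))
    return result
```

For a line whose character spans are `[(0, 2), (3, 5), (6, 8), (12, 14), (15, 17)]`, the gaps are 1, 1, 4 and 1 pixels, the two centroids settle at 1 and 4, and the threshold of 2.5 yields the two words `(0, 8)` and `(12, 17)`. Deriving the threshold from the data rather than fixing it is what lets this kind of approach adapt to varying character and word spacing.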


General Document Image Correction Method Based on Text Line in Natural Scene

DEStech Transactions on Computer Science and Engineering ◽

10.12783/dtcse/cisnrc2019/33349 ◽

2019 ◽

Author(s):

YUZE XIANG ◽

JIANI GAO ◽

HAIJIAO SHAN ◽

XIAOKUN HE

Keyword(s):

Correction Method ◽

Document Image ◽

Text Line ◽

Natural Scene ◽

Image Correction


An approach to extracting the target text line from a document image captured by a pen scanner

Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. ◽

10.1109/icdar.2003.1227631 ◽

2004 ◽

Author(s):

Zhen-Long Bai ◽

Qiang Huo

Keyword(s):

Document Image ◽

Text Line


Text Line Segmentation with the Algorithm Based on the Oriented Anisotropic Gaussian Kernel

Journal of Electrical Engineering ◽

10.2478/jee-2013-0034 ◽

2013 ◽

Vol 64 (4) ◽

pp. 238-243 ◽

Cited By ~ 1

Author(s):

Darko Brodić ◽

Zoran N. Milivojević

Keyword(s):

Comparative Analysis ◽

Document Image ◽

Gaussian Kernel ◽

Connected Components ◽

Text Line ◽

Text Line Segmentation ◽

Bounding Boxes ◽

Line Segmentation

The paper presents an algorithm for text line segmentation based on the oriented anisotropic Gaussian kernel. Initially, the document image is split into connected components delimited by bounding boxes. Redundant fragments are removed from these connected components. Binary moments are then applied to each connected component to evaluate the local text skew, and the orientation of the anisotropic Gaussian kernel is set according to this information. After the algorithm is applied, boundary growing areas are established around the connected components. These areas are of major importance for the evaluation of text line segmentation. The algorithm is tested on different text samples, and a comparative analysis is made between the anisotropic Gaussian kernel algorithm with and without orientation. The results show an improvement in text line segmentation.
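The two central steps, estimating local skew from moments and orienting an anisotropic Gaussian kernel accordingly, can be sketched as below. The orientation formula is the standard one based on central second-order moments; whether the paper uses exactly this form is an assumption, as are the function names, the kernel size parameters and the sigma values. Angles are measured in image coordinates (x to the right, y downward).

```python
import math

def orientation(points):
    """Skew angle in radians of a set of (x, y) pixels from central second-order moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)

def oriented_gaussian_kernel(half_width, half_height, sigma_x, sigma_y, theta):
    """Normalized anisotropic Gaussian whose long axis (sigma_x) is rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half_height, half_height + 1):
        row = []
        for x in range(-half_width, half_width + 1):
            u = x * c + y * s          # coordinate along the rotated long axis
            v = -x * s + y * c         # coordinate across it
            row.append(math.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2)))
        kernel.append(row)
    total = sum(map(sum, kernel))
    return [[value / total for value in row] for row in kernel]
```

With `sigma_x` larger than `sigma_y`, the kernel spreads mainly along the estimated text direction, so smoothing a component with it grows the component toward its neighbours on the same line rather than toward the lines above and below. Setting `theta` to 0 for every component corresponds to the unoriented variant used in the comparison.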


Text Line Segmentation With Water Flow Algorithm Based on Power Function

Journal of Electrical Engineering ◽

10.2478/jee-2015-0021 ◽

2015 ◽

Vol 66 (3) ◽

pp. 132-141 ◽

Cited By ~ 2

Author(s):

Darko Brodić

Keyword(s):

Power Function ◽

Water Flow ◽

Document Image ◽

Text Line ◽

Flow Function ◽

Basic Algorithm ◽

Process Stage ◽

Flow Algorithm ◽

Text Line Segmentation ◽

Line Segmentation

Abstract: This manuscript proposes an extension to the water flow algorithm for text line segmentation. The basic algorithm assumes that hypothetical water flows across the document image frame at a few specified angles, from left to right and vice versa. As a result, unwetted image regions that contain text are extracted. These regions are of major importance for text line segmentation. The extension modifies the water flow function that creates the unwetted region: the linear water flow function used in the basic algorithm is replaced with its power function counterpart. The extended method was tested, examined and evaluated on different text samples. The results are encouraging because text line segmentation, a key process stage, is improved.
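The abstract does not give the exact flow functions, so the following Python sketch only illustrates the general idea under a hypothetical parameterization: behind an obstacle, water closes in from above and below, and the boundary of the unwetted region has advanced vertically by `slope * x` (linear) or `coeff * x ** exponent` (power) after `x` columns. The unwetted region ends once the boundary has covered half the obstacle height. All names and parameter values are assumptions made for illustration.

```python
def linear_boundary(x, slope):
    """Vertical advance of the water after x columns for a linear flow function."""
    return slope * x

def power_boundary(x, coeff, exponent):
    """Vertical advance of the water after x columns for a power flow function."""
    return coeff * x ** exponent

def unwetted_length(half_height, boundary):
    """Number of columns behind an obstacle before the water closes the gap."""
    x = 0
    while boundary(x) < half_height:
        x += 1
    return x

# Same obstacle (half height 8 pixels), three different flow functions.
linear = unwetted_length(8, lambda x: linear_boundary(x, 0.5))
steep = unwetted_length(8, lambda x: power_boundary(x, 0.5, 2))
shallow = unwetted_length(8, lambda x: power_boundary(x, 0.5, 0.5))
```

In this toy setting, an exponent above 1 shortens the unwetted region relative to the linear case and an exponent below 1 lengthens it, which shows how the power function adds a parameter for tuning how far unwetted regions extend between neighbouring text objects.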
