TWO-PASS SEGMENTATION FOR MARATHI CHARACTER EXTRACTION

2020 ◽  
Vol 7 (2) ◽  
pp. 1
Author(s):  
A. KANKHAR MADHAV ◽  
C. NAMRATA MAHENDER ◽  
2001 ◽  
Vol 10 (8) ◽  
pp. 1152-1161 ◽  
Author(s):  
Xiangyun Ye ◽  
M. Cheriet ◽  
C.Y. Suen

Author(s):  
Chulapong Panichkriangkrai ◽  
Liang Li ◽  
Ross Walker ◽  
Kozaburo Hachimura

This paper describes methods of image analysis for historical Japanese book archives, focusing mainly on character segmentation. The segmentation pipeline comprises stain and smear removal, binarization, character line extraction, and character extraction by region labeling combined with integration and separation techniques. Experimental results show that the proposed method segments all text lines correctly and extracts more than 79% of the characters from 16 pages of Chinsetsu Yumiharizuki, which contain 176 text lines and a total of 5181 highly complex characters.
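The region-labeling step can be illustrated with a minimal sketch. It assumes the page has already been cleaned and binarized into a list of 0/1 rows (1 = ink), and it uses plain 8-connected flood fill to return one bounding box per connected region. The function name `label_regions` is hypothetical, and the sketch does not reproduce the authors' integration and separation rules, which would then merge or split these raw regions into characters.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive

def label_regions(img: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected ink (1) regions, in row-major discovery order."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours that are inside the image and still unlabeled.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
print(label_regions(page))  # [(0, 0, 1, 1), (1, 3, 2, 4), (3, 1, 3, 1)]
```

On real scans an isolated dot such as the last box is exactly the kind of fragment that an integration step would attach to a neighbouring region.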


2014 ◽  
Vol 8 (3) ◽  
pp. 439-442
Author(s):  
Resmi R. Nair ◽  
A. Shobana ◽  
T. Abhinaya ◽  
S. Sibi Chakkaravarthy

2014 ◽  
Vol 14 (01n02) ◽  
pp. 1450003 ◽  
Author(s):  
S. A. Angadi ◽  
M. M. Kodabagi

Reliable extraction and segmentation of text lines, words and characters is a key step in building automated systems that understand the text in low-resolution display board images. This paper presents a new approach for segmenting text lines, words and characters from Kannada text in such images. The proposed method uses projection profile features and on-pixel distribution statistics to segment text lines. It also detects text lines containing consonant modifiers and merges them with their corresponding text lines, and it separates overlapped text lines efficiently. The character extraction process computes character boundaries from vertical profile features and uses them to extract character images from every text line. In addition, the word segmentation process uses k-means clustering to group inter-character gaps into character-space and word-space clusters, which are used to compute thresholds for extracting words. The method also handles variations in character and word gaps. The proposed methodology is evaluated on a data set of 1008 low-resolution images of display boards containing Kannada text, captured with 2-megapixel mobile phone cameras at sizes of 240 × 320, 480 × 640 and 960 × 1280. The method achieves a text line segmentation accuracy of 97.17%, a word segmentation accuracy of 97.54% and a character extraction accuracy of 99.09%. It is tolerant to font variability, spacing variations between characters and words, the absence of a free segmentation path due to consonant and vowel modifiers, noise and other degradations. Experiments with images containing overlapped text lines gave promising results.
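The three-stage idea in this abstract (horizontal projection for lines, vertical projection for characters, two-cluster k-means on inter-character gaps for words) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes a clean binarized image as a list of 0/1 rows, and it omits the paper's pixel-distribution statistics, consonant-modifier merging and overlapped-line separation. All function names are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (start, end), inclusive indices

def runs(profile: List[int]) -> List[Run]:
    """Maximal index ranges where a projection profile is non-zero."""
    out: List[Run] = []
    start = None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            out.append((start, i - 1))
            start = None
    if start is not None:
        out.append((start, len(profile) - 1))
    return out

def row_profile(img: List[List[int]]) -> List[int]:
    return [sum(row) for row in img]  # ink pixels per row (horizontal projection)

def col_profile(img: List[List[int]]) -> List[int]:
    return [sum(col) for col in zip(*img)]  # ink pixels per column (vertical projection)

def two_means_threshold(values: List[int], iters: int = 20) -> float:
    """1-D k-means with k=2; returns the midpoint between the two centroids."""
    lo, hi = float(min(values)), float(max(values))
    for _ in range(iters):
        mid = (lo + hi) / 2
        small = [v for v in values if v <= mid]
        large = [v for v in values if v > mid]
        new_lo = sum(small) / len(small) if small else lo
        new_hi = sum(large) / len(large) if large else hi
        if (new_lo, new_hi) == (lo, hi):  # centroids stable: converged
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

def segment(img: List[List[int]]) -> List[List[List[Run]]]:
    """Per text line, a list of words; each word is a list of character column spans."""
    result = []
    for top, bottom in runs(row_profile(img)):
        chars = runs(col_profile(img[top:bottom + 1]))
        if len(chars) < 2:
            result.append([chars])
            continue
        gaps = [b[0] - a[1] - 1 for a, b in zip(chars, chars[1:])]
        thr = two_means_threshold(gaps)
        words, current = [], [chars[0]]
        for gap, ch in zip(gaps, chars[1:]):
            if gap > thr:      # gap falls in the word-space cluster
                words.append(current)
                current = [ch]
            else:              # gap falls in the character-space cluster
                current.append(ch)
        words.append(current)
        result.append(words)
    return result

img = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0],
]
print(segment(img))  # [[[(0, 1), (3, 3)], [(7, 8)]], [[(2, 2)]]]
```

When every gap in a line has the same width, both centroids coincide and the sketch treats the whole line as one word; a real system would need a fallback (for example, a minimum word-gap width) for that case.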
