Word Segmentation and Baseline Detection in Handwritten Documents Using Isothetic Covers

Author(s):  
Aisharjya Sarkar ◽  
Arindam Biswas ◽  
Partha Bhowmick ◽  
Bhargab B. Bhattacharya
Author(s):  
Mousumi Dutt ◽  
Aisharjya Sarkar ◽  
Arindam Biswas ◽  
Partha Bhowmick ◽  
Bhargab B. Bhattacharya

Analysis of handwritten documents is a challenging task in the modern era of document digitization. It requires efficient preprocessing, which includes word segmentation and baseline detection. This paper proposes a novel approach to word segmentation and baseline detection in a handwritten document. It is based on certain structural properties of isothetic covers tightly enclosing the words in a handwritten document. For an appropriate grid size, the isothetic covers successfully segregate the words so that each cover corresponds to a particular word. The grid size is selected by an adaptive technique that classifies the inter-cover distances into two classes in an unsupervised manner. Finally, by using a geometric heuristic with the horizontal chords of these covers, the corresponding baselines are extracted. Because it traverses the word boundaries in a combinatorial manner and uses a limited set of operations strictly in the integer domain, the method is found to be quite fast, efficient, and robust, as demonstrated by experimental results on datasets of both Bengali and English handwriting.
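The two-class split of inter-cover distances can be illustrated with a minimal sketch. The abstract does not say which unsupervised classifier the authors use, so the exhaustive one-dimensional split below is an assumed stand-in, and the names `split_gaps` and `sum_squared_error` are hypothetical. It also uses floating-point means for brevity, whereas the paper reports working strictly in the integer domain.

```python
def sum_squared_error(values):
    """Sum of squared deviations of values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_gaps(gaps):
    """Return a threshold separating small gaps from large gaps.

    Assumed stand-in for the paper's unsupervised two-class step:
    try every cut point in the sorted gaps and keep the cut with the
    lowest total within-class squared error.
    """
    values = sorted(gaps)
    if len(values) < 2:
        raise ValueError("need at least two gaps to form two classes")

    best_cost = None
    best_cut = None
    for cut in range(1, len(values)):
        cost = sum_squared_error(values[:cut]) + sum_squared_error(values[cut:])
        if best_cost is None or cost < best_cost:
            best_cost, best_cut = cost, cut

    # Place the threshold midway between the two classes.
    return (values[best_cut - 1] + values[best_cut]) / 2


# Hypothetical horizontal distances (in pixels) between neighbouring covers.
threshold = split_gaps([3, 4, 5, 4, 20, 22, 3, 25])
```

Under this sketch, distances below the threshold would be read as gaps inside a word and distances above it as gaps between words. How that decision feeds back into the choice of grid size is described in the paper itself, not here.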


Author(s):  
Prabhakar C. J.

In this chapter, the author presents a segmentation-free word spotting method for handwritten documents using the Scale Space co-occurrence histograms of oriented gradients (Co-HOG) feature descriptor. The chapter begins with an introduction to word spotting, its challenges, and its applications. It is followed by a review of the existing techniques for word spotting in handwritten documents. The literature survey reveals that segmentation-based word spotting methods usually need a layout analysis step for word segmentation, and any segmentation errors can affect the subsequent word representation and matching steps. Hence, to overcome the drawbacks of segmentation-based methods, the author proposes segmentation-free word spotting using the Scale Space Co-HOG feature descriptor. The proposed method is evaluated using mean Average Precision (mAP) in experiments conducted on popular datasets such as GW and IAM. The performance of the proposed method is compared with existing state-of-the-art segmentation-based and segmentation-free methods, and it shows a considerable increase in accuracy.
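For readers unfamiliar with the metric, mean Average Precision can be sketched as follows. This is a generic illustration, not the chapter's evaluation code: the function names are hypothetical, and it assumes every relevant word instance appears somewhere in the ranked list, so precision is averaged over the relevant items that were retrieved.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance lists, in ranked order, whether each retrieved
    item is relevant (True) or not (False).
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision values."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(average_precision(q) for q in queries) / len(queries)


# Two hypothetical query results.
score = mean_average_precision([[True, False, True], [False, True]])
```

A query whose relevant matches all rank first scores 1.0, so the metric rewards placing correct word instances near the top of the retrieval list.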


2011 ◽  
Vol 2 (3) ◽  
pp. 1-13 ◽  
Author(s):  
Mousumi Dutt ◽  
Aisharjya Sarkar ◽  
Arindam Biswas ◽  
Partha Bhowmick ◽  
Bhargab B. Bhattacharya

Author(s):  
Yue Xu ◽  
Fei Yin ◽  
Zhaoxiang Zhang ◽  
Cheng-Lin Liu

Layout analysis is a fundamental process in document image analysis and understanding. It consists of several sub-processes such as page segmentation, text line segmentation, and baseline detection. In this work, we propose a multi-task layout analysis method that uses a single FCN model to solve these three problems simultaneously. The FCN is trained to segment the document image into different regions and detect the center line of each text line by classifying pixels into different categories. Through supervised learning on document images with pixel-wise labels, the FCN can extract discriminative features and perform pixel-wise classification accurately. After pixel-wise classification, post-processing steps are taken to reduce noise, correct wrong segmentations, and find overlapping regions. Experimental results on the public dataset DIVA-HisDB, which contains challenging medieval manuscripts, demonstrate the effectiveness and superiority of the proposed method.


2009 ◽  
Vol 42 (12) ◽  
pp. 3169-3183 ◽  
Author(s):  
G. Louloudis ◽  
B. Gatos ◽  
I. Pratikakis ◽  
C. Halatsis
