A NEURAL-BASED PAGE SEGMENTATION SYSTEM

2005 ◽  
Vol 14 (01) ◽  
pp. 109-122 ◽  
Author(s):  
Y. Alginahi ◽  
D. Fekri ◽  
M. A. Sid-Ahmed

Page segmentation is necessary for optical character recognition and very useful in document image manipulation. This paper describes two classification methods, a modified linear adaptive method and a proposed neural network system, that classify an image into text, halftone images (photos, dark images, etc.), and graphics (graphs, tables, flowcharts, etc.). The blocks were segmented using the Run Length Smearing Algorithm, with the smearing thresholds fixed so that the process runs automatically. Features are extracted from the segmented blocks for classification into text, graphics, and halftone images. The second method uses a multi-layer perceptron neural network: two parameters, a shape factor f1 and an angle derived from the rectangular block segments, are fed into the network, yielding the three classes text, halftone images, and graphics. Experiments on 30 mixed-content document images show that the method works well on a wide variety of document layouts.
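The Run Length Smearing Algorithm named above is well documented: runs of background pixels shorter than a threshold are filled with foreground, horizontally and vertically, and the two results are combined to form block candidates. A minimal sketch follows; the threshold values (100 horizontal, 30 vertical) are illustrative assumptions, not the fixed values used in the paper.

```python
import numpy as np

def smear_runs(binary: np.ndarray, threshold: int) -> np.ndarray:
    """Fill runs of background (0) shorter than `threshold` pixels with foreground (1),
    scanning each row left to right."""
    out = binary.copy()
    for row in out:
        run_start = None
        for j, v in enumerate(row):
            if v == 0 and run_start is None:
                run_start = j                      # a background run begins
            elif v == 1 and run_start is not None:
                if j - run_start < threshold:      # short gap between foreground pixels
                    row[run_start:j] = 1           # smear it
                run_start = None
    return out

def rlsa(binary: np.ndarray, h_thresh: int = 100, v_thresh: int = 30) -> np.ndarray:
    """Run Length Smearing Algorithm: smear horizontally and vertically,
    then AND the two images to obtain candidate block regions."""
    horizontal = smear_runs(binary, h_thresh)
    vertical = smear_runs(binary.T, v_thresh).T    # transpose to reuse the row scanner
    return horizontal & vertical
```

Connected components of the resulting image give the rectangular block segments from which the classification features are then extracted.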

2019 ◽  
Vol 9 (21) ◽  
pp. 4529
Author(s):  
Tao Liu ◽  
Hao Liu ◽  
Yingying Wu ◽  
Bo Yin ◽  
Zhiqiang Wei

Capturing document images with digital cameras under uneven lighting conditions is challenging: poorly exposed captures hinder the processing that follows, such as Optical Character Recognition (OCR). In this paper, we propose the use of exposure bracketing techniques to solve this problem. Instead of capturing one image, we capture several images with different exposure settings and use exposure bracketing to generate a single high-quality image that incorporates the useful information from each capture. We found that this technique enhances image quality and provides an effective way of improving OCR accuracy. Our contributions are two-fold: (1) a preprocessing chain that applies exposure bracketing techniques to document images is discussed, and an automatic registration method is proposed to find the geometric disparity between the multiple document images, which lays the foundation for exposure bracketing; (2) several representative exposure bracketing algorithms are incorporated into the processing chain and their performance is evaluated and compared.
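The abstract does not spell out which registration or fusion algorithms are used, so the sketch below is only a rough illustration of such a pipeline: it aligns a bracketed series with OpenCV's ECC image alignment and fuses it with Mertens exposure fusion, one representative bracketing algorithm. The function names and parameter values here are assumptions, not the paper's method.

```python
import cv2
import numpy as np

def register_to_reference(reference: np.ndarray, moving: np.ndarray) -> np.ndarray:
    """Estimate a homography aligning `moving` to `reference` by ECC maximization,
    then warp `moving` into the reference frame."""
    ref_gray = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    mov_gray = cv2.cvtColor(moving, cv2.COLOR_BGR2GRAY)
    warp = np.eye(3, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    _, warp = cv2.findTransformECC(ref_gray, mov_gray, warp,
                                   cv2.MOTION_HOMOGRAPHY, criteria, None, 5)
    h, w = reference.shape[:2]
    return cv2.warpPerspective(moving, warp, (w, h),
                               flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)

def fuse_bracketed(paths: list[str]) -> np.ndarray:
    """Align a bracketed exposure series to the first shot, then fuse with
    Mertens exposure fusion (no camera response curve needed)."""
    images = [cv2.imread(p) for p in paths]
    aligned = [images[0]] + [register_to_reference(images[0], im) for im in images[1:]]
    fused = cv2.createMergeMertens().process(aligned)   # float32 in roughly [0, 1]
    return np.clip(fused * 255, 0, 255).astype(np.uint8)
```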


Author(s):  
María José Castro-Bleda ◽  
Salvador España-Boquera ◽  
Francisco Zamora-Martínez

The field of off-line optical character recognition (OCR) has been a topic of intensive research for many years (Bozinovic, 1989; Bunke, 2003; Plamondon, 2000; Toselli, 2004). One of the first steps in the classical architecture of a text recognizer is preprocessing, where noise reduction and normalization take place. Many systems do not require a binarization step, so the images are maintained in gray-level quality. Document enhancement not only influences the overall performance of OCR systems, but it can also significantly improve document readability for human readers. In many cases, the noise of document images is heterogeneous, and a technique fitted for one type of noise may not be valid for the overall set of documents. One possible solution to this problem is to use several filters or techniques and to provide a classifier that selects the appropriate one. Neural networks have been used for document enhancement (see (Egmont-Petersen, 2002) for a review of image processing with neural networks). One advantage of neural network filters for image enhancement and denoising is that a different neural filter can be automatically trained for each type of noise. This work proposes the clustering of neural network filters to avoid having to label training data and to reduce the number of filters needed by the enhancement system. An agglomerative hierarchical clustering algorithm of supervised classifiers is proposed to do this. The technique has been applied to filter out the background noise typical of office documents (coffee stains and footprints on documents, folded sheets with degraded printed text, etc.).
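The building block of such a system is a trainable neural filter that maps a noisy pixel neighborhood to its clean value. A minimal single-filter sketch is given below, assuming 5x5 neighborhoods and a scikit-learn MLP; the full system trains one such filter per noise cluster and uses a classifier to select among them.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def extract_patches(noisy: np.ndarray, clean: np.ndarray, size: int = 5):
    """Pair each size x size noisy neighborhood with its clean center pixel."""
    r = size // 2
    X, y = [], []
    for i in range(r, noisy.shape[0] - r):
        for j in range(r, noisy.shape[1] - r):
            X.append(noisy[i - r:i + r + 1, j - r:j + r + 1].ravel())
            y.append(clean[i, j])
    return np.array(X) / 255.0, np.array(y) / 255.0

def train_filter(noisy: np.ndarray, clean: np.ndarray) -> MLPRegressor:
    """Train one neural filter for one noise type (e.g. coffee stains)."""
    X, y = extract_patches(noisy, clean)
    net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=200)
    return net.fit(X, y)

def apply_filter(net: MLPRegressor, noisy: np.ndarray, size: int = 5) -> np.ndarray:
    """Slide the trained filter over the image to produce the enhanced output."""
    r = size // 2
    out = noisy.astype(np.float64) / 255.0
    X, _ = extract_patches(noisy, noisy, size)
    preds = net.predict(X).reshape(noisy.shape[0] - 2 * r, noisy.shape[1] - 2 * r)
    out[r:-r, r:-r] = preds
    return np.clip(out * 255, 0, 255).astype(np.uint8)
```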


Author(s):  
Priti P. Rege ◽  
Shaheera Akhter

Text separation in document image analysis is an important preprocessing step before executing an optical character recognition (OCR) task, and it is necessary for improving the accuracy of an OCR system. Traditionally, separating text from a document has relied on feature extraction processes that require handcrafting of the features. Deep learning-based methods, however, are excellent feature extractors that learn features from the training data automatically. Deep learning gives state-of-the-art results on various computer vision tasks, such as image classification, segmentation, image captioning, object detection, and recognition. This chapter compares various traditional as well as deep-learning techniques and uses a semantic segmentation method for separating text from Devanagari document images using U-Net and ResU-Net models. These models are further fine-tuned for transfer learning to get more precise results. The final results show that deep learning methods give more accurate results compared with conventional image processing methods for Devanagari text extraction.
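A compact U-Net sketch for per-pixel text/background separation is shown below; the depth and channel widths are illustrative assumptions, far smaller than the U-Net and ResU-Net models the chapter fine-tunes.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU, the basic U-Net unit."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level U-Net producing a per-pixel text/background map
    for a single-channel (grayscale) document image."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 32)
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)      # 128 = upsampled 64 + skip 64
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)       # 64 = upsampled 32 + skip 32
        self.head = nn.Conv2d(32, 1, 1)      # logits; apply sigmoid at inference

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

# Training pairs the network output with pixel-level text masks, e.g.:
# loss = nn.BCEWithLogitsLoss()(TinyUNet()(batch), masks)
```

A ResU-Net variant replaces each conv_block with a residual block; the encoder-decoder-with-skips structure stays the same.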


Author(s):  
Neha N.

Document image processing is an increasingly important technology, essential in all optical character recognition (OCR) systems and for the automation of various office documents. A document originally has zero skew (tilt), but when a page is scanned or photocopied, skew may be introduced by various factors and is practically unavoidable. The presence of even a small amount of skew (0.5°) has detrimental effects on document analysis, as it directly affects the reliability and efficiency of the segmentation, recognition, and feature extraction stages. Removal of skew is therefore of paramount importance in the field of document analysis and OCR, and it is the first step to be accomplished. This paper presents a novel technique for skew detection and correction that is both language and content independent. The proposed technique is based on the maximum density of the foreground pixels and their orientation in the document image. Unlike conventional algorithms that work only for machine-printed textual documents scripted in English, this technique works well for all kinds of document images (machine printed, handwritten, complex, noisy, and simple). The technique presented here was tested on 150 different document image samples and found to provide results with an accuracy of 0.1°.
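The abstract does not detail how the maximum foreground density and orientation are computed, so the sketch below uses a standard stand-in technique rather than the paper's own method: rotate the binarized page over candidate angles and pick the angle that maximizes the variance of the horizontal projection profile, searched at the 0.1° granularity the paper reports.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary: np.ndarray, max_angle: float = 15.0,
                  step: float = 0.1) -> float:
    """Return the rotation angle (degrees) whose horizontal projection
    profile has maximum variance, i.e. the sharpest text-line peaks."""
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        rotated = rotate(binary, angle, reshape=False, order=0)
        profile = rotated.sum(axis=1).astype(np.float64)  # foreground per row
        score = np.var(profile)   # peaks sharpen when text lines are horizontal
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

def deskew(binary: np.ndarray) -> np.ndarray:
    """Rotate the document by the estimated angle to remove the skew."""
    return rotate(binary, estimate_skew(binary), reshape=False, order=0)
```

Projection-profile search works only for roughly text-line-structured pages, which is precisely the limitation the paper's density-and-orientation approach aims to overcome for handwritten, complex, and noisy documents.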


Author(s):  
Shyamali Mitra ◽  
K. C. Santosh ◽  
Mrinal Kanti Naskar

Binarization plays a crucial role in Optical Character Recognition (OCR) and its ancillary domains, such as the recovery of degraded document images. In Document Image Analysis (DIA), selecting a threshold is not trivial, since it differs from one problem (dataset) to another. Instead of trying several different thresholds for each dataset, our proposed binarization scheme accounts for the noise inherent in document images. The proposed stochastic architecture implements a local thresholding technique: Niblack's binarization algorithm. We introduce a stochastic comparator circuit that works on unipolar stochastic numbers; unlike conventional stochastic circuits, it is simple and easy to deploy. We implemented it on the Xilinx Virtex6 XC6VLX760-2FF1760 FPGA platform and obtained encouraging experimental results. The complete set of results is available upon request. Moreover, compared to conventional designs, the proposed stochastic implementation is better in terms of both time complexity and fault tolerance.
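Niblack's rule itself is standard: each pixel is thresholded at T(x, y) = m(x, y) + k·s(x, y), the local mean plus k times the local standard deviation over a sliding window. Below is a plain software sketch of that rule (the paper's contribution is its stochastic hardware realization, which is not reproduced here); the window size 25 and k = -0.2 are conventional defaults, not necessarily the paper's settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(img: np.ndarray, window: int = 25, k: float = -0.2) -> np.ndarray:
    """Niblack local thresholding: T = local_mean + k * local_std."""
    img = img.astype(np.float64)
    mean = uniform_filter(img, window)                 # m(x, y)
    mean_sq = uniform_filter(img ** 2, window)
    std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0))  # s(x, y)
    threshold = mean + k * std
    # Pixels brighter than the local threshold become white (background).
    return (img > threshold).astype(np.uint8) * 255
```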


Author(s):  
Soumya Mishra ◽  
Debashish Nanda ◽  
Sanghamitra Mohanty

The biggest challenge in the field of image processing is to recognize documents in both printed and handwritten form. Optical Character Recognition (OCR) is a type of document image analysis in which a scanned digital image containing either machine-printed or handwritten script is input into an OCR engine and translated into an editable, machine-readable digital text format. The development of OCR systems for Indian scripts is an active area of research today. We attempt to develop an OCR system for the Oriya language, the official language of Orissa. Oriya presents great challenges to an OCR designer due to the large number of letters in the alphabet, the sophisticated ways in which they combine, and the complicated graphemes that result. In this paper, we argue that a number of automatic and semi-automatic tools can ease the development of recognizers for new font styles and new scripts, and we briefly discuss how they have helped build new OCRs for recognizing Oriya script. We used a backpropagation neural network for efficient recognition: errors are corrected through backpropagation, and the corrected neuron values are transmitted by the feed-forward method through the multiple layers of the network, i.e. the input layer, the hidden layer(s), and the output layer.
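A minimal NumPy sketch of the training scheme described, one hidden layer, a feed-forward pass, and a backpropagation weight update, is given below; the layer sizes and learning rate are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

class BackpropNet:
    """Minimal one-hidden-layer network trained with backpropagation."""
    def __init__(self, n_in: int, n_hidden: int, n_out: int):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))

    def forward(self, X: np.ndarray) -> np.ndarray:
        self.h = sigmoid(X @ self.W1)     # hidden-layer activations
        return sigmoid(self.h @ self.W2)  # output-layer activations

    def train_step(self, X: np.ndarray, T: np.ndarray, lr: float = 0.5) -> float:
        Y = self.forward(X)
        # Output error, then propagate it back through the hidden layer.
        d_out = (Y - T) * Y * (1 - Y)
        d_hid = (d_out @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= lr * self.h.T @ d_out
        self.W1 -= lr * X.T @ d_hid
        return float(np.mean((Y - T) ** 2))

# Usage: X holds flattened glyph images, T one-hot character labels, e.g.
# net = BackpropNet(n_in=32 * 32, n_hidden=128, n_out=n_classes)
```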


2015 ◽  
Vol 9 (3) ◽  
pp. 1-8
Author(s):  
J. O. Adigun ◽  
E. O. Omidiora ◽  
S. O. Olabiyisi ◽  
O. D. Fenwa ◽  
O. Oladipo ◽  
...  
