degraded document: Recently Published Documents

TOTAL DOCUMENTS: 151 (five years: 38)
H-INDEX: 16 (five years: 3)

Author(s): Jayati Mukherjee, Swapan K. Parui, Utpal Roy

Segmentation of text lines and words in an unconstrained handwritten or machine-printed degraded document is a challenging document analysis problem due to the heterogeneity of the document structure. Often there is uneven skew between the lines, as well as broken words, in a document. The contribution of this article lies in the segmentation of a document page image into lines and words. We propose an unsupervised, robust, and simple statistical method to segment a document image that is either handwritten or machine-printed (degraded or otherwise). In the proposed method, segmentation is treated as a two-class classification problem, where the classification is done by considering the distribution of gap sizes (between lines and between words) in a binary page image. The method is simple and easy to implement: other than binarization of the input image, no pre-processing is necessary, and no high computational resources are needed. The proposed method is unsupervised in the sense that no annotated document page images are necessary, so the issue of a training database does not arise. In fact, given a document page image, the parameters needed for segmentation of text lines and words are learned in an unsupervised manner. We have applied the proposed method to several popular, publicly available handwritten and machine-printed datasets (ISIDDI, IAM-Hist, IAM, PBOK) of different Indian and other languages containing different fonts. Several experimental results are presented to show the effectiveness and robustness of the method. We have also experimented on the ICDAR 2013 handwriting segmentation contest dataset, where our method outperforms the winning method. In addition, we suggest a quantitative measure to compute the level of degradation of a document page image.
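The gap-based two-class idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the run-length extraction and the Otsu-style two-class split are our own simplifications of "classify gap sizes into two classes from their distribution".

```python
import numpy as np

def gap_sizes(profile, empty_level=0):
    """Lengths of consecutive empty runs in a projection profile
    (row sums for line gaps, column sums for word gaps)."""
    empty = np.asarray(profile) <= empty_level
    gaps, run = [], 0
    for e in empty:
        if e:
            run += 1
        elif run:
            gaps.append(run)
            run = 0
    if run:
        gaps.append(run)
    return gaps

def two_class_threshold(values):
    """Unsupervised two-class split of 1-D gap sizes: choose the cut
    that minimises the total within-class variance (Otsu-style)."""
    vals = np.sort(np.asarray(values, dtype=float))
    best_t, best_score = vals[0], np.inf
    for i in range(1, len(vals)):
        left, right = vals[:i], vals[i:]
        score = left.var() * len(left) + right.var() * len(right)
        if score < best_score:
            best_score = score
            best_t = (vals[i - 1] + vals[i]) / 2
    return best_t
```

Gaps larger than the learned threshold would then be treated as inter-word (or inter-line) separators, the smaller ones as intra-word spacing; no annotated data is involved, matching the unsupervised claim above.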


2021, pp. 1-14
Author(s): R.L. Jyothi, M. Abdul Rahiman

Binarization is the most important stage in historical document image processing. The efficient working of character and word recognition algorithms depends on effective segmentation methods, and segmentation algorithms in turn depend on images free of noise and degradation. Most historical documents are illegible, with degradations such as bleed-through, faded ink or faint characters, uneven illumination, and contrast variation. For effective processing of these document images, efficient binarization algorithms should be devised. Here, a simple modified version of a Convolutional Neural Network (CNN) is proposed for historical document binarization. The AOD-Net architecture, which generates dehazed images from hazed images, is modified to create the proposed network. The new CNN model is created by incorporating a Difference of Concatenation (DOC) layer, an Enhancement (EN) layer, and a Thresholding layer into AOD-Net to make it suitable for the binarization of highly degraded document images. The DOC and EN layers work effectively against degradation that exists in the form of low-pass noise. The complexity of the proposed model is reduced by decreasing the number of layers and by introducing filters in the convolution layers that work with low inter-pixel dependency. This modified CNN works effectively on a variety of highly degraded documents when tested on benchmark historical datasets. The main highlight of the proposed network is that it works efficiently, in a generalized manner, on any type of document image without further parameter tuning. Another important highlight is that it can handle most of the degradation categories present in document images. In this work, the performance of the proposed model is compared with Otsu, Sauvola, and three recent Deep Learning-based models.
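For context on the baselines, the Sauvola method the authors compare against computes a per-pixel threshold T = m(1 + k(s/R - 1)) from the local window mean m and standard deviation s. A naive, unoptimized sketch (window size, k, and R are the usual defaults, not values from this paper):

```python
import numpy as np

def sauvola_binarize(img, w=15, k=0.2, R=128.0):
    """Sauvola local thresholding for a grayscale image in [0, 255].
    T(x, y) = m * (1 + k * (s / R - 1)), with m and s taken over a
    w x w window centred on the pixel. Returns True for background."""
    h, wd = img.shape
    pad = w // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros((h, wd), dtype=bool)
    for y in range(h):
        for x in range(wd):
            win = padded[y:y + w, x:x + w]
            m, s = win.mean(), win.std()
            out[y, x] = img[y, x] > m * (1 + k * (s / R - 1))
    return out
```

Production implementations compute the window statistics with integral images instead of the per-pixel loop; the per-pixel form above is kept only for readability.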


Author(s): Uche A. Nnolim

Conventional thresholding algorithms have had limited success with degraded document images. Recently, partial differential equations (PDEs) have been applied with good results. However, these are usually tailored to handle relatively few specific distortions. In this study, we combine an edge detection term with a linear binarization source term in a PDE formulation. Additionally, a new proposed diffusivity function further amplifies desired edges. It also suppresses undesired edges that comprise bleed-through effects. Furthermore, we develop the fractional variant of the proposed scheme, which further improves results and provides more flexibility. Moreover, nonlinear color spaces are utilized to improve binarization results for images with color distortion. The proposed scheme removes document image degradation such as bleed-through, stains, smudges, etc., and also restores faded text in the images. Experimental subjective and objective results show consistently superior performance of the proposed approach compared to the state-of-the-art PDE-based models.
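As a toy illustration of the general family of schemes described above (not the paper's actual diffusivity or source term, which are not reproduced here), one can combine a Perona-Malik-style edge-stopping diffusivity with a double-well source that pushes intensities toward 0 or 1:

```python
import numpy as np

def pde_binarize(u, steps=200, dt=0.1, kappa=0.1, lam=1.0):
    """Explicit scheme for u_t = div(c(|grad u|) grad u) + lam * f(u),
    on intensities in [0, 1], with periodic boundaries via np.roll.
    c(g) = exp(-(g/kappa)^2) blocks diffusion across strong edges;
    f(u) = -u(u-1)(u-0.5) is a double-well term driving u to {0, 1}."""
    u = u.astype(float).copy()
    for _ in range(steps):
        gn = np.roll(u, -1, 0) - u  # one-sided differences, 4 directions
        gs = np.roll(u, 1, 0) - u
        ge = np.roll(u, -1, 1) - u
        gw = np.roll(u, 1, 1) - u
        c = lambda g: np.exp(-(g / kappa) ** 2)
        diff = c(gn) * gn + c(gs) * gs + c(ge) * ge + c(gw) * gw
        src = -u * (u - 1.0) * (u - 0.5)
        u += dt * (diff + lam * src)
    return u
```

With a small kappa the diffusivity shuts off at text boundaries while the source term sharpens each region toward pure black or white, which is the qualitative behaviour (edge preservation plus binarization) the abstract describes.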


Author(s): Aniket Pagare

Segmentation of text from badly degraded document images is an extremely difficult task because of the high inter/intra-image variation between the document background and the foreground text of different document images. Image processing and pattern recognition algorithms take a long time to execute on a single-core processor. The Graphics Processing Unit (GPU) is popular nowadays due to its speed, programmability, low cost, and large number of built-in execution cores. The primary objective of this research work is to make binarization faster for the recognition of large numbers of degraded document images on the GPU. In this framework, we propose a new image segmentation algorithm in which every pixel in the image has its own threshold. We perform parallel work on windows of m*n size and separate the object pixels of the text stroke in each window. The document text is further segmented by a local threshold that is estimated based on the intensities of detected text stroke edge pixels within a local window.
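The per-window scheme above is embarrassingly parallel: each m*n window can be thresholded independently, which is what makes it a natural fit for a GPU, where each window would map onto its own thread block. A sequential sketch of the decomposition (the tiled Otsu split here is an illustrative stand-in for the paper's stroke-edge-based local threshold):

```python
import numpy as np

def otsu_threshold(tile):
    """Otsu threshold of one tile by maximising between-class variance.
    Each tile is independent, so on a GPU each call would run in
    its own thread block."""
    hist = np.bincount(tile.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mean_all = (hist * np.arange(256)).sum() / total
    best_t, best_var = 0, -1.0
    cum, cum_mean = 0.0, 0.0
    for t in range(256):
        cum += hist[t]
        cum_mean += t * hist[t]
        if cum == 0 or cum == total:
            continue
        w0 = cum / total
        m0 = cum_mean / cum
        m1 = (mean_all * total - cum_mean) / (total - cum)
        var_between = w0 * (1 - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def tiled_binarize(img, tile=32):
    """Split the page into tile x tile windows and threshold each one
    independently, so every pixel gets a locally estimated threshold."""
    out = np.zeros_like(img, dtype=bool)
    for y in range(0, img.shape[0], tile):
        for x in range(0, img.shape[1], tile):
            win = img[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] = win > otsu_threshold(win)
    return out
```

On a GPU the two nested loops disappear: the launch grid enumerates the tiles, and each block computes its window's histogram and threshold in shared memory.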

