Automatic Anonymization of Printed-Text Document Images

Abstract: In this paper, we present a scheme for developing a complete OCR system for printed English uppercase letters in different fonts and sizes, so that the system can be used in banking, corporate, legal, and other industries. The OCR system consists of several modules: preprocessing, segmentation, feature extraction, and recognition. The preprocessing step is expected to include gray-level conversion, binary conversion, etc. After the features of the segmented characters have been extracted, an artificial neural network can be used for character recognition. Efforts have been made to improve the performance of character recognition using artificial neural network techniques. The proposed OCR system accepts printed document images from a file and is implemented in MATLAB R2014a.

Key words: OCR, Printed text, Barcode recognition
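
The preprocessing steps named in this abstract (gray-level conversion followed by binary conversion) can be illustrated with a minimal sketch. The paper reports a MATLAB implementation; the Python below is only an illustration, and the function names, the BT.601 luma weights, and the fixed threshold of 128 are assumptions rather than details taken from the paper.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) pixels to gray levels (assumed BT.601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray_image, threshold=128):
    """Mark pixels darker than the (assumed) threshold as ink (1), the rest as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray_image]


# A tiny 2x2 "page": light background on the left, dark ink on the right.
page = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 200, 200), (30, 30, 30)],
]
gray = to_gray(page)
binary = binarize(gray)
```

A fixed global threshold is the simplest choice; a real system would more likely pick the threshold from the image histogram.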

Corpus-based technique for improving Arabic OCR system

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v21.i1.pp233-241 ◽

2021 ◽

Vol 21 (1) ◽

pp. 233

Author(s):

Ahmed Hussain Aliwy ◽

Basheer Al-Sadawi

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Language Model ◽

Arabic Language ◽

Document Images ◽

Statistical Language Model ◽

Text Document ◽

Optical Character ◽

Arabic OCR

Optical character recognition (OCR) refers to the process of converting text document images into editable and searchable text. OCR poses several challenges, particularly for the Arabic language, where it produces a high percentage of errors. In this paper, a method to improve the output of Arabic optical character recognition (AOCR) systems is suggested, based on a statistical language model built from the large corpora available. The method detects and corrects non-word and real-word errors according to the context of the word in the sentence. The results show that the improvement raises the accuracy of the AOCR output to as much as 98%.
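
The idea of correcting OCR output with a corpus-built language model can be sketched as follows. This is not the authors' method: it is a minimal illustration that handles only non-word errors, uses a toy English corpus instead of Arabic corpora, generates candidates within one edit, and ranks them by the bigram count with the preceding word. All names (`edits1`, `build_model`, `correct`) are hypothetical.

```python
from collections import Counter


def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one deletion, substitution, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + replaces + inserts)


def build_model(corpus_sentences):
    """Count unigrams and bigrams in a whitespace-tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def correct(tokens, unigrams, bigrams):
    """Replace each out-of-vocabulary token with the candidate best supported by its left context."""
    out = []
    for tok in tokens:
        candidates = [] if tok in unigrams else [c for c in edits1(tok) if c in unigrams]
        if not candidates:
            out.append(tok)  # known word, or no candidate found: leave unchanged
            continue
        prev = out[-1] if out else None
        # Rank by bigram count with the previous word, then unigram count, then spelling.
        out.append(max(candidates, key=lambda c: (bigrams[(prev, c)], unigrams[c], c)))
    return out


corpus = ["the cat sat on the mat", "the cat ate the fish", "a car hit the wall"]
unigrams, bigrams = build_model(corpus)
after_the = correct("the caf sat".split(), unigrams, bigrams)
after_a = correct("a caf hit".split(), unigrams, bigrams)
```

The same misrecognized token `caf` is resolved differently depending on the word before it, which is the role context plays in the abstract. Real-word errors would additionally require scoring in-vocabulary words against their context.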

Behaviour-Based Clustering of Neural Networks

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch036 ◽

2011 ◽

pp. 231-235

Author(s):

María José Castro-Bleda ◽

Salvador España-Boquera ◽

Francisco Zamora-Martínez

Keyword(s):

Neural Network ◽

Neural Networks ◽

Character Recognition ◽

Optical Character Recognition ◽

Clustering Algorithm ◽

Training Data ◽

Document Images ◽

Supervised Classifiers ◽

Overall Performance ◽

Printed Text

The field of off-line optical character recognition (OCR) has been a topic of intensive research for many years (Bozinovic, 1989; Bunke, 2003; Plamondon, 2000; Toselli, 2004). One of the first steps in the classical architecture of a text recognizer is preprocessing, where noise reduction and normalization take place. Many systems do not require a binarization step, so the images are maintained in gray-level quality. Document enhancement not only influences the overall performance of OCR systems, but it can also significantly improve document readability for human readers. In many cases, the noise of document images is heterogeneous, and a technique fitted for one type of noise may not be valid for the overall set of documents. One possible solution to this problem is to use several filters or techniques and to provide a classifier to select the appropriate one. Neural networks have been used for document enhancement (see (Egmont-Petersen, 2002) for a review of image processing with neural networks). One advantage of neural network filters for image enhancement and denoising is that a different neural filter can be automatically trained for each type of noise. This work proposes the clustering of neural network filters to avoid having to label training data and to reduce the number of filters needed by the enhancement system. An agglomerative hierarchical clustering algorithm of supervised classifiers is proposed to do this. The technique has been applied to filter out the background noise from an office (coffee stains and footprints on documents, folded sheets with degraded printed text, etc.).
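
Behaviour-based agglomerative clustering of filters can be sketched as below. The abstract does not give the distance measure or the linkage criterion, so the mean squared difference of filter outputs on shared probe inputs and average linkage are assumptions here, and simple gray-level functions stand in for the trained neural filters.

```python
def behaviour(filter_fn, probes):
    """Describe a filter by its outputs on a shared set of probe inputs."""
    return [filter_fn(x) for x in probes]


def distance(u, v):
    """Mean squared difference between two behaviour vectors (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns clusters as lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Toy stand-ins for trained filters: two brighteners and two thresholders.
filters = [
    lambda x: min(255, x + 10),
    lambda x: min(255, x + 12),
    lambda x: 255 if x > 128 else 0,
    lambda x: 250 if x > 128 else 5,
]
probes = [0, 64, 125, 192, 255]
vectors = [behaviour(f, probes) for f in filters]
groups = agglomerate(vectors, n_clusters=2)
```

Because filters are compared only through their outputs, no noise-type labels are needed, and each resulting group could be served by a single representative filter.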

Watermarking text document images using edge direction histograms

Pattern Recognition Letters ◽

10.1016/j.patrec.2004.04.002 ◽

2004 ◽

Vol 25 (11) ◽

pp. 1243-1251 ◽

Cited By ~ 43

Author(s):

Young-Won Kim ◽

Il-Seok Oh

Keyword(s):

Document Images ◽

Edge Direction ◽

Text Document

Semi-fragile Watermarking For Text Document Images Authentication

2005 IEEE International Symposium on Circuits and Systems ◽

10.1109/iscas.2005.1465508 ◽

2005 ◽

Cited By ~ 3

Author(s):

Huijuan Yang ◽

A.C. Kot ◽

Jun Liu

Keyword(s):

Fragile Watermarking ◽

Document Images ◽

Text Document

Handwritten and machine printed text separation from Kannada document images

2016 10th International Conference on Intelligent Systems and Control (ISCO) ◽

10.1109/isco.2016.7727051 ◽

2016 ◽

Cited By ~ 2

Author(s):

Rajmohan Pardeshi ◽

Mallikarjun Hangarge ◽

Srikanth Doddamani ◽

K.C. Santosh

Keyword(s):

Document Images ◽

Printed Text

A novel technique for estimation of skew in binary text document images based on linear regression analysis

Sadhana ◽

10.1007/bf02710080 ◽

2005 ◽

Vol 30 (1) ◽

pp. 69-85 ◽

Cited By ~ 3

Author(s):

P. Shivakumara ◽

G. Hemantha Kumar ◽

D. S. Guru ◽

P. Nagabhushan

Keyword(s):

Regression Analysis ◽

Linear Regression ◽

Linear Regression Analysis ◽

Document Images ◽

Novel Technique ◽

Text Document
