A Bilingual Numeral OCR System for Creating Uni-Lingual Digitized Numeral Document

2015 · Vol 9 (13) · pp. 148
Author(s): Karthick K, Chitra S

Optical character recognition (OCR) is used in many applications, such as dictionary generation, customer billing, banking and postal automation, and library automation. A bilingual OCR system that produces uni-lingual output replaces two separate OCR systems with a single system that recognizes two different scripts. Converting bilingual documents into uni-lingual ones in this way lets users of any language read the text in their own language. In this paper, an image containing printed Tamil and European numerals is recognized by a common OCR system, and the Tamil numerals are converted into European numerals, turning the bilingual document into a uni-lingual one. The main objective of the work is to produce a document containing a single numeral system (European numerals) from an input image containing two (Tamil and European numerals). A recognition system based on Kohonen's self-organizing map (SOM) is used to recognize the numerals, and the recognized bilingual numerals (Tamil and European) are converted into uni-lingual form (European numerals). The paper also discusses various approaches used for OCR.
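The two steps in this abstract, SOM-based numeral recognition followed by Tamil-to-European conversion, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' system: the feature vectors fed to the map are hypothetical (the abstract does not say which features are extracted), the grid size and the learning-rate and neighborhood schedules are illustrative, each node is labeled with its nearest training sample, and the conversion operates on Unicode text, where the Tamil digits zero to nine occupy U+0BE6 to U+0BEF.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Tamil digits zero..nine are U+0BE6..U+0BEF; map each code point to the matching European digit.
TAMIL_TO_EUROPEAN = {0x0BE6 + d: str(d) for d in range(10)}


def to_european(text: str) -> str:
    """Convert any Tamil numerals in `text` to European numerals; other characters are unchanged."""
    return text.translate(TAMIL_TO_EUROPEAN)


class SOM:
    """Minimal Kohonen self-organizing map over fixed-length (hypothetical) feature vectors."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One randomly initialized weight vector per grid node.
        self.w = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
        self.labels: Dict[Tuple[int, int], str] = {}

    def bmu(self, x: Sequence[float]) -> Tuple[int, int]:
        """Best-matching unit: the grid node whose weight vector is closest to x."""
        return min(((i, j) for i in range(self.rows) for j in range(self.cols)),
                   key=lambda ij: math.dist(self.w[ij[0]][ij[1]], x))

    def train(self, data: List[Sequence[float]], epochs: int = 100, lr0: float = 0.5) -> None:
        radius0 = max(self.rows, self.cols) / 2
        for t in range(epochs):
            frac = t / epochs
            lr = lr0 * (1 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
            for x in data:
                bi, bj = self.bmu(x)
                for i in range(self.rows):
                    for j in range(self.cols):
                        # Gaussian neighborhood on the grid, centered on the BMU.
                        h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * radius ** 2))
                        w = self.w[i][j]
                        for k in range(len(w)):
                            w[k] += lr * h * (x[k] - w[k])  # Kohonen update rule

    def label_nodes(self, data: List[Sequence[float]], labels: List[str]) -> None:
        """Give each node the label of the training sample closest to its weight vector."""
        for i in range(self.rows):
            for j in range(self.cols):
                nearest = min(range(len(data)), key=lambda n: math.dist(self.w[i][j], data[n]))
                self.labels[(i, j)] = labels[nearest]

    def classify(self, x: Sequence[float]) -> str:
        """Recognize x as the label of its best-matching unit."""
        return self.labels[self.bmu(x)]
```

If the nodes are labeled with the recognized characters themselves (for example `"\u0be7"` for Tamil one), joining the `classify` results and passing the string through `to_european` gives the uni-lingual output; `to_european("\u0be8\u0be6\u0be7\u0beb")` returns `"2015"`.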

Author(s): M A Mikheev, P Y Yakimov

The article addresses the problem of comparing document versions in electronic document management systems. Analogous systems were reviewed, and the process of comparing text documents was studied. To recognize the text in scanned images, optical character recognition was chosen, as implemented by the Tesseract library. The Myers algorithm is applied to compare the recognized texts. The text document comparison module was implemented in software using these solutions.
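As an illustration of the comparison step, the sketch below implements the greedy forward search of Myers' O(ND) difference algorithm and returns only D, the length of the shortest edit script (insertions plus deletions) between two lists of lines, such as two OCR outputs. It is a minimal sketch, not the module the authors built: recovering the edit script itself requires saving each round's `v` array and backtracking, which is omitted. Python's own `difflib.SequenceMatcher` uses a different (Ratcliff/Obershelp-style) matching approach, which is why the search is written out here.

```python
from typing import Dict, Sequence


def myers_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the shortest edit script (insertions + deletions) turning a into b.

    For each edit count d, v[k] holds the furthest x reached on diagonal k = x - y.
    """
    n, m = len(a), len(b)
    v: Dict[int, int] = {1: 0}  # sentinel so the d = 0 round starts at x = 0
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Extend whichever neighboring diagonal reached further.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # move down: insert one element of b
            else:
                x = v[k - 1] + 1      # move right: delete one element of a
            y = x - k
            # Follow the "snake": free diagonal moves over matching elements.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: d = n + m always suffices
```

For example, two hypothetical OCR results that differ in one line need one deletion and one insertion:

```python
old = ["Invoice 42", "Total: 100", "Paid"]
new = ["Invoice 42", "Total: 180", "Paid"]
myers_distance(old, new)  # 2
```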


1999 · Vol 09 (06) · pp. 545-561
Author(s): Hsin-Chia Fu, Y. Y. Xu, H. Y. Chang

Recognition of similar (confusable) characters is a difficult problem in optical character recognition (OCR). In this paper, we introduce a neural network solution that can model minor differences among similar characters and is robust to varied personal handwriting styles. The Self-growing Probabilistic Decision-based Neural Network (SPDNN) is a probabilistic neural network that adopts a hierarchical network structure with nonlinear basis functions and a competitive credit-assignment scheme. Based on the SPDNN model, we have constructed a three-stage recognition system. First, a coarse classifier assigns an input character to one of the predefined subclasses partitioned from a large character set, such as Chinese mixed with alphanumerics. Next, a character recognizer determines which reference character in that subclass best matches the input image. Finally, a similar-character recognizer further improves accuracy among similar or confusable characters. The prototype system demonstrates a successful application of SPDNN to the recognition of similar handwritten Chinese characters on the public CCL/HCCR1 database (5401 characters × 200 samples). On this database, the system achieved 90.12% recognition accuracy with no rejection and 94.11% accuracy with 6.7% rejection, an improvement of about 4% over previously reported results [5, 11]. As for processing speed, the steps before recognition (image preprocessing, segmentation, and feature extraction) take about one second for an A4-size character image, and recognition takes approximately 0.27 seconds per character on a Pentium-100 personal computer, without any hardware accelerator or co-processor.
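The control flow of the three-stage system described above can be sketched as follows. This is not SPDNN: each stage here is a plain nearest-prototype matcher standing in for the trained networks, the per-pair "expert" for confusable characters is a hypothetical callable, and the reject rule (refuse when the top two distances differ by less than `reject_margin`) is an assumed illustration of the accuracy-versus-rejection trade-off the abstract reports; the paper's actual rejection criterion is not stated there.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def rank(x: Vector, prototypes: Dict[str, Vector]) -> List[Tuple[float, str]]:
    """(distance, label) pairs sorted from best to worst match."""
    return sorted((math.dist(x, p), label) for label, p in prototypes.items())


def recognize(
    x: Vector,
    subclass_centroids: Dict[str, Vector],
    subclass_prototypes: Dict[str, Dict[str, Vector]],
    confusion_experts: Dict[FrozenSet[str], Callable[[Vector], str]],
    reject_margin: float = 0.0,
) -> Optional[str]:
    # Stage 1: coarse classifier routes x to the nearest predefined subclass.
    subclass = rank(x, subclass_centroids)[0][1]
    # Stage 2: character recognizer picks the best-matching reference in that subclass.
    ranked = rank(x, subclass_prototypes[subclass])
    best_d, best = ranked[0]
    if len(ranked) < 2:
        return best
    second_d, second = ranked[1]
    # Stage 3: if the top two candidates are a known confusable pair, defer to its expert.
    expert = confusion_experts.get(frozenset((best, second)))
    if expert is not None:
        return expert(x)
    # Otherwise reject inputs whose two best matches are too close to call.
    return None if second_d - best_d < reject_margin else best
```

Keeping each stage behind a simple interface means a stronger model can replace any one stage without changing the routing and rejection logic.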


Segmentation, the division of something into smaller parts, is one of the components of a character recognition system: it separates the lines, words, and characters of a text document. Character recognition is a process that allows computers to recognize written or printed characters, such as numbers or letters, and convert them into a form the computer can use. The accuracy of an OCR system is measured by comparing the OCR output for an image with the original version of the same text. The main aim of this paper is to examine text-line segmentation methods, namely projection profiles and the weighted bucket method. The proposed method applies the horizontal projection profile and the connected-component method to handwritten Kannada text. These methods are evaluated experimentally, and their accuracy and results are compared.
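A minimal sketch of the two techniques named in this abstract, assuming a binarized page stored as a list of rows (1 = ink, 0 = background): the horizontal projection profile sums the ink in each row and treats every run of non-empty rows as one text line, and a breadth-first flood fill counts 8-connected ink components (characters or character fragments). The `min_ink` noise threshold is a hypothetical parameter, and real handwritten Kannada pages with skewed or touching lines need more than this simple thresholding.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # binarized page: 1 = ink, 0 = background


def segment_lines(img: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (top, bottom) row ranges of text lines from the horizontal projection profile."""
    profile = [sum(row) for row in img]  # ink count per row
    lines, start = [], None
    for r, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = r                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, r - 1))   # an empty row ends the line
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines


def count_components(img: Image) -> int:
    """Count 8-connected ink components with a breadth-first flood fill."""
    h, w = len(img), len(img[0]) if img else 0
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count
```

On a toy page with two text lines, the profile finds both lines, and the components can then be counted per line by slicing the rows:

```python
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
]
segment_lines(page)          # [(1, 2), (5, 5)]
count_components(page[1:3])  # 2
```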

