Automate Identification and Recognition of Handwritten Text from an Image

Author(s):  
Siddharth Salar et al.

Handwritten text recognition is still an open research problem in the area of Optical Character Recognition (OCR). This paper proposes an efficient methodology for developing handwritten text recognition systems. The primary goal of this work is to create an AI algorithm that enables entity and information extraction from documents with handwritten annotations, with the aim of identifying handwritten words in an image. The main aim of the project is to extract text, whether handwritten or machine printed, and convert it into a computer-understandable, editable format. To implement the project we used PyTesseract, a Python wrapper for the open-source Tesseract OCR engine, to recognize handwritten text, and OpenCV, a library with Python bindings used to solve computer vision problems. The input image is processed in several steps: first pre-processing of the image, then text localization, followed by character segmentation and character recognition, and finally post-processing of the image. Further image processing algorithms can also be used to deal with multiple characters in a single input image, or with tilted or rotated images. The trained system gives an average accuracy of more than 95% on unseen test images.
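
The stages named in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, Otsu thresholding is an assumed choice of pre-processing, the post-processing shown here is a simple text cleanup chosen for illustration, and text localization and character segmentation are left to Tesseract itself. It assumes the `opencv-python` and `pytesseract` packages and a Tesseract binary are installed.

```python
import re


def preprocess(image_path):
    """Load an image, convert it to grayscale, and binarize it."""
    import cv2  # imported here so the text helper below works without OpenCV

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold automatically (assumed choice).
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def recognize(binary_image, lang="eng"):
    """Run Tesseract on the binarized image; it localizes and segments internally."""
    import pytesseract

    return pytesseract.image_to_string(binary_image, lang=lang)


def postprocess(raw_text):
    """Collapse runs of spaces and tabs and drop empty lines (illustrative cleanup)."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw_text.splitlines())
    return "\n".join(line for line in lines if line)


def image_to_text(image_path):
    """Chain the stages: pre-processing, recognition, post-processing."""
    return postprocess(recognize(preprocess(image_path)))
```

Calling `image_to_text("scan.png")` (a hypothetical file name) would return the cleaned recognized text; tilt or rotation handling would be added as an extra step before recognition.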

2019 ◽  
Vol 8 (1) ◽  
pp. 50-54
Author(s):  
Ashok Kumar Bathla ◽  
Sunil Kumar Gupta

Optical Character Recognition (OCR) technology allows a computer to “read” text (both typed and handwritten) the way a human brain does. Significant research effort has gone into Optical Character Recognition (OCR) of typewritten text in various languages; however, very little effort has gone into the segmentation and skew correction of handwritten text written in Devanagari, the script used for Hindi. This paper presents a novel technique for segmentation and skew correction of handwritten Devanagari text. It achieves an accuracy of 91% and takes less than one second to segment a handwritten word.
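
The abstract does not describe how its skew correction works, so the sketch below uses a common generic approach instead: try a range of candidate angles and keep the one for which the ink pixels collapse onto the fewest rows (the most sharply peaked row profile). The function names, the candidate range of -15 to 15 degrees, and the scoring rule are all assumptions made for illustration.

```python
import math


def estimate_skew(points, candidates=range(-15, 16)):
    """Return the candidate angle (degrees) giving the most peaked row profile.

    points: iterable of (x, y) coordinates of ink pixels.
    """
    points = list(points)
    best_angle, best_score = None, -1
    for angle in candidates:
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        rows = {}
        for x, y in points:
            # Row index of the pixel after rotating the page back by `angle`.
            row = round(y * cos_t - x * sin_t)
            rows[row] = rows.get(row, 0) + 1
        # Sum of squared counts is largest when ink sits on few rows.
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def deskew(points, angle):
    """Rotate the ink pixels back by `angle` degrees."""
    theta = math.radians(angle)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    return [(x * cos_t + y * sin_t, y * cos_t - x * sin_t) for x, y in points]


# A synthetic baseline of 200 pixels written at a 5 degree slope.
baseline = [(x, round(x * math.tan(math.radians(5)))) for x in range(200)]
angle = estimate_skew(baseline)
straightened = deskew(baseline, angle)
```

A real system would extract `points` from a binarized word image and search a finer angle grid; the exhaustive search is kept here only because it is easy to follow.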


Optical Character Recognition, or Optical Character Reader (OCR), is a pattern recognition method that electronically converts images of handwritten or printed text into machine-encoded text. The equipment used for this purpose includes cameras and flatbed scanners. Handwritten text is scanned using a scanner, and the image of the scanned document is then processed by the program. Recognizing handwritten manuscripts is more difficult than recognizing texts in Western languages. In our proposed work we take up the challenge of identifying individual letters. Image preprocessing techniques can effectively improve the accuracy of an OCR engine. The goal is to design and implement a system, using machine learning and Python, that is more accurate than pre-built OCR engines based on technologies such as MATLAB, artificial intelligence, and neural networks.


Author(s):  
Ipsita Pattnaik ◽  
Tushar Patnaik

Optical Character Recognition (OCR) is a field concerned with converting printed text into a computer-understandable, editable format. Odia is a regional language used in Odisha, West Bengal and Jharkhand, spoken by over forty million people. Such heavy reliance on the language makes it important to preserve its script and to obtain a digital, editable version of Odia text. We propose a framework that takes an image of computer-printed Odia script as input, recognizes the characters printed in it, and produces a computer-readable, user-editable version of the same text. The system uses various techniques to improve the image, then performs line segmentation, followed by word segmentation and finally character segmentation, using horizontal and vertical projection profiles.
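
Projection-profile segmentation as described above can be illustrated on a tiny binary image. The sketch below is an assumption-laden toy, not the authors' system: the helper names, the `min_gap` parameter used to tell word gaps from character gaps, and the sample `PAGE` are all hypothetical.

```python
def horizontal_profile(img):
    """Ink count per row; blank rows separate text lines."""
    return [sum(row) for row in img]


def vertical_profile(img):
    """Ink count per column; blank columns separate characters or words."""
    return [sum(col) for col in zip(*img)]


def ink_runs(profile, min_gap=1):
    """Return (start, end) pairs, end exclusive, of stretches that contain ink.

    Blank gaps shorter than min_gap are bridged, so a larger min_gap
    yields words instead of single characters.
    """
    runs, start, gap = [], None, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


PAGE = [
    "..........",
    ".##.#..##.",
    ".##.#..##.",
    "..........",
    ".#...###..",
    "..........",
]
img = [[1 if ch == "#" else 0 for ch in row] for row in PAGE]

lines = ink_runs(horizontal_profile(img))          # text lines as row ranges
top, bottom = lines[0]
first_line = img[top:bottom]
chars = ink_runs(vertical_profile(first_line))     # every blank column splits
words = ink_runs(vertical_profile(first_line), 2)  # only gaps of 2+ columns split
```

The order mirrors the abstract: rows are split first with the horizontal profile, and each line is then split into words and characters with the vertical profile.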


The concept of digitization has marked a revolution in the area of data conversion, data storage and data sharing by converting non-editable typographic and handwritten text into editable electronic text. Though numerous such works have been carried out across the world in various languages using Optical Character Recognition (OCR), satisfactory output has been observed only in a few languages. This paper is an endeavor towards taking a step ahead in the digitization of two of the most extensively spoken languages in the Indian sub-continent – Hindi and Bengali – using Google’s open source OCR engine, Tesseract. Working on the scripts of these two languages of Brahmi origin has its own challenges owing to their varied traits of character segmentation and word formation. Here, the training of Tesseract with data sets of Hindi and Bengali typographic and handwritten characters has been integrated with a distinctive pre-processing stage involving input image customization and image augmentation, which significantly enhances image quality and allows Tesseract to offer more accurate results, especially for handwritten texts and obscure images. It also incorporates English translation and text-to-speech conversion, which are useful to non-native readers and visually impaired users. The focal idea of this paper has been to reach a wider audience by enabling digitization on the Android platform. Comparative analysis carried out on three distinct categories (images with typographic text, images with handwritten text, and inferior-quality images) shows that the paper does, to a certain extent, succeed in producing superior output in at least two cases compared with the most consistent Android application currently available.


2013 ◽  
Vol 8 (1) ◽  
pp. 686-691
Author(s):  
Vneeta Rani ◽  
Dr. Vijay Laxmi

OCR (optical character recognition) is a technology commonly used for pattern recognition in artificial intelligence and computer vision. With the help of OCR, scanned documents can be converted into editable documents that can be used further in various research areas. In this paper, we present a character segmentation technique that can segment simple characters, skewed characters, and broken characters. Character segmentation is a very important phase in any OCR process because its output serves as input to later phases such as character recognition. If there is a problem in the character segmentation phase, recognition of the corresponding character becomes very difficult or nearly impossible.


Optical Character Recognition has been an active research area in computer science for several years, and several research works have been undertaken on various languages in India. In this paper an attempt has been made to determine the percentage accuracy of word and character segmentation for Hindi (National language of India) and Odia, a regional language spoken mostly in Odisha and a few eastern Indian states. A comparative analysis is presented in this article. Ten sets each of printed Odia and Devanagari scripts with different word limits were used in this study. The documents were scanned at 300 dpi before the pre-processing and segmentation procedures were applied. The results show that the accuracy of both word and character segmentation is higher for Odia than for Hindi. One of the reasons is the use of the header line in Hindi, which makes the segmentation process cumbersome. Thus, it can be concluded that the accuracy level can vary from one language to another and between word segmentation and character segmentation.
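
Why the header line gets in the way can be seen on a toy word image: the line joins every character, so a vertical projection finds no blank columns until it is removed. The sketch below is only an illustration; treating the densest row as the header line is a simplifying assumption, and the function names and the sample `WORD` are hypothetical rather than taken from the article.

```python
def column_runs(img):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    profile = [sum(col) for col in zip(*img)]
    runs, start = [], None
    for i, value in enumerate(profile + [0]):  # trailing 0 closes the last run
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    return runs


def strip_header_line(img):
    """Blank out the row with the most ink, assumed here to be the header line."""
    header = max(range(len(img)), key=lambda r: sum(img[r]))
    return [[0] * len(row) if r == header else row for r, row in enumerate(img)]


WORD = [
    "#########",  # header line spanning the whole word
    ".#..#..##",
    ".#..#..#.",
    "##..#..#.",
]
img = [[1 if ch == "#" else 0 for ch in row] for row in WORD]

with_header = column_runs(img)                        # one unbroken run
without_header = column_runs(strip_header_line(img))  # one run per character
```

Odia characters have no such connecting line, which is consistent with the higher segmentation accuracy the article reports for Odia.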


2019 ◽  
Vol 8 (04) ◽  
pp. 24586-24602
Author(s):  
Manpreet Kaur ◽  
Balwinder Singh

Text classification is a crucial step in optical character recognition. The output of a scanner is non-editable: one cannot make changes to a scanned text image when required. This motivates optical character recognition. Optical Character Recognition (OCR) is the process of converting scanned images of machine-printed or handwritten text into a computer-readable format. The OCR process involves several steps after image acquisition, including pre-processing, segmentation, feature extraction, and classification. Incorrect classification leads to garbage in, garbage out. Existing methods focus only on the classification of unmixed characters in Arabic, English, Latin, Farsi, Bangla, and Devanagari scripts. The hybrid technique addresses the mixed (machine-printed and handwritten) character classification problem. Classification is carried out on different kinds of everyday forms, such as self-declaration forms, admission forms, verification forms, university forms, certificates, banking forms, dairy forms, Punjab government forms, etc. The proposed technique is able to classify handwritten and machine-printed text written in Gurumukhi script within mixed text. It has been tested on 150 different kinds of forms in Gurumukhi and Roman scripts. It achieves 93% accuracy on mixed-character forms and 96% accuracy on unmixed-character forms. The overall accuracy of the proposed technique is 94.5%.
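
As a quick arithmetic check on the figures above, the reported overall accuracy equals the unweighted mean of the two per-category accuracies. The abstract does not say how the 150 forms were divided between mixed and unmixed, so the splits below are hypothetical and serve only to show how the overall figure would depend on them.

```python
def weighted_accuracy(groups):
    """Overall accuracy for a list of (number_of_forms, accuracy_percent) pairs."""
    total = sum(count for count, _ in groups)
    return sum(count * accuracy for count, accuracy in groups) / total


# Unweighted mean of the two reported accuracies.
unweighted = (93 + 96) / 2

# Hypothetical even split of the 150 forms: gives the same value.
even_split = weighted_accuracy([(75, 93), (75, 96)])

# Hypothetical uneven split: the overall figure shifts toward the larger group.
uneven_split = weighted_accuracy([(100, 93), (50, 96)])
```

Under these assumptions the 94.5% figure is what an even split, or a plain average of the two categories, would produce.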


Sensors ◽  
2019 ◽  
Vol 19 (13) ◽  
pp. 3015 ◽  
Author(s):  
Farman Ullah ◽  
Hafeez Anwar ◽  
Iram Shahzadi ◽  
Ata Ur Rehman ◽  
Shizra Mehmood ◽  
...  

The paper proposes a sensor platform to control a barrier installed at a vehicle entrance. The platform is automated by image-based recognition of the vehicle's license plate. However, in situations where standardized license plates are not used, such image-based recognition becomes non-trivial and challenging due to variations in license plate background, fonts and deformations. The proposed method first detects the approaching vehicle via ultrasonic sensors and, at the same time, captures its image via a camera installed alongside the barrier. From this image, the license plate is automatically extracted and further processed to segment the license plate characters. Finally, these characters are recognized with the help of a standard optical character recognition (OCR) pipeline. The evaluation of the proposed system shows an accuracy of 98% for license plate extraction, 96% for character segmentation and 93% for character recognition.

