Polarity Identification for Handwritten Text in Multilingual Documents Using Open Source Optical Character Recognition Tools

Sentiment analysis is a prerequisite for many applications. We propose a model in which handwritten text in English and Kannada is scanned with CamScanner and then converted into editable text using various open source Optical Character Recognition (OCR) tools. The performance of the different OCR tools is analyzed and tabulated. Sentiment analysis is performed on statements written in both English and Kannada using WordNet, the Algorithmia REST API, and local dictionaries, with satisfactory results. The same sentiment analysis module is also applied to customer reviews of a mobile product, with the reviews taken from Amazon Web Services. The customer's opinion of the product can be identified correctly.


How to Improve Optical Character Recognition of Historical Finnish Newspapers Using Open Source Tesseract OCR Engine – Final Notes on Development and Evaluation

Human Language Technology. Challenges for Computer Science and Linguistics - Lecture Notes in Computer Science ◽

10.1007/978-3-030-66527-2_2 ◽

2020 ◽

pp. 17-30

Author(s):

Mika Koistinen ◽

Kimmo Kettunen ◽

Jukka Kervinen

Keyword(s):

Open Source ◽

Character Recognition ◽

Optical Character Recognition ◽

Optical Character


Optical Character Recognition for English Handwritten Text Using Recurrent Neural Network

2020 International Conference on System, Computation, Automation and Networking (ICSCAN) ◽

10.1109/icscan49426.2020.9262379 ◽

2020 ◽

Author(s):

R. Parthiban ◽

R. Ezhilarasi ◽

D. Saravanan

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Character Recognition ◽

Optical Character Recognition ◽

Optical Character ◽

Handwritten Text


Performance Analysis of Open Source Optical Character Recognition

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9060 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4267-4275

Author(s):

Jagadish Kallimani ◽

Chandrika Prasad ◽

D. Keerthana ◽

Manoj J. Shet ◽

Prasada Hegde ◽

...

Keyword(s):

Performance Analysis ◽

Comparative Study ◽

Open Source ◽

Error Rate ◽

Character Recognition ◽

Optical Character Recognition ◽

Accurate Result ◽

Recognition System ◽

Optical Character ◽

Recognition Systems

Optical character recognition is the electronic or mechanical conversion of images of text into machine-encoded text. The text in the image can be handwritten, typed, or printed. Examples of image sources include a photograph of a document, a scanned document, or text superimposed on an image. Most optical character recognition systems do not give a 100% accurate result. This project analyzes the error rate of a few open source optical character recognition systems (Boxoft OCR, ABBY, Tesseract, Free Online OCR, etc.) on a set of diverse documents and makes a comparative study of them. In this way, we can determine which OCR system is best suited to a given document.
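The abstract does not say how the error rate was measured. One common choice for comparing OCR engines is the character error rate (CER): the Levenshtein edit distance between the ground-truth text and the OCR output, divided by the length of the ground truth. The sketch below assumes that metric; the function names and the engine labels in the usage note are hypothetical and are not taken from the paper.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the length of the ground truth."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def rank_engines(ref: str, outputs: dict) -> list:
    """Return (engine name, CER) pairs sorted from best to worst."""
    scored = [(name, character_error_rate(ref, text))
              for name, text in outputs.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

Calling `rank_engines("hello world", {"engine_a": "hello world", "engine_b": "hel1o w0rld"})` places `engine_a` first, since its output matches the ground truth exactly while `engine_b` has two substituted characters.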


Open source optical character recognition for historical research

Journal of Documentation ◽

10.1108/00220411211256021 ◽

2012 ◽

Vol 68 (5) ◽

pp. 659-683 ◽

Cited By ~ 3

Author(s):

Tobias Blanke ◽

Michael Bryant ◽

Mark Hedges

Keyword(s):

Open Source ◽

Character Recognition ◽

Optical Character Recognition ◽

Historical Research ◽

Optical Character


Character Segmentation and Skew Correction for Handwritten Devanagari Scripts: A Friends Technique

Asian Journal of Engineering and Applied Technology ◽

10.51983/ajeat-2019.8.1.1060 ◽

2019 ◽

Vol 8 (1) ◽

pp. 50-54

Author(s):

Ashok Kumar Bathla ◽

Sunil Kumar Gupta

Keyword(s):

Human Brain ◽

Character Recognition ◽

Optical Character Recognition ◽

Character Segmentation ◽

Novel Technique ◽

Skew Correction ◽

Optical Character ◽

Handwritten Text ◽

Scripting Language ◽

The Way

Optical Character Recognition (OCR) technology allows a computer to “read” text (both typed and handwritten) the way a human brain does. Significant research effort has gone into character segmentation for OCR of typewritten text in various languages; however, very little effort has gone into the segmentation and skew correction of handwritten text written in Devanagari, the script used to write Hindi. This paper proposes a novel technique for segmentation and skew correction of handwritten Devanagari text. It achieves an accuracy of 91% and takes less than one second to segment a given handwritten word.


Classification of printed and handwritten text using hybrid techniques for gurumukhi script

International Journal Of Engineering And Computer Science ◽

10.18535/ijecs/v8i04.4298 ◽

2019 ◽

Vol 8 (04) ◽

pp. 24586-24602

Author(s):

Manpreet Kaur ◽

Balwinder Singh

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Classification Problem ◽

Hybrid Techniques ◽

Optical Character ◽

Handwritten Text ◽

Scanned Images ◽

Character Classification ◽

Incorrect Classification

Text classification is a crucial step in optical character recognition. The output of a scanner is not editable, so the scanned text image cannot be changed when required; this motivates optical character recognition. Optical Character Recognition (OCR) is the process of converting scanned images of machine-printed or handwritten text into a computer-readable format. The OCR process involves several steps: pre-processing after image acquisition, segmentation, feature extraction, and classification. Incorrect classification leads to garbage in, garbage out. Existing methods focus only on the classification of unmixed characters in Arabic, English, Latin, Farsi, Bangla, and Devanagari scripts. The proposed hybrid technique addresses the mixed (machine-printed and handwritten) character classification problem. Classification is carried out on different kinds of everyday forms, such as self-declaration forms, admission forms, verification forms, university forms, certificates, banking forms, dairy forms, and Punjab government forms. The proposed technique can classify handwritten and machine-printed text written in Gurumukhi script in mixed text. It has been tested on 150 different kinds of forms in Gurumukhi and Roman scripts, achieving 93% accuracy on mixed-character forms and 96% accuracy on unmixed-character forms. The overall accuracy of the proposed technique is 94.5%.
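The abstract does not state how the overall figure was derived from the two per-category accuracies. As a quick check, and assuming nothing about the authors' actual procedure, the unweighted mean of the two reported values reproduces it. The function name below is hypothetical.

```python
def unweighted_mean(values):
    """Arithmetic mean that gives every category equal weight."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


# Accuracies reported in the abstract, in percent.
mixed_form_accuracy = 93.0
unmixed_form_accuracy = 96.0

overall = unweighted_mean([mixed_form_accuracy, unmixed_form_accuracy])
print(overall)  # 94.5
```

A mean weighted by the number of forms in each category would generally give a different value unless the two categories were the same size; the abstract does not report that split.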


Towards Syriac Digital Corpora: Evaluation of Tesseract 4.0 for Syriac OCR

Hugoye: Journal of Syriac Studies ◽

10.31826/hug-2019-220105 ◽

2019 ◽

Vol 22 (1) ◽

pp. 109-192

Author(s):

Emily Chesley ◽

Jillian Marcantonio ◽

Abigail Pearson

Keyword(s):

Open Source ◽

Character Recognition ◽

Optical Character Recognition ◽

Further Training ◽

Current State ◽

Optical Character ◽

Extensive Test ◽

Degree Of Confidence ◽

High Degree

This paper summarizes the results of an extensive test of Tesseract 4.0, an open-source Optical Character Recognition (OCR) engine with Syriac capabilities, and ascertains the current state of Syriac OCR technology. Three popular print types (S14, W64, and E22) representing the Syriac type styles Estrangela, Serto, and East Syriac were OCRed using Tesseract’s two different OCR modes (Syriac Language and Syriac Script). Handwritten manuscripts were also preliminarily tested for OCR. The tests confirm that Tesseract 4.0 may be relied upon for printed Estrangela texts but should be used with caution and human revision for Serto and East Syriac printed texts. Consonantal accuracy lies around 99% for Estrangela, between 89% and 94% for Serto, and around 89% for East Syriac. Scholars may use Tesseract to OCR Estrangela texts with a high degree of confidence, but further training of the engine will be required before Serto and East Syriac texts can be smoothly OCRed. In all type styles, human revision of the OCRed text is recommended when scholars desire an exact, error-free corpus.


OPTICAL CHARACTER RECOGNITION FOR ELECTRONIC INVOICES USING AWS SERVICES

International Journal of Engineering Applied Sciences and Technology ◽

10.33564/ijeast.2021.v06i05.036 ◽

2021 ◽

Vol 6 (5) ◽

Author(s):

Sameer M. Patel ◽

Sarvesh S. Pai ◽

Mittal B. Jain ◽

Vaibhav P. Vasani

Keyword(s):

Character Recognition ◽

Web Application ◽

Optical Character Recognition ◽

Credit Cards ◽

Text Recognition ◽

Service Architecture ◽

The Past ◽

Optical Character ◽

Handwritten Text

Optical Character Recognition is the mechanical or electronic conversion of printed or handwritten text into machine-understandable text. The difficulty of Optical Character Recognition under varying conditions remains as relevant as it has been in past years. Even in the present era of automation and innovation, keyboarding remains the most common way of entering data into computers, and it is probably the most time-consuming and labor-intensive operation in the industry. Automating the recognition of documents, credit cards, electronic invoices, and car license plates could save time in analyzing and processing data. With increased research and development in machine learning, the quality of text recognition is continuously improving. Our paper briefly explains the different stages involved in optical character recognition, and through the proposed application we aim to automate the extraction of important text from electronic invoices. The main goal of the project is to develop a real-time OCR web application with a microservice architecture that helps extract the necessary information from an invoice.


Improve OCR Accuracy with Advanced Image Preprocessing using Machine Learning with Python

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.g5745.059720 ◽

2020 ◽

Vol 9 (7) ◽

pp. 1026-1030

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Neural Networks ◽

Character Recognition ◽

Optical Character Recognition ◽

Image Preprocessing ◽

Optical Character ◽

Handwritten Text ◽

Printed Text ◽

Learning Machine

Optical Character Recognition or Optical Character Reader (OCR) is a pattern recognition method that electronically converts images of handwritten or printed text into machine-encoded text. The equipment used for this purpose includes cameras and flatbed scanners. Handwritten text is scanned with a scanner, and the image of the scanned document is then processed by software. Recognizing handwritten manuscripts is difficult compared with recognizing other Western-language texts. In the proposed work we take on the challenge of identifying letters. Image preprocessing techniques can effectively improve the accuracy of an OCR engine. The goal is to design and implement a system using machine learning and Python that is more accurate than pre-built OCR engines, drawing on technologies such as MATLAB, artificial intelligence, and neural networks.
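The abstract does not list which preprocessing steps were used. Binarization is one widely used step before OCR, and Otsu's method is a standard way to pick the threshold automatically. The sketch below is a dependency-free illustration of that one step under those assumptions, not the authors' pipeline; the function names and the list-of-rows image representation are hypothetical.

```python
def otsu_threshold(pixels):
    """Pick the 8-bit gray level that maximizes the between-class
    variance of the two groups (values <= t and values > t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # all pixels are in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows of 0..255 ints) to 0/255."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]
```

For a tiny image such as `[[10, 12, 200], [11, 210, 205]]`, the dark values map to 0 and the bright values to 255. In practice an image library would supply the pixel data and usually a built-in version of this step.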


Machine Replication of Human Perusing using Optical Character Recognition with Tesseract

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1079.1292s419 ◽

2019 ◽

Vol 9 (2S4) ◽

pp. 74-77

Keyword(s):

Open Source ◽

Character Recognition ◽

Optical Character Recognition ◽

Optical Character

Optical Character Recognition is the machine replication of human reading. It is the electronic conversion of scanned images, where the image can contain typewritten or printed text. It is implemented using Google's open source Optical Character Recognition software, Tesseract. The OCR system takes an image as input, extracts the text from that image, and then converts it into whatever other language the user wants. This system can be useful in applications such as banking, the legal industry, travel and other industries, and home and office automation. It is mainly intended for people who are unable to read text documents and to reduce the burden of data entry jobs.[4]
