Arabic Optical Character Recognition

Arabic text recognition is receiving more attention from both Arabic-speaking and non-Arabic-speaking researchers. This chapter provides a general overview of the state of the art in Arabic Optical Character Recognition (OCR) and the associated text recognition technology. It also investigates the characteristics of the Arabic language with respect to OCR and discusses related research on the different phases of text recognition, including pre-processing and text segmentation, common feature extraction techniques, classification methods, and post-processing techniques. Moreover, the chapter discusses the available databases for Arabic OCR research and lists the available commercial software. Finally, it explores the challenges related to Arabic OCR and discusses possible future trends.

Corpus-based technique for improving Arabic OCR system

Indonesian Journal of Electrical Engineering and Computer Science

10.11591/ijeecs.v21.i1.pp233-241

2021

Vol 21 (1)

pp. 233

Author(s):

Ahmed Hussain Aliwy

Basheer Al-Sadawi

Keyword(s):

Character Recognition

Optical Character Recognition

Language Model

Arabic Language

Document Images

Statistical Language Model

Text Document

Optical Character

Arabic OCR

Optical character recognition (OCR) is the process of converting text document images into editable and searchable text. The OCR process poses several challenges, particularly for Arabic, where it produces a high percentage of errors. This paper proposes a method to improve the output of Arabic Optical Character Recognition (AOCR) systems, based on a statistical language model built from the large corpora available. The method detects and corrects both non-word and real-word errors according to the context of the word in the sentence. The results show that the corrected AOCR output reaches an accuracy of up to 98%.
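
Context-based correction of this kind can be sketched with a bigram language model. The following is a minimal sketch under stated assumptions and is not the authors' implementation. The toy corpus `CORPUS`, the class `BigramCorrector`, add-one smoothing and the one-edit candidate limit are all hypothetical choices made for illustration.

The sketch handles the two error types as follows:

- **Non-word errors.** A token missing from the vocabulary is replaced by the vocabulary word within one edit that best fits its left and right neighbours.
- **Real-word errors.** An in-vocabulary token is kept unless a one-edit neighbour scores strictly higher in context. This is only a crude stand-in for real-word error detection.

```python
from collections import Counter
from math import log
from typing import List, Optional

# Hypothetical toy corpus; the paper builds its model from large Arabic corpora.
CORPUS = "ذهب الولد الى المدرسة ذهب الولد الى البيت كتب الولد الدرس".split()


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


class BigramCorrector:
    """Context-based OCR word correction with an add-one smoothed bigram model."""

    def __init__(self, tokens: List[str]):
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, prev: str, word: str) -> float:
        # Laplace-smoothed estimate of log P(word | prev); unseen pairs get a small probability.
        return log((self.bigrams[(prev, word)] + 1) /
                   (self.unigrams[prev] + self.vocab_size))

    def context_score(self, left: Optional[str], word: str, right: Optional[str]) -> float:
        # Sum of log-probabilities linking the word to its neighbours (missing neighbours skipped).
        score = 0.0
        if left is not None:
            score += self.logprob(left, word)
        if right is not None:
            score += self.logprob(word, right)
        return score

    def correct(self, tokens: List[str], max_edits: int = 1) -> List[str]:
        out = list(tokens)
        for i, word in enumerate(tokens):
            left = out[i - 1] if i > 0 else None        # already-corrected left neighbour
            right = tokens[i + 1] if i + 1 < len(tokens) else None
            candidates = [w for w in self.unigrams
                          if w != word and levenshtein(word, w) <= max_edits]
            if word in self.unigrams:
                # Real word: listed first so max() keeps it unless context strictly prefers another.
                candidates.insert(0, word)
            if candidates:  # a non-word with no nearby vocabulary entry stays unchanged
                out[i] = max(candidates, key=lambda w: self.context_score(left, w, right))
        return out
```

Scanning the whole vocabulary for every token is only practical for a toy corpus. A model built from large corpora would instead index candidates, for example by character n-grams.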

Literature Survey on Student Grade Calculation using Optical Character Recognition based Image Processing Techniques

Journal of VLSI Design and Signal Processing

10.46610/jovdsp.2021.v07i01.005

2021

Vol 7 (1)

pp. 34-41

Author(s):

Omkiran S G

Samartha J V

Shashank Nagraj Bhat

Varun Gajanan Hegde

Sumaiya M N

Keyword(s):

Image Processing

Character Recognition

Optical Character Recognition

Literature Survey

Image Processing Techniques

Optical Character

Processing Techniques

SCENE TEXT RECOGNITION BY USING EE-MSER AND OPTICAL CHARACTER RECOGNITION FOR NATURAL IMAGES

International Journal of Advance Engineering and Research Development

10.21090/ijaerd.021219

2015

Vol 2 (12)

Keyword(s):

Character Recognition

Optical Character Recognition

Natural Images

Text Recognition

Optical Character

Scene Text

Scene Text Recognition

Simple Handwritten Calculator Application Using Optical Character Recognition (OCR)

Applied Technology and Computing Science Journal

10.33086/atcsj.v3i2.1867

2021

Vol 3 (2)

pp. 103-116

Author(s):

Supriadi Supriadi

Keyword(s):

Character Recognition

Optical Character Recognition

Text Recognition

Arithmetic Operations

Written Text

Optical Character

Calculation Results

The calculator is a calculation tool widely used in various specialized fields of business and commerce. A calculator makes it easier for humans to perform arithmetic operations, but entering the numbers is an obstacle when the values to be calculated are written on media such as paper or whiteboards. The user must first look at the text, read it, remember it, and then type it into a calculator device or application. The drawback of this approach is that if the user forgets what was written, they must look at the text and memorize it again, so calculations take longer. This study uses Optical Character Recognition, which can recognize text contained in images, including handwritten images of arithmetic operations. Arithmetic calculations are then performed on the recognized text to obtain the results. In trials on 20 handwritten images of arithmetic operations, the text extraction accuracy was 85%, and 85% of the handwritten images were calculated correctly.
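
Once a handwritten expression has been recognized, the calculation step has to evaluate the text safely rather than passing it to Python's `eval`. The sketch below assumes the OCR output is a plain string such as `12+7x3`. The function name `evaluate_recognised` and the character substitutions in `_NORMALISE` are hypothetical and not taken from the study. Examples of these substitutions are reading `x` as multiplication and `O` as zero.

```python
import ast
import operator

# Binary operators the calculator accepts; anything else in the recognised text is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Hypothetical fixes for common handwriting/OCR confusions (an assumption, not from the study).
_NORMALISE = str.maketrans({"x": "*", "X": "*", "×": "*", "÷": "/", "O": "0", "o": "0"})


def evaluate_recognised(text: str) -> float:
    """Evaluate an OCR-recognised arithmetic expression without calling eval()."""
    expr = text.translate(_NORMALISE).replace(" ", "").rstrip("=")
    tree = ast.parse(expr, mode="eval")  # raises SyntaxError on garbled text

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Plain numeric literals only (bool is excluded even though it subclasses int).
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported element in recognised text: {text!r}")

    return _eval(tree)
```

Walking the syntax tree keeps the normal precedence rules, so `12+7x3` evaluates to 33. Any operator or name outside the whitelist is rejected.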

OCR Text Extraction

International Journal of Engineering and Management Research

10.31033/ijemr.11.2.11

2021

Vol 11 (2)

pp. 83-86

Author(s):

Alan Jiju

Shaun Tuscano

Chetana Badgujar

Keyword(s):

Machine Learning

Statistical Analysis

Character Recognition

Optical Character Recognition

Text Segmentation

Similar Data

Written Text

Amount Of Information

Optical Character

Intermediate Image

This research proposes a methodology for extracting data from everyday printed bills and invoices. The extracted data can later be used extensively, for example in machine learning or statistical analysis. The research focuses on extracting the final bill amount, itinerary, date, and similar data from bills and invoices, as these encapsulate a large amount of information about users' purchases, likes, and dislikes. Optical Character Recognition (OCR) technology provides full alphanumeric recognition of printed or handwritten characters in images. First, OpenCV is used to detect the bill or invoice in the image and to filter out unnecessary noise. The intermediate image is then passed to the Tesseract OCR engine for further processing. Tesseract applies text segmentation to extract written text in various fonts and languages. Our methodology proved highly accurate when tested on a variety of input images of bills and invoices.
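
After Tesseract has produced plain text, the target fields still have to be located. One simple approach is keyword and pattern matching. It is shown here as a sketch and is not the authors' method. The keyword list, the currency prefixes, the day/month/year date order and the function name `extract_fields` are all assumptions.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical patterns: keywords that usually precede the final amount, an optional
# currency prefix, and a number such as 1,260.00. "\b" stops "Subtotal" from matching.
_TOTAL_RE = re.compile(
    r"\b(?:grand\s+total|total\s+amount|amount\s+due|total)\s*[:\-]?\s*"
    r"(?:rs\.?|inr|usd|\$|€)?\s*([0-9][0-9,]*(?:\.[0-9]{1,2})?)",
    re.IGNORECASE,
)
# Assumes day/month/year order with a four-digit year, e.g. 05/03/2021.
_DATE_RE = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")


def extract_fields(ocr_text: str) -> dict:
    """Pull the final bill amount and the first valid date out of OCR text."""
    totals = _TOTAL_RE.findall(ocr_text)
    # The last "total" on a bill is usually the grand total.
    amount: Optional[float] = float(totals[-1].replace(",", "")) if totals else None
    bill_date: Optional[date] = None
    for day, month, year in _DATE_RE.findall(ocr_text):
        try:
            bill_date = date(int(year), int(month), int(day))
            break
        except ValueError:
            continue  # not a valid day/month/year date; try the next match
    return {"amount": amount, "date": bill_date}
```

Taking the last match favours the grand total over earlier lines, and the `\b` anchor keeps `Subtotal` from matching. Real invoices would need a much richer set of patterns.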

Research on Deep Learning Techniques in Breaking Text-Based Captchas and Designing Image-Based Captcha

International Journal of Advanced Research in Science, Communication and Technology

10.48175/ijarsct-900

2021

pp. 266-269

Author(s):

Janarthanan A

Pandiyarajan C

Sabarinathan M

Sudhan M

Kala R

Keyword(s):

Deep Learning

Image Classification

Character Recognition

Optical Character Recognition

Experimental Results

Text Recognition

Image Resizing

Optical Character

Learning Techniques

Text Images

Optical character recognition (OCR) is the process of recognizing text in images (here, single words). The input images are taken from the dataset, and the collected text images are pre-processed. Pre-processing includes image resizing, which is necessary when the total number of pixels must be increased or decreased. When an image is zoomed, its pixels are remapped to increase their number so that the enlarged image still shows clear content. Next, segmentation separates the individual characters in each word. Feature values (the test features) are then extracted from the image. In the classification step, a classifier identifies which images contain text. The experimental results show that the accuracy.
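
The resizing and segmentation steps can be illustrated on a binary word image stored as nested lists, with 1 for ink and 0 for background. The abstract does not say which methods were used. This sketch assumes nearest-neighbour remapping for resizing and a vertical projection profile for segmentation. The function names are hypothetical.

```python
from typing import List, Tuple

# Binary image as a list of rows: 1 = ink (foreground), 0 = background.
Image = List[List[int]]


def resize_nearest(img: Image, new_h: int, new_w: int) -> Image:
    """Resize by nearest-neighbour remapping: each output pixel copies one source pixel."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def segment_columns(img: Image, min_width: int = 1) -> List[Tuple[int, int]]:
    """Split a word image into character spans using its vertical projection profile.

    Columns without ink act as separators. Returns half-open [start, end) column ranges.
    """
    profile = [sum(column) for column in zip(*img)]  # ink count per column
    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(profile):
        if ink and start is None:            # entering a character
            start = x
        elif not ink and start is not None:  # leaving a character
            if x - start >= min_width:
                spans.append((start, x))
            start = None
    if start is not None and len(profile) - start >= min_width:
        spans.append((start, len(profile)))  # character touching the right edge
    return spans


def crop(img: Image, span: Tuple[int, int]) -> Image:
    """Cut one character out of the word image."""
    return [row[span[0]:span[1]] for row in img]
```

Projection-based splitting only works when characters are separated by blank columns. Touching or overlapping characters, which are common in captchas and in cursive scripts such as Arabic, need more elaborate segmentation.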

Improving post-processing optical character recognition documents with Arabic language using spelling error detection and correction

International Journal of Reasoning-based Intelligent Systems

10.1504/ijris.2016.082957

2016

Vol 8 (3/4)

pp. 91

Author(s):

Iyad Abu Doush

Ahmed M. Al Trad

Keyword(s):

Error Detection

Character Recognition

Optical Character Recognition

Arabic Language

Spelling Error

Post Processing

Optical Character

Error Detection And Correction

Improving Optical Character Recognition Techniques

International Journal of Engineering & Technology

10.14419/ijet.v7i2.24.12085

2018

Vol 7 (2.24)

pp. 361

Cited By ~ 1

Author(s):

Nitin Ramesh

Aksha Srivastava

K Deeba

Keyword(s):

Character Recognition

Optical Character Recognition

Document Image

Text Recognition

Digital Form

Written Text

Optical Character

Research Organizations

The World

Document text recognition relies on OCR (optical character recognition), the recognition of printed or written text characters by a computer. This involves scanning a document containing text and converting it, character by character, into digital form. OCR can therefore be defined as the process of digitizing a document image into its constituent characters. Cameras and flatbed scanners are the equipment used to obtain clearer images for analysis. Even though OCR technology has existed since 1870, it has yet to reach perfection. This demanding nature of OCR has drawn the attention of researchers, industries, and technology enthusiasts to the field, and the number of research organizations investing time and effort in it has increased significantly in recent times. This research summarizes the progress, different aspects, and various issues in this field. The aim is to present a thorough overview of the proposals, advancements, and discussions aimed at resolving the problems that arise in traditional OCR.

OPTICAL CHARACTER RECOGNITION FOR ELECTRONIC INVOICES USING AWS SERVICES

International Journal of Engineering Applied Sciences and Technology

10.33564/ijeast.2021.v06i05.036

2021

Vol 6 (5)

Author(s):

Sameer M. Patel

Sarvesh S. Pai

Mittal B. Jain

Vaibhav P. Vasani

Keyword(s):

Character Recognition

Web Application

Optical Character Recognition

Credit Cards

Text Recognition

Service Architecture

The Past

Optical Character

Handwritten Text

Optical Character Recognition is the mechanical or electronic conversion of printed or handwritten text into machine-readable text. The difficulty of OCR under varying conditions remains as relevant as it was in past years. Even in the present era of automation and innovation, keyboarding remains the most common way of entering data into computers, and it is probably the most time-consuming and labor-intensive operation in industry. Automating the recognition of documents, credit cards, electronic invoices, and car license plates could save time in analyzing and processing data. With increased research and development in machine learning, the quality of text recognition continues to improve. Our paper briefly explains the different stages involved in optical character recognition, and through the proposed application we aim to automate the extraction of important text from electronic invoices. The main goal of the project is to develop a real-time OCR web application with a microservice architecture that extracts the necessary information from an invoice.

Embedded system design to control the entry and exit of vehicles online, at the main access of ESPOCH

Journal of Science and Research Revista Ciencia e Investigación

10.26910/issn.2528-8083vol3isscitt2017.2018pp113-120

2018

Vol 3 (CITT2017)

pp. 113-120

Author(s):

Javier J. Gavilanes

Jairo R. Jácome

Alexandra O. Pazmiño

Keyword(s):

Embedded System

Character Recognition

Optical Character Recognition

Entry And Exit

Real Time System

Optical Character

Nearest Neighbours

Processing Techniques

Python Programming

Main Entrance

In this research, an embedded real-time system was developed using a Raspberry Pi 3 (a single-board computer), equipped with a camera and placed at strategic points on the mechanical barrier arms at the main entrance and exit of the Escuela Superior Politécnica de Chimborazo. The equipment captures images of vehicles entering and leaving the campus. The information is extracted by a segmentation algorithm written in Python using the computer vision libraries provided by OpenCV, and processing techniques were applied to extract the vehicle's license plate from the scene. An Optical Character Recognition (OCR) stage based on the K-Nearest Neighbours (KNN) algorithm was then applied; after a training phase, it identifies the letters and numbers on the plates. The information is stored in the entrance database and deleted when the vehicle exits the campus.
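
The KNN recognition stage can be sketched as follows. Each segmented plate character is assumed to be reduced to a fixed-length feature vector, here a hypothetical flattened 3×3 bitmap. The unknown character takes the majority label among its `k` closest training samples. The training data `TRAIN`, the Euclidean distance, `knn_predict` and `read_plate` are illustrative assumptions, not the system described above. OpenCV's `ml` module also ships a k-nearest-neighbours classifier that could be used instead.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

# A training sample pairs a feature vector with the character it represents.
Sample = Tuple[Sequence[float], str]

# Hypothetical training set: flattened 3x3 bitmaps (row by row), two variants per character.
TRAIN: List[Sample] = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "1"),
    ([0, 1, 0, 1, 1, 0, 0, 1, 0], "1"),
    ([1, 1, 1, 0, 0, 1, 0, 0, 1], "7"),
    ([1, 1, 1, 0, 0, 1, 0, 1, 0], "7"),
    ([1, 0, 0, 1, 0, 0, 1, 1, 1], "L"),
    ([1, 0, 0, 1, 0, 0, 1, 1, 0], "L"),
]


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training samples closest to `query`."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance; sorted() is stable, so equidistant samples keep training order.
    nearest = sorted(train, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # Break vote ties in favour of the label whose sample is nearest to the query.
    return next(label for _, label in nearest if votes[label] == top)


def read_plate(char_features: List[Sequence[float]], k: int = 3) -> str:
    """Classify each segmented character in order and join them into a plate string."""
    return "".join(knn_predict(TRAIN, features, k) for features in char_features)
```

The "training phase" of KNN is simply storing labelled samples. The classification cost grows with the size of the training set, which matters on a device such as a Raspberry Pi.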
