Multi-Font Arabic Isolated Character Recognition Using Combining Machine Learning Classifiers

Author(s):  
Raidah S. Khudeyer ◽  
Maytham Alabbas ◽  
Mustafa Radif

Optical character recognition is now one of the most successful applications of automatic pattern recognition. Much work has been done on recognizing Latin and Chinese characters. Arabic character recognition has received far less attention, largely because Arabic characters are more complex and harder to identify. In this work, we investigate combining multiple machine learning algorithms for multi-font Arabic isolated character recognition, where the input characters are imperfect and vary in dimensions. To the best of our knowledge, no such work is yet available. Experimental results show that the combined classifiers can outperform each individual classifier on its own. These findings are encouraging and open the door to further research in this direction.
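The abstract does not say how the classifiers are combined. As a minimal sketch of one common option, assuming simple majority voting over the labels returned by each classifier, the idea could look like the following. The function names and the toy classifiers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest prediction."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    # Scan in classifier order so ties are resolved deterministically.
    for label in predictions:
        if counts[label] == top:
            return label


def combine(classifiers, sample):
    """Ask every classifier for a label, then vote."""
    return majority_vote([classify(sample) for classify in classifiers])


# Hypothetical stand-in classifiers: each maps a sample to a character label.
always_ba = lambda sample: "ba"
always_ta = lambda sample: "ta"
print(combine([always_ba, always_ta, always_ba], sample=None))
```

Voting only helps when the individual classifiers make different mistakes, which is one plausible reason a combination can beat each member.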

Author(s):  
Veronica Ong ◽  
Derwin Suhartono

Advances in computer vision have helped society with many kinds of tasks. One of these is recognizing text contained in an image, usually referred to as Optical Character Recognition (OCR). Many algorithms can be used in an OCR system, and K-Nearest Neighbor, one of the most influential machine learning algorithms, is one of them. This research examines how the OCR process works when built on the K-Nearest Neighbor algorithm and how accurate that algorithm is in an OCR program. To do so, a simple OCR program that classifies capital letters of the alphabet was built to produce and compare real results. The program reached a maximum accuracy of 76.9% with 200 training samples per letter. A set of reasons is also given for why the program reaches that level of accuracy.
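The K-Nearest Neighbor step can be sketched as follows. This is an illustration only: it assumes each character image has already been reduced to a numeric feature vector, uses Euclidean distance, and picks `k=3`; none of these choices is stated in the abstract.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by a vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    query: feature vector of the same length as the training vectors.
    """
    # Sort the training samples by Euclidean distance to the query.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy two-dimensional features standing in for real image features.
train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_classify(train, (0.2, 0.3)))
```

Because every query is compared against all stored samples, accuracy and cost both depend directly on the number of training samples per letter.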


Author(s):  
Yaseen Khather Yaseen ◽  
Alaa Khudhair Abbas ◽  
Ahmed M. Sana

Today, images are part of everyday communication between people. However, images are also used to share information by hiding and embedding messages within them, and images received through social media or email can contain harmful content that users cannot see and are therefore unaware of. This paper presents a model for detecting spam in images. The model combines optical character recognition, natural language processing, and a machine learning algorithm. Optical character recognition extracts the text from images, and natural language processing uses linguistic capabilities to detect and classify the language and to distinguish normal text from slang. Features for the selected images are then extracted using the bag-of-words model, and the machine learning algorithm is run to detect any spam they may contain. Finally, the model predicts whether or not the image contains harmful content. The results show that the proposed combination of machine learning, optical character recognition, and natural language processing provides higher detection accuracy than machine learning alone.
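The bag-of-words step can be illustrated with a short sketch. It assumes the text has already been extracted from the image by OCR; the tokenizer and the three-word vocabulary are hypothetical and chosen only for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


vocabulary = ["free", "win", "meeting"]  # hypothetical vocabulary
print(bag_of_words("WIN a free phone, free!", vocabulary))
```

The resulting count vector is the kind of feature a spam classifier would be trained on; word order is deliberately discarded.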


2021 ◽  
Vol 11 (2) ◽  
pp. 83-86
Author(s):  
Alan Jiju ◽  
Shaun Tuscano ◽  
Chetana Badgujar

This research seeks a methodology for extracting data from everyday printed bills and invoices. The extracted data can later be used extensively, for example in machine learning or statistical analysis. The research focuses on extracting the final bill amount, itinerary, date and similar data from bills and invoices, as they encapsulate ample information about users' purchases, likes and dislikes. Optical Character Recognition (OCR) is a technology that provides full alphanumeric recognition of printed or handwritten characters from images. First, OpenCV is used to detect the bill or invoice in the image and filter out unnecessary noise. The intermediate image is then passed to the Tesseract OCR engine for further processing. Tesseract applies text segmentation in order to extract written text in various fonts and languages. Our methodology proved highly accurate when tested on a variety of input images of bills and invoices.
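As a rough sketch of the final extraction step, the example below pulls a total and a date out of text that an OCR engine has already produced. It does not call OpenCV or Tesseract. The regular expressions, the accepted currency markers and the sample bill are assumptions made for illustration, not the authors' rules.

```python
import re

# Assumed label and currency formats; real bills vary widely.
AMOUNT_RE = re.compile(
    r"\b(?:grand\s+total|total|amount\s+due)\b\s*[:\-]?\s*"
    r"(?:Rs\.?|INR|\$)?\s*([0-9][0-9,]*(?:\.[0-9]{1,2})?)",
    re.IGNORECASE,
)
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")


def extract_fields(ocr_text):
    """Return the last labelled total and the first date found in the text."""
    amounts = AMOUNT_RE.findall(ocr_text)
    date = DATE_RE.search(ocr_text)
    return {
        # The last "total" on a bill is assumed to be the final amount.
        "total": float(amounts[-1].replace(",", "")) if amounts else None,
        "date": date.group(1) if date else None,
    }


sample = "ACME STORE\nDate: 12/03/2021\nSubtotal: 1,150.00\nTax: 50.00\nTotal: Rs. 1,200.00\n"
print(extract_fields(sample))
```

The word boundary before `total` keeps "Subtotal" from being mistaken for the final amount.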


Author(s):  
Abhishek Das ◽  
Mihir Narayan Mohanty

In this chapter, the authors review optical character recognition. The study covers both typed and handwritten character recognition. Online and offline character recognition, the two modes of data acquisition in OCR, are also studied. As deep learning is the emerging machine learning method in image processing, the authors describe the method and its application in earlier works. Building on the study of the recurrent neural network (RNN), a special class of deep neural network is proposed for recognition. Further, a convolutional neural network (CNN) is combined with the RNN to check its performance. For this work, Odia numerals and characters are taken as input and are well recognized. The efficacy of the proposed method is explained in the results section.


Optical Character Recognition or Optical Character Reader (OCR) is a pattern recognition method that electronically converts images of handwritten or printed text into machine-encoded text. The equipment used for this purpose includes cameras and flatbed scanners. Handwritten text is scanned using a scanner, and the image of the scanned document is processed by the program. Recognizing handwritten manuscripts is difficult compared with other Western-language texts. In our proposed work we take up the challenge of identifying letters and work towards that goal. Image preprocessing techniques can effectively improve the accuracy of an OCR engine. The goal is to design and implement a machine learning system in Python that is more accurate than pre-built OCR engines based on technologies such as MATLAB, artificial intelligence, neural networks, etc.
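The abstract does not name a specific preprocessing technique. As one illustrative possibility, the sketch below binarizes a grayscale image with Otsu's threshold, a standard way to separate ink from paper before recognition. The choice of Otsu's method and the flat list-of-pixels input format are assumptions.

```python
def otsu_threshold(pixels):
    """Pick the gray level (0-255) that best separates two pixel classes."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_bg = 0
    sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is at or below t
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor.
        var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels):
    """Map pixels above the Otsu threshold to 1 and the rest to 0."""
    t = otsu_threshold(pixels)
    return [1 if p > t else 0 for p in pixels]


print(binarize([10, 12, 200, 210, 11, 205]))
```

A clean two-level image removes uneven lighting and scanner noise that would otherwise leak into the classifier's features.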


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Kristina Hanspers ◽  
Anders Riutta ◽  
Martina Summer-Kutmon ◽  
Alexander R. Pico

Thousands of pathway diagrams are published each year as static figures inaccessible to computational queries and analyses. Using a combination of machine learning, optical character recognition, and manual curation, we identified 64,643 pathway figures published between 1995 and 2019 and extracted 1,112,551 instances of human genes, comprising 13,464 unique NCBI genes, participating in a wide variety of biological processes. This collection represents an order of magnitude more genes than found in the text of the same papers, and thousands of genes missing from other pathway databases, thus presenting new opportunities for discovery and research.


Author(s):  
Hrithik Roshan Palampatla

Automatic Number Plate Recognition (ANPR) is a mass surveillance system that captures images of vehicles and recognizes their government-issued registration numbers. ANPR is often used for detecting stolen vehicles and in traffic surveillance systems. Our project presents a model in which the vehicle license plate image is obtained by digital cameras and processed to extract the number plate information. A vehicle image is captured and processed using various methods. The number plate region is extracted using deep neural networks, and optical character recognition is performed using machine learning algorithms. The system is implemented using a deep neural network model and machine learning algorithms, simulated in Python, and tested on real images. The developed model successfully detects the license plate region and recognizes the individual characters. Various recognition strategies have been developed, and number plate recognition systems are today used in traffic and security applications such as access and border control, parking, and tracking of stolen vehicles.

