Pathway information extracted from 25 years of pathway figures

Abstract: Thousands of pathway diagrams are published each year as static figures inaccessible to computational queries and analyses. Using a combination of machine learning, optical character recognition, and manual curation, we identified 64,643 pathway figures published between 1995 and 2019 and extracted 1,112,551 instances of human genes, comprising 13,464 unique NCBI genes, participating in a wide variety of biological processes. This collection represents an order of magnitude more genes than found in the text of the same papers, and thousands of genes missing from other pathway databases, thus presenting new opportunities for discovery and research.

Developing Automated Optical Character Recognition System Using Machine Learning Algorithm to Solve Payment Verification Issues

10.1109/icoris52787.2021.9649514 ◽

2021 ◽

Author(s):

Michael Siek ◽

Rafi Soeharto

Keyword(s):

Machine Learning ◽

Character Recognition ◽

Optical Character Recognition ◽

Learning Algorithm ◽

Recognition System ◽

Machine Learning Algorithm ◽

Optical Character

Image Spam Detection Using Machine Learning and Natural Language Processing

Journal of Southwest Jiaotong University ◽

10.35741/issn.0258-2724.55.2.41 ◽

2020 ◽

Vol 55 (2) ◽

Author(s):

Yaseen Khather Yaseen ◽

Alaa Khudhair Abbas ◽

Ahmed M. Sana

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Optical Character ◽

Harmful Content

Today, images are a part of communication between people. However, images can also be used to share information by hiding and embedding messages within them, and images received through social media or email can contain harmful content that users cannot see and are therefore unaware of. This paper presents a model for detecting spam in images. The model combines optical character recognition, natural language processing, and a machine learning algorithm. Optical character recognition extracts the text from images, and natural language processing uses linguistic capabilities to detect and classify the language and to distinguish normal text from slang. Features for the selected images are then extracted using the bag-of-words model, and the machine learning algorithm is run to detect any spam they may contain. Finally, the model predicts whether or not the image contains harmful content. The results show that the proposed method, which combines a machine learning algorithm, optical character recognition, and natural language processing, provides higher detection accuracy than machine learning alone.
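The bag-of-words and classification steps described in this abstract can be illustrated with a minimal sketch. It assumes the text has already been extracted from the image by an OCR engine, and it uses a multinomial naive Bayes classifier as a stand-in, since the abstract does not name the algorithm the authors used. The training sentences, labels, and all function and class names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens (the bag-of-words features)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed stand-in classifier)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        features = bag_of_words(text)
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for word, n in features.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (counts[word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(p)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical OCR output used as toy training data.
train_texts = [
    "win free prize now",
    "free money click now",
    "meeting agenda for monday",
    "see you at lunch monday",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(train_texts, train_labels)

print(model.predict("free prize click"))
print(model.predict("lunch meeting monday"))
```

A real system would need far more training data and the language detection and slang handling the abstract mentions; this sketch covers only feature extraction and classification.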

OCR Text Extraction

International Journal of Engineering and Management Research ◽

10.31033/ijemr.11.2.11 ◽

2021 ◽

Vol 11 (2) ◽

pp. 83-86

Author(s):

Alan Jiju ◽

Shaun Tuscano ◽

Chetana Badgujar

Keyword(s):

Machine Learning ◽

Statistical Analysis ◽

Character Recognition ◽

Optical Character Recognition ◽

Text Segmentation ◽

Similar Data ◽

Written Text ◽

Amount Of Information ◽

Optical Character ◽

Intermediate Image

This research seeks a methodology for extracting data from everyday printed bills and invoices. The extracted data can later be used extensively, for example in machine learning or statistical analysis. The research focuses on extracting the final bill amount, itinerary, date, and similar data from bills and invoices, since they encapsulate ample information about the user's purchases, likes, dislikes, and so on. Optical Character Recognition (OCR) is a technology that provides full alphanumeric recognition of printed or handwritten characters from images. First, OpenCV is used to detect the bill or invoice in the image and to filter out unnecessary noise. The intermediate image is then passed to the Tesseract OCR engine for further processing. Tesseract applies text segmentation in order to extract written text in various fonts and languages. The methodology proved highly accurate when tested on a variety of input images of bills and invoices.
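The field-extraction step that follows OCR can be sketched as below. This is an illustration only: it assumes the bill text has already been produced by an OCR engine such as Tesseract, and the regular expressions, the `extract_fields` function, and the sample receipt are hypothetical rather than the authors' method. Taking the last matching amount as the final total is also an assumption.

```python
import re

# Assumed patterns: a "total"-style label followed by an amount, and a numeric date.
AMOUNT_RE = re.compile(
    r"\b(?:grand\s+total|total|amount\s+due)\b\s*[:\-]?\s*"
    r"(?:rs\.?|\$|€|£)?\s*([0-9][0-9,]*(?:\.[0-9]{2})?)",
    re.IGNORECASE,
)
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")


def extract_fields(ocr_text):
    """Return the final bill amount and the first date found in OCR text."""
    amounts = [float(m.replace(",", "")) for m in AMOUNT_RE.findall(ocr_text)]
    date_match = DATE_RE.search(ocr_text)
    return {
        # Assumption: the last labelled total on the bill is the final amount.
        "total": amounts[-1] if amounts else None,
        "date": date_match.group(1) if date_match else None,
    }


# Hypothetical OCR output for a small receipt.
sample = """SUPER MART
Date: 12/03/2021
Milk 2.50
Bread 1.75
Subtotal: 4.25
Tax 0.34
TOTAL: $4.59
"""

print(extract_fields(sample))
```

The word boundary before "total" keeps "Subtotal" from matching, so only the final labelled line contributes here. Real bills vary widely in layout, so patterns like these would need tuning per format.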

An Useful Review on Optical Character Recognition for Smart Era Generation

Multimedia and Sensory Input for Augmented, Mixed, and Virtual Reality - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-4703-8.ch001 ◽

2021 ◽

pp. 1-41

Author(s):

Abhishek Das ◽

Mihir Narayan Mohanty

Keyword(s):

Neural Network ◽

Machine Learning ◽

Image Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Deep Neural Network ◽

Machine Learning Method ◽

Learning Method ◽

Result Section ◽

Optical Character

In this chapter, the authors review optical character recognition. The study covers both typed and handwritten character recognition. Online and offline character recognition, the two modes of data acquisition in the field of OCR, are also studied. As deep learning is the emerging machine learning method in the field of image processing, the authors describe the method and its application in earlier works. Drawing on the study of the recurrent neural network (RNN), a special class of deep neural network is proposed for recognition. Further, a convolutional neural network (CNN) is combined with the RNN to check its performance. For this work, Odia numerals and characters are taken as input and are well recognized. The efficacy of the proposed method is explained in the results section.

Real-Time Traffic Sign Detection and Classification Using Machine Learning and Optical Character Recognition

2020 IEEE International Conference on Electro Information Technology (EIT) ◽

10.1109/eit48999.2020.9208309 ◽

2020 ◽

Author(s):

Victor Ciuntu ◽

Hasan Ferdowsi

Keyword(s):

Machine Learning ◽

Real Time ◽

Character Recognition ◽

Optical Character Recognition ◽

Traffic Sign ◽

Real Time Traffic ◽

Optical Character ◽

Sign Detection ◽

Traffic Sign Detection

Improve OCR Accuracy with Advanced Image Preprocessing using Machine Learning with Python

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.g5745.059720 ◽

2020 ◽

Vol 9 (7) ◽

pp. 1026-1030

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Neural Networks ◽

Character Recognition ◽

Optical Character Recognition ◽

Image Preprocessing ◽

Optical Character ◽

Handwritten Text ◽

Printed Text ◽

Learning Machine

Optical Character Recognition, or Optical Character Reader (OCR), is a pattern-based recognition method for the electronic conversion of images of handwritten or printed text into machine-encoded text. The equipment used for this purpose includes cameras and flatbed scanners. Handwritten text is scanned using a scanner, and the image of the scanned document is processed by the program. Identifying handwritten manuscripts is difficult compared with other Western-language texts. In our proposed work, we take on the challenge of identifying letters and work toward achieving it. Image preprocessing techniques can effectively improve the accuracy of an OCR engine. The goal is to design and implement a system, using machine learning and Python, that is more accurate than pre-built OCR engines, drawing on technologies such as MATLAB, artificial intelligence, and neural networks.

25 Years of Pathway Figures

10.1101/2020.05.29.124503 ◽

2020 ◽

Cited By ~ 2

Author(s):

Kristina Hanspers ◽

Anders Riutta ◽

Martina Kutmon ◽

Alexander R Pico

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Human Gene ◽

Scientific Discovery ◽

Enrichment Analysis ◽

Gene Content ◽

Hippo Signaling ◽

Human Genes ◽

Image Query ◽

Pathway Diagrams

Background: Pathway diagrams are fundamental tools for describing biological processes in all aspects of science, including training, generating hypotheses, describing new knowledge and ultimately as communication tools in published work. Thousands of pathway diagrams are published each year as figures in papers. But as static images, the pathway knowledge represented in figures is not accessible to researchers for computational queries and analyses. In this study, we aimed to identify pathway figures published in the past 25 years, to characterize the human gene content in figures by optical character recognition, and to describe their utility as a resource for pathway knowledge. Approach: To identify pathway figures representing 25 years of published research, we trained a machine learning service on manually classified figures and applied it to 235,081 image query results from PubMed Central. Our previously described pipeline was used to extract human genes from the pathway figure images. These figures were characterized in terms of their parent papers, human gene content and enriched disease terms. Diverse use cases were explored for this newly accessible pathway resource. Results: We identified 64,643 pathway figures published between 1995 and 2019, depicting 1,112,551 instances of human genes (13,464 unique NCBI Genes) in various interactions and contexts. This represents more genes than found in the text of the same papers, as well as genes not found in any pathway database. We developed an interactive web tool to explore the results from the 65k set of figures, and used this tool to explore the history of scientific discovery of the Hippo Signaling pathway. We also defined a filtered set of 32k pathway figures useful for enrichment analysis.
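The distinction between gene instances and unique genes in these results can be illustrated with a small lexicon-lookup sketch over OCR text. This is not the authors' pipeline, which the abstract only references. The tiny `LEXICON` (a handful of Hippo pathway symbols plus one alias), the `extract_genes` function, and the sample string are all assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon mapping symbols and aliases to official gene symbols.
LEXICON = {
    "YAP1": "YAP1",
    "YAP": "YAP1",  # alias mapped to the official symbol
    "LATS1": "LATS1",
    "LATS2": "LATS2",
    "TEAD1": "TEAD1",
}


def extract_genes(ocr_text):
    """Count gene mentions in OCR text by case-insensitive lexicon lookup."""
    tokens = re.findall(r"[A-Za-z0-9]+", ocr_text)
    return Counter(LEXICON[t.upper()] for t in tokens if t.upper() in LEXICON)


# Hypothetical text recognized from a pathway figure.
figure_text = "LATS1/2 phosphorylates YAP; Yap1 binds TEAD1"
genes = extract_genes(figure_text)

print(sum(genes.values()), "instances,", len(genes), "unique genes")
```

Note that this naive tokenizer does not expand shorthand such as "LATS1/2" into LATS1 and LATS2, and it ignores OCR misreadings. Both would need dedicated handling in practice.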

Intelligent Short Answer Assessment using Machine Learning

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d7889.049420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1111-1116

Keyword(s):

Machine Learning ◽

Language Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Teaching Profession ◽

Student Work ◽

Short Answer ◽

Optical Character ◽

Evaluation Time ◽

The Cost

Education is fundamental to human progress. A student is evaluated by the marks he or she scores. The evaluation of students' work is a central aspect of the teaching profession and can affect students in significant ways. Though teachers use multiple criteria for assessing student work, it is not known whether emotions are a factor in their grading decisions. In addition, several mistakes occur on the department's side, such as totaling errors and marking mistakes. We are therefore developing software to automate the evaluation of answers using Natural Language Processing and Machine Learning. There are two modules: the first uses Optical Character Recognition to extract handwritten text from the uploaded file, and the second evaluates the answer based on various factors and awards the mark. Each answer entered is evaluated based on the usage of words, their importance, and the grammatical meaning of the sentence. This approach saves the cost of checking answers manually and reduces teachers' workload by automating the manual checking process. The software also reduces evaluation time.
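One way to picture scoring by word usage and importance is a weighted keyword-coverage sketch. It assumes the answer text has already been extracted by OCR and that each expected keyword carries a hand-assigned weight. The `score_answer` function, the keywords, and the weights are hypothetical, and the grammatical analysis the abstract mentions is not modeled.

```python
import re


def score_answer(answer, keyword_weights, max_marks):
    """Award marks in proportion to the total weight of expected keywords present."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    total_weight = sum(keyword_weights.values())
    if total_weight == 0:
        return 0.0
    matched = sum(w for kw, w in keyword_weights.items() if kw in words)
    return round(max_marks * matched / total_weight, 2)


# Hypothetical marking scheme: more important keywords carry larger weights.
weights = {"photosynthesis": 3, "sunlight": 2, "chlorophyll": 1}
marks = score_answer("Plants use sunlight for photosynthesis.", weights, max_marks=6)

print(marks)
```

Exact word matching is a deliberate simplification. Synonyms, spelling variants from OCR errors, and sentence meaning would each need additional processing.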

Exhaustive Security System Based on Face Recognition Incorporated with Number Plate Identification using Optical Character Recognition

Mehran University Research Journal of Engineering and Technology ◽

10.22581/muet1982.2001.14 ◽

2020 ◽

Vol 39 (1) ◽

pp. 145-152

Author(s):

Muhammad Yasir Zaheen ◽

Zia Mohi-u-din ◽

Ali Akber Siddique ◽

Muhammad Tahir Qadri

Keyword(s):

Machine Learning ◽

Image Processing ◽

Face Recognition ◽

Character Recognition ◽

Optical Character Recognition ◽

Recognition Algorithm ◽

Security System ◽

Security Issues ◽

Optical Character

In recent times, due to the rise in terrorism, people need to live in safer places where unidentified persons are not allowed to enter the premises. Securing major areas is a vital issue that intelligence and security agencies need to address. CCTV (Closed-Circuit Television) cameras are usually installed around the premises to identify vehicle number plates against a database using an OCR (Optical Character Recognition) algorithm. This method of security, which identifies only the vehicle without verifying the person inside it, often causes serious security issues. Identification of a person is usually done through image processing, using the Viola-Jones algorithm to acquire information about the facial components and create a dataset for machine learning. It is imperative to introduce a system capable of identifying the person along with the vehicle's number plate from the stored database. In this research, a comprehensive security system based on face recognition integrated with vehicle number plate identification is proposed. The combined information from both dedicated cameras is then transferred to the base station for identification. This system is capable of securing premises from crime in a more enhanced way.

Machine Learning for Optical Character Recognition System

Machine Vision Inspection Systems, Volume 2 ◽

10.1002/9781119786122.ch5 ◽

2021 ◽

pp. 91-107

Author(s):

Gurwinder Kaur ◽

Tanya Garg

Keyword(s):

Machine Learning ◽

Character Recognition ◽

Optical Character Recognition ◽

Recognition System ◽

Optical Character
