Pathway information extracted from 25 years of pathway figures

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Kristina Hanspers ◽  
Anders Riutta ◽  
Martina Summer-Kutmon ◽  
Alexander R. Pico

Abstract Thousands of pathway diagrams are published each year as static figures inaccessible to computational queries and analyses. Using a combination of machine learning, optical character recognition, and manual curation, we identified 64,643 pathway figures published between 1995 and 2019 and extracted 1,112,551 instances of human genes, comprising 13,464 unique NCBI genes, participating in a wide variety of biological processes. This collection represents an order of magnitude more genes than found in the text of the same papers, and thousands of genes missing from other pathway databases, thus presenting new opportunities for discovery and research.

Author(s):  
Yaseen Khather Yaseen ◽  
Alaa Khudhair Abbas ◽  
Ahmed M. Sana

Today, images are a part of communication between people. However, images can also be used to share information by hiding and embedding messages within them, and images received through social media or email can contain harmful content that users cannot see and are therefore unaware of. This paper presents a model for detecting spam in images. The model combines optical character recognition, natural language processing, and a machine learning algorithm. Optical character recognition extracts the text from images, and natural language processing uses linguistic capabilities to detect and classify the language and to distinguish between normal text and slang. The features for selected images are then extracted using the bag-of-words model, and the machine learning algorithm is run to detect any spam they may contain. Finally, the model predicts whether or not the image contains harmful content. The results show that the proposed method, which combines machine learning, optical character recognition, and natural language processing, provides higher detection accuracy than machine learning alone.
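The feature-extraction and classification steps described above can be sketched as follows. This is a minimal illustration that assumes the text has already been extracted from the image by OCR. The abstract does not name the learning algorithm, so the multinomial naive Bayes classifier, the class name `NaiveBayes`, and the toy training data are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = {}          # label -> Counter of word frequencies
        self.doc_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            bow = bag_of_words(text)
            self.word_counts.setdefault(label, Counter()).update(bow)
            self.vocab.update(bow)
        return self

    def predict(self, text):
        bow = bag_of_words(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each observed word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word, n in bow.items():
                score += n * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical OCR outputs used as toy training data.
train_texts = [
    "win free prize now click here",
    "free money claim your prize",
    "meeting agenda for monday",
    "photos from the family trip",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))
```

In practice the OCR text would also pass through the language-detection and slang-handling steps mentioned in the abstract before the features are counted.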


2021 ◽  
Vol 11 (2) ◽  
pp. 83-86
Author(s):  
Alan Jiju ◽  
Shaun Tuscano ◽  
Chetana Badgujar

This research seeks a methodology for extracting data from everyday printed bills and invoices. The extracted data can later be used extensively, for example in machine learning or statistical analysis. This research focuses on extracting the final bill amount, itinerary, date, and similar data from bills and invoices, as they encapsulate ample information about users' purchases, likes, dislikes, etc. Optical Character Recognition (OCR) is a technology that provides full alphanumeric recognition of printed or handwritten characters from images. First, OpenCV is used to detect the bill or invoice in the image and to filter out unnecessary noise. The intermediate image is then passed to the Tesseract OCR engine for further processing. Tesseract applies text segmentation to extract written text in various fonts and languages. Our methodology proved highly accurate when tested on a variety of input images of bills and invoices.
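Once the OCR engine has returned plain text, fields such as the date and the final amount still have to be located in it. The sketch below illustrates one possible way to do that with regular expressions. The sample text, the patterns, and the function name `extract_fields` are assumptions for illustration; the abstract does not describe how the authors parse the recognized text.

```python
import re

# Hypothetical OCR output for a small receipt.
SAMPLE_OCR_TEXT = """\
SUPER MART
Date: 12/03/2021
Milk        2.50
Bread       1.75
TOTAL       4.25
"""

# Dates such as 12/03/2021 or 1-2-21 (assumed formats).
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")
# A line starting with "total" or "grand total", followed by an amount.
TOTAL_RE = re.compile(r"(?im)^\s*(?:grand\s+)?total\b[^\d\n]*(\d+(?:[.,]\d{2})?)")


def extract_fields(ocr_text):
    """Return the first date and total amount found, or None for each if absent."""
    date = DATE_RE.search(ocr_text)
    total = TOTAL_RE.search(ocr_text)
    return {
        "date": date.group(1) if date else None,
        "total": float(total.group(1).replace(",", ".")) if total else None,
    }


fields = extract_fields(SAMPLE_OCR_TEXT)
print(fields)
```

Real receipts vary widely in layout, so patterns like these would need to be extended and validated against the actual bills being processed.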


Author(s):  
Abhishek Das ◽  
Mihir Narayan Mohanty

In this chapter, the authors review optical character recognition. The study covers both typed and handwritten character recognition. Online and offline character recognition, the two modes of data acquisition in the field of OCR, are also studied. As deep learning is an emerging machine learning method in the field of image processing, the authors describe the method and its application in earlier works. Based on the study of the recurrent neural network (RNN), a special class of deep neural network is proposed for recognition. Further, a convolutional neural network (CNN) is combined with the RNN to check its performance. For this work, Odia numerals and characters are taken as input and are well recognized. The efficacy of the proposed method is explained in the results section.


Optical Character Recognition or Optical Character Reader (OCR) is a pattern recognition method for the electronic conversion of images of handwritten or printed text into machine-encoded text. The equipment used for this purpose includes cameras and flatbed scanners. Handwritten text is scanned using a scanner, and the image of the scanned document is processed by the program. Identifying manuscripts is more difficult than identifying other Western-language texts. In our proposed work, we take on the challenge of identifying letters and work toward that goal. Image preprocessing techniques can effectively improve the accuracy of an OCR engine. The goal is to design and implement a system using machine learning and Python that is more accurate than pre-built OCR engines based on technologies such as MATLAB, artificial intelligence, neural networks, etc.


Author(s):  
Kristina Hanspers ◽  
Anders Riutta ◽  
Martina Kutmon ◽  
Alexander R Pico

Background: Pathway diagrams are fundamental tools for describing biological processes in all aspects of science, including training, generating hypotheses, describing new knowledge and ultimately as communication tools in published work. Thousands of pathway diagrams are published each year as figures in papers. But as static images the pathway knowledge represented in figures is not accessible to researchers for computational queries and analyses. In this study, we aimed to identify pathway figures published in the past 25 years, to characterize the human gene content in figures by optical character recognition, and to describe their utility as a resource for pathway knowledge. Approach: To identify pathway figures representing 25 years of published research, we trained a machine learning service on manually-classified figures and applied it to 235,081 image query results from PubMed Central. Our previously described pipeline was utilized to extract human genes from the pathway figure images. These figures were characterized in terms of their parent papers, human gene content and enriched disease terms. Diverse use cases were explored for this newly accessible pathway resource. Results: We identified 64,643 pathway figures published between 1995 and 2019, depicting 1,112,551 instances of human genes (13,464 unique NCBI Genes) in various interactions and contexts. This represents more genes than found in the text of the same papers, as well as genes not found in any pathway database. We developed an interactive web tool to explore the results from the 65k set of figures, and used this tool to explore the history of scientific discovery of the Hippo Signaling pathway. We also defined a filtered set of 32k pathway figures useful for enrichment analysis.
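The gene-extraction step can be pictured as matching OCR tokens against a lexicon of gene symbols and aliases. The sketch below is a simplified illustration of that idea, not the authors' previously described pipeline. The small `LEXICON`, the normalization rule, and the function name `extract_genes` are assumptions, and the NCBI Gene IDs shown should be checked against NCBI Gene before any real use.

```python
import re

# Toy lexicon mapping uppercase symbols and aliases to NCBI Gene IDs
# (illustrative subset only; verify IDs against NCBI Gene).
LEXICON = {
    "TP53": 7157, "P53": 7157,
    "EGFR": 1956,
    "YAP1": 10413, "YAP": 10413,
}


def extract_genes(ocr_text):
    """Return (token, gene_id) pairs for OCR tokens found in the lexicon."""
    tokens = re.findall(r"[A-Za-z0-9-]+", ocr_text)
    hits = []
    for token in tokens:
        # Normalize case and drop hyphens before the lookup.
        key = token.upper().replace("-", "")
        if key in LEXICON:
            hits.append((token, LEXICON[key]))
    return hits


# Hypothetical OCR text from a pathway figure.
hits = extract_genes("EGFR -> Yap1 -| p53  nucleus")
unique_ids = sorted({gene_id for _, gene_id in hits})
print(hits, unique_ids)
```

Counting every hit corresponds to gene instances, while the set of IDs corresponds to unique genes, which is the distinction behind the 1,112,551 instances and 13,464 unique genes reported above.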


Education is fundamental for human progress. A student is evaluated by the marks he or she scores. The evaluation of students' work is a central aspect of the teaching profession that can affect students in significant ways. Though teachers use multiple criteria for assessing student work, it is not known whether emotions are a factor in their grading decisions. Also, several mistakes occur on the department's side, such as totaling errors and marking mistakes. We are therefore developing software to automate the evaluation of answers using Natural Language Processing and Machine Learning. There are two modules: the first uses Optical Character Recognition to extract handwritten text from the uploaded file, and the second evaluates the answer based on various factors and awards the mark. For every answer entered, evaluation is based on the usage of words, their importance, and the grammatical meaning of the sentence. With this approach we can save the cost of checking answers manually and reduce the workload of teachers by automating the manual checking process. The software also reduces evaluation time.
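The idea of scoring an answer by the words it uses and their importance can be illustrated with a weighted keyword check. This is a deliberately simple sketch under stated assumptions: the keyword weights, the function name `score_answer`, and the sample answer are hypothetical, and the grammatical analysis mentioned in the abstract is not modeled here.

```python
import re


def score_answer(answer, keyword_weights, max_marks):
    """Award marks in proportion to the total weight of keywords present."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    total_weight = sum(keyword_weights.values())
    earned = sum(w for kw, w in keyword_weights.items() if kw in words)
    return round(max_marks * earned / total_weight, 1)


# Hypothetical marking scheme: each keyword carries an importance weight.
KEYWORDS = {"photosynthesis": 3, "sunlight": 2, "chlorophyll": 2, "glucose": 3}

marks = score_answer(
    "Plants use sunlight and chlorophyll to make glucose.", KEYWORDS, 10
)
print(marks)
```

A fuller system would also handle synonyms, spelling variants introduced by OCR, and sentence-level meaning rather than exact keyword matches.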


Author(s):  
Muhammad Yasir Zaheen ◽  
Zia Mohi-u-din ◽  
Ali Akber Siddique ◽  
Muhammad Tahir Qadri

In recent times, due to the rise in terrorism, people need to live in safer places where unidentified persons are not allowed to enter the premises. Securing major areas is a vital issue that intelligence and security agencies need to address. CCTV (Closed-Circuit Television) cameras are usually installed around the premises to identify number plates against a database using an OCR (Optical Character Recognition) algorithm. Identifying only the vehicle, without verifying the person inside it, often causes serious security issues. A person is usually identified through image processing using the Viola-Jones algorithm, which acquires information about the facial components to create a dataset for machine learning. It is imperative to introduce a system capable of identifying the person along with the vehicle's number plate from the stored database. In this research, a comprehensive security system based on face recognition integrated with vehicle number plate recognition is proposed. The combined information from both dedicated cameras is then transferred to the base station for identification. This system is capable of securing premises from crime in a more enhanced way.

