25 Years of Pathway Figures

Author(s):  
Kristina Hanspers ◽  
Anders Riutta ◽  
Martina Kutmon ◽  
Alexander R Pico

Background: Pathway diagrams are fundamental tools for describing biological processes in all aspects of science, including training, generating hypotheses, describing new knowledge and, ultimately, communicating in published work. Thousands of pathway diagrams are published each year as figures in papers. As static images, however, the pathway knowledge represented in these figures is not accessible to researchers for computational queries and analyses. In this study, we aimed to identify pathway figures published in the past 25 years, to characterize the human gene content in figures by optical character recognition, and to describe their utility as a resource for pathway knowledge.

Approach: To identify pathway figures representing 25 years of published research, we trained a machine learning service on manually classified figures and applied it to 235,081 image query results from PubMed Central. Our previously described pipeline was used to extract human genes from the pathway figure images. These figures were characterized in terms of their parent papers, human gene content and enriched disease terms. Diverse use cases were explored for this newly accessible pathway resource.

Results: We identified 64,643 pathway figures published between 1995 and 2019, depicting 1,112,551 instances of human genes (13,464 unique NCBI Genes) in various interactions and contexts. This represents more genes than found in the text of the same papers, as well as genes not found in any pathway database. We developed an interactive web tool to explore the results from the 65k set of figures, and used this tool to explore the history of scientific discovery of the Hippo signaling pathway. We also defined a filtered set of 32k pathway figures useful for enrichment analysis.
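The abstract distinguishes gene instances from unique genes in text recovered from figures by OCR. As a rough illustration only, the Python sketch below matches OCR tokens against a gene-symbol lexicon and reports both counts. It is not the authors' pipeline, which the abstract does not detail. The function name `extract_gene_symbols`, the tiny `GENE_LEXICON` and the sample OCR string are all hypothetical.

```python
import re

# Hypothetical mini-lexicon of human gene symbols (illustrative only; a real
# lexicon would be loaded from a gene database export and include aliases).
GENE_LEXICON = {"YAP1", "TEAD1", "LATS1", "TP53"}


def extract_gene_symbols(ocr_text, lexicon):
    """Return every lexicon hit in the OCR text, in order, keeping repeats."""
    # Split the OCR output into alphanumeric tokens (hyphens are kept).
    tokens = re.findall(r"[A-Za-z0-9-]+", ocr_text)
    hits = []
    for token in tokens:
        symbol = token.upper()  # assumption: case-insensitive symbol matching
        if symbol in lexicon:
            hits.append(symbol)
    return hits


# Hypothetical OCR output from a single pathway figure.
sample_text = "Yap1 binds TEAD1; LATS1 phosphorylates YAP1"
instances = extract_gene_symbols(sample_text, GENE_LEXICON)
unique_genes = set(instances)
print(len(instances), "instances,", len(unique_genes), "unique genes")
```

Exact matching of this kind would miss OCR misreads and gene aliases, so a practical system would likely need normalization steps that are not shown here.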

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Kristina Hanspers ◽  
Anders Riutta ◽  
Martina Summer-Kutmon ◽  
Alexander R. Pico

Abstract: Thousands of pathway diagrams are published each year as static figures inaccessible to computational queries and analyses. Using a combination of machine learning, optical character recognition, and manual curation, we identified 64,643 pathway figures published between 1995 and 2019 and extracted 1,112,551 instances of human genes, comprising 13,464 unique NCBI genes, participating in a wide variety of biological processes. This collection represents an order of magnitude more genes than found in the text of the same papers, and thousands of genes missing from other pathway databases, thus presenting new opportunities for discovery and research.


1997 ◽  
Vol 9 (1-3) ◽  
pp. 58-77
Author(s):  
Vitaly Kliatskine ◽  
Eugene Shchepin ◽  
Gunnar Thorvaldsen ◽  
Konstantin Zingerman ◽  
Valery Lazarev

In principle, printed source material should be made machine-readable with systems for Optical Character Recognition, rather than being typed once more. Off-the-shelf commercial OCR programs tend, however, to be inadequate for lists with a complex layout. The tax assessment lists that assess most nineteenth-century farms in Norway constitute one example among a series of valuable sources which can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in the recognition of material with a complex table structure, outlining a new algorithmic model based on ‘linked hierarchies’. Within the scope of this model, a variety of tables and layouts can be described and recognized. The ‘linked hierarchies’ model has been implemented in the ‘CRIPT’ OCR software system, which successfully reads tables with a complex structure from several different historical sources.


2020 ◽  
Vol 2020 (1) ◽  
pp. 78-81
Author(s):  
Simone Zini ◽  
Simone Bianco ◽  
Raimondo Schettini

Rain removal from pictures taken under bad weather conditions is a challenging task that aims to improve the overall quality and visibility of a scene. The enhanced images usually constitute the input for subsequent Computer Vision tasks such as detection and classification. In this paper, we present a Convolutional Neural Network, based on the Pix2Pix model, for removing rain streaks from images, with a specific interest in evaluating the results of the processing with respect to the Optical Character Recognition (OCR) task. In particular, we present a way to generate a rainy version of the Street View Text Dataset (R-SVTD) for "text detection and recognition" evaluation in bad weather conditions. Experimental results on this dataset show that our model outperforms the state of the art in terms of two commonly used image quality metrics, and that it can improve the performance of an OCR model in detecting and recognising text in the wild.


2014 ◽  
Vol 6 (1) ◽  
pp. 36-39
Author(s):  
Kevin Purwito

This paper describes Optical Music Recognition (OMR), one of the many extensions of Optical Character Recognition (OCR). OMR is used to convert sheet music into a digital format such as MIDI or MusicXML. Many musical symbols are commonly used in sheet music and therefore need to be recognized by OMR, such as staff; treble, bass, alto and tenor clef; sharp, flat and natural; beams, staccato, staccatissimo, dynamic, tenuto, marcato, stopped note, harmonic and fermata; notes; rests; ties and slurs; and also mordent and turn. OMR usually has four main processes, namely Preprocessing, Music Symbol Recognition, Musical Notation Reconstruction and Final Representation Construction. Each of these processes uses different methods and algorithms, and each still needs further research and development. Many applications already use OMR, but none gives a perfect result. Therefore, besides research and development on each OMR process, there is also a need for research and development on a combined recognizer, which combines the results from different OMR applications to increase the accuracy of the final result.

Index Terms: Music, optical character recognition, optical music recognition, musical symbol, image processing, combined recognizer
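The combined-recognizer idea can be illustrated with a simple per-position majority vote. The Python sketch below is not the method proposed in the paper. It assumes each OMR application has already produced a symbol sequence of the same length, aligned position by position, and that the most frequent symbol at each position is the best guess. The function name `combine_recognizers` and the sample outputs are hypothetical.

```python
from collections import Counter


def combine_recognizers(outputs):
    """Majority-vote over aligned symbol sequences, one per recognizer."""
    if not outputs:
        return []
    length = len(outputs[0])
    # Assumption: sequences are pre-aligned, so they must share one length.
    if any(len(sequence) != length for sequence in outputs):
        raise ValueError("all recognizer outputs must have the same length")
    combined = []
    for candidates in zip(*outputs):
        # most_common(1) returns the symbol with the highest vote count.
        symbol, _count = Counter(candidates).most_common(1)[0]
        combined.append(symbol)
    return combined


# Hypothetical outputs from three OMR applications for the same three notes.
outputs = [
    ["C4", "D4", "E4"],
    ["C4", "D4", "F4"],
    ["C4", "B3", "E4"],
]
print(combine_recognizers(outputs))
```

In practice the outputs of different applications rarely align this neatly, so a sequence-alignment step would probably be needed before voting.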

