Information Extraction of Protein Phosphorylation from Biomedical Literature

Author(s):  
M. Narayanaswamy ◽  
K. E. Ravikumar ◽  
Z. Z. Hu ◽  
K. Vijay-Shanker ◽  
C. H. Wu

Protein posttranslational modification (PTM) is a fundamental biological process, and currently few text mining systems focus on PTM information extraction. A rule-based text mining system, RLIMS-P (Rule-based LIterature Mining System for Protein Phosphorylation), was recently developed by our group to extract protein substrate, kinase and phosphorylated residue/sites from MEDLINE abstracts. This chapter covers the evaluation and benchmarking of RLIMS-P and highlights some novel and unique features of the system. The extraction patterns of RLIMS-P capture a range of lexical, syntactic and semantic constraints found in sentences expressing phosphorylation information. RLIMS-P also has a second phase that puts together information extracted from different sentences. This is an important feature since it is not common to find the kinase, substrate and site of phosphorylation to be mentioned in the same sentence. Small modifications to the rules for extraction of phosphorylation information have also allowed us to develop systems for extraction of two other PTMs, acetylation and methylation. A thorough evaluation of these two systems needs to be completed. Finally, an online version of RLIMSP with enhanced functionalities, namely, phosphorylation annotation ranking, evidence tagging, and protein entity mapping, has been developed and is publicly accessible.

2017 ◽  
Vol 15 (05) ◽  
pp. 1740005 ◽  
Author(s):  
Dongdong Sun ◽  
Minghui Wang ◽  
Ao Li

Due to the importance of post-translational modifications (PTMs) in human health and diseases, PTMs are regularly reported in the biomedical literature. However, the continuing and rapid pace of expansion of this literature brings a huge challenge for researchers and database curators. Therefore, there is a pressing need to aid them in identifying relevant PTM information more efficiently by using a text mining system. So far, only a few web servers are available for mining information of a very limited number of PTMs, which are based on simple pattern matching or pre-defined rules. In our work, in order to help researchers and database curators easily find and retrieve PTM information from available text, we have developed a text mining tool called MPTM, which extracts and organizes valuable knowledge about 11 common PTMs from abstracts in PubMed by using relations extracted from dependency parse trees and a heuristic algorithm. It is the first web server that provides literature mining service for hydroxylation, myristoylation and GPI-anchor. The tool is also used to find new publications on PTMs from PubMed and uncovers potential PTM information by large-scale text analysis. MPTM analyzes text sentences to identify protein names including substrates and protein-interacting enzymes, and automatically associates them with the UniProtKB protein entry. To facilitate further investigation, it also retrieves PTM-related information, such as human diseases, Gene Ontology terms and organisms from the input text and related databases. In addition, an online database (MPTMDB) with extracted PTM information and a local MPTM Lite package are provided on the MPTM website. MPTM is freely available online at http://bioinformatics.ustc.edu.cn/mptm/ and the source codes are hosted on GitHub: https://github.com/USTC-HILAB/MPTM .


2015 ◽  
Vol 6 (4) ◽  
pp. 35-49 ◽  
Author(s):  
Laurent Issertial ◽  
Hiroshi Tsuji

This paper proposes a system called CFP Manager specialized on IT field and designed to ease the process of searching conference suitable to one's need. At present, the handling of CFP faces two problems: for emails, the huge quantity of CFP received can be easily skimmed through. For websites, the reviewing of some of the main CFP aggregators available online points out the lack of usable criteria. This system proposes to answer to these problems via its architecture consisting of three components: firstly an Information Extraction module extracting relevant information (as date, location, etc...) from CFP using rule based text mining algorithm. The second component enriches the now extracted data with external one from ontology models. Finally the last one displays the said data and allows the end user to perform complex queries on the CFP dataset and thus allow him to only access to CFP suitable for him. In order to validate the authors' proposal, they eventually process the well-known precision / recall metric on our information extraction component with an average of 0.95 for precision and 0.91 for recall on three different 100 CFP dataset. This paper finally discusses the validity of our approach by confronting our system for different queries with two systems already available online (WikiCFP and IEEE Conference Search) and basic text searching approach standing for searching in an email box. On a 100 CFP dataset with the wide variety of usable data and the possibility to perform complex queries we surpass basic text searching method and WikiCFP by not returning the false positive usually returned by them and find a result close to the IEEE system.


2020 ◽  
Author(s):  
Bhrugesh Joshi ◽  
Vishvajit Bakarola ◽  
Parth Shah ◽  
Ramar Krishnamurthy

AbstractThe recent pandemic created due to Novel Coronavirus (nCOV-2019) from Wuhan, China demanding a large scale of a general health emergency. This demands novel research on the vaccine to fight against this pandemic situation, re-purposing of the existing drugs, phylogenetic analysis to identify the origin and determine the similarity with other known viruses, etc. The very preliminary task from the research community is to analyze the wide verities of existing related research articles, which is very much time-consuming in such situations where each minute counts for saving hundreds of human lives. The entire manual processing is even lower down the efficiency in mining the information. We have developed a complete automatic literature mining system that delivers efficient and fast mining from existing biomedical literature databases. With the help of modern-day deep learning algorithms, our system also delivers a summarization of important research articles that provides ease and fast comprehension of critical research articles. The system is currently scanning nearly 1,46,115,136 English words from 29,315 research articles in not greater than 1.5 seconds with multiple search keywords. Our research article presents the criticality of literature mining, especially in pandemic situations with the implementation and online deployment of the system.


2018 ◽  
pp. 129-154
Author(s):  
Boya Xie ◽  
Qin Ding ◽  
Di Wu

Driven by the rapidly advancing techniques and increasing interests in biology and medicine, about 2,000 to 4,000 references are added daily to MEDLINE, the US national biomedical bibliographic database. Even for a specific research topic, extracting useful and comprehensive information out of the huge literature data pool is challenging. Text mining techniques become extremely useful when dealing with the abundant biomedical information and they have been applied to various areas in the realm of biomedical research. Instead of providing a brief overview of all text mining techniques and every major biomedical text mining application, this chapter explores in-depth the microRNA profiling area and related text mining tools. As an illustrative example, one rule-based text mining system developed by the authors is discussed in detail. This chapter also includes the discussion of the challenges and potential research areas in biomedical text mining.


2020 ◽  
pp. 394-409
Author(s):  
Laurent Issertial ◽  
Hiroshi Tsuji

This paper proposes a system called CFP Manager specialized on IT field and designed to ease the process of searching conference suitable to one's need. At present, the handling of CFP faces two problems: for emails, the huge quantity of CFP received can be easily skimmed through. For websites, the reviewing of some of the main CFP aggregators available online points out the lack of usable criteria. This system proposes to answer to these problems via its architecture consisting of three components: firstly an Information Extraction module extracting relevant information (as date, location, etc...) from CFP using rule based text mining algorithm. The second component enriches the now extracted data with external one from ontology models. Finally the last one displays the said data and allows the end user to perform complex queries on the CFP dataset and thus allow him to only access to CFP suitable for him. In order to validate the authors' proposal, they eventually process the well-known precision / recall metric on our information extraction component with an average of 0.95 for precision and 0.91 for recall on three different 100 CFP dataset. This paper finally discusses the validity of our approach by confronting our system for different queries with two systems already available online (WikiCFP and IEEE Conference Search) and basic text searching approach standing for searching in an email box. On a 100 CFP dataset with the wide variety of usable data and the possibility to perform complex queries we surpass basic text searching method and WikiCFP by not returning the false positive usually returned by them and find a result close to the IEEE system.


Sign in / Sign up

Export Citation Format

Share Document