Analysis of Protein/Protein Interactions Through Biomedical Literature: Text Mining of Abstracts vs. Text Mining of Full Text Articles

Author(s):  
Eric P. G. Martin ◽  
Eric G. Bremer ◽  
Marie-Claude Guerin ◽  
Catherine DeSesa ◽  
Olivier Jouve
Author(s):  
Varsha D Badal ◽  
Petras J Kundrotas ◽  
Ilya A Vakser

Abstract Motivation Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein-protein interactions may generate such constraints. However, the absence of post-processing of the spotted residues reduced the usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins. Results We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models, with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and the model is applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles, or both on abstracts, the performance of the two models is similar. Thus, in such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage. The DRNN advantage in the mixed scheme arises because SVM success is often determined by the similarity in data/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full-text articles. Availability The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04. Supplementary information Supplementary data are available at Bioinformatics online.
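As a rough illustration of what "spotting" residues in text can mean, the sketch below pulls residue mentions such as "Arg-45" or "Tyr102" out of a sentence with a regular expression. It is not the authors' pipeline: the pattern, the function name `spot_residues`, and the example sentence are all hypothetical, and the sketch only covers three-letter amino acid codes followed by a position. It also does no filtering, so a mention is returned whether or not the residue is relevant to binding, which is the gap the machine learning step in the abstract is meant to close.

```python
import re

# Three-letter codes of the 20 standard amino acids.
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# Hypothetical pattern (an assumption, not taken from the paper):
# a three-letter code, an optional hyphen or space, then a 1-4 digit position.
RESIDUE_RE = re.compile(rf"\b({THREE_LETTER})[- ]?(\d{{1,4}})\b", re.IGNORECASE)


def spot_residues(text):
    """Return (residue, position) pairs mentioned in text, in order of appearance."""
    return [
        (match.group(1).capitalize(), int(match.group(2)))
        for match in RESIDUE_RE.finditer(text)
    ]


# Made-up example sentence, for illustration only.
sentence = (
    "Mutation of Arg-45 and Tyr102 abolished binding, "
    "whereas Ser 7 was dispensable."
)
print(spot_residues(sentence))
```

Because matching is case-insensitive, ordinary words followed by a number (for example "his 5") would also be picked up, which is one simple reason unfiltered spotting yields irrelevant hits.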


2019 ◽  
pp. 20-48
Author(s):  
Geoffrey E. Hill

To understand the evolutionary consequences of poor coadaptation of mitochondrial and nuclear genes, it is necessary to consider in molecular detail the manifestations of mitochondrial dysfunction. Most considerations of mitochondrial dysfunction resulting from mitonuclear incompatibilities focus on protein–protein interactions in the electron transport system, but the interactions of mitochondrial and nuclear genes in enabling the transcription, translation, and replication of mitochondrial DNA can play an equally important role in mitonuclear coevolution and coadaptation. This chapter reviews the extensive literature on how mitochondrial dysfunction is the cause of many inherited human diseases and explains how this biomedical literature connects to a rapidly growing body of research on the evolution and maintenance of coadaptation of mitochondrial and nuclear genes among non-human eukaryotes. The goal of the chapter is to establish the fundamental importance of coadaptation between co-functioning mitochondrial and nuclear genes.


2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Theodosios Theodosiou ◽  
Nikolaos Papanikolaou ◽  
Maria Savvaki ◽  
Giulia Bonetto ◽  
Stella Maxouri ◽  
...  

Abstract The in-depth study of protein–protein interactions (PPIs) is of key importance for understanding how cells operate. Therefore, in the past few years, many experimental as well as computational approaches have been developed for the identification and discovery of such interactions. Here, we present UniReD, a user-friendly computational prediction tool which analyses biomedical literature in order to extract known protein associations and suggest undocumented ones. As a proof of concept, we demonstrate its usefulness by experimentally validating six predicted interactions and by benchmarking it against public databases of experimentally validated PPIs, achieving high coverage. We believe that UniReD can become an important and intuitive resource for experimental biologists seeking novel associations within a protein network, and a useful tool to complement experimental approaches (e.g. mass spectrometry) by producing sorted lists of candidate proteins for further experimental validation. UniReD is available at http://bioinformatics.med.uoc.gr/unired/


2019 ◽  
Vol 13 (S1) ◽  
Author(s):  
Qingqing Li ◽  
Zhihao Yang ◽  
Zhehuan Zhao ◽  
Ling Luo ◽  
Zhiheng Li ◽  
...  

Abstract Background Protein–protein interaction (PPI) information extraction from biomedical literature helps unveil the molecular mechanisms of biological processes. In particular, the PPIs associated with human malignant neoplasms can unveil the biology behind these neoplasms. However, no such PPI database is currently available. Results In this work, a database of protein–protein interactions associated with 171 kinds of human malignant neoplasms, named HMNPPID, is constructed. In addition, a visualization program, named VisualPPI, is provided to facilitate the analysis of the PPI network for a specific neoplasm. Conclusions HMNPPID can hopefully become an important resource for research on PPIs of human malignant neoplasms, since it provides readily available data for healthcare professionals. They therefore no longer need to dig through a large amount of biomedical literature, which may accelerate research on the PPIs of malignant neoplasms.


2020 ◽  
Vol 49 (D1) ◽  
pp. D605-D612 ◽  
Author(s):  
Damian Szklarczyk ◽  
Annika L Gable ◽  
Katerina C Nastou ◽  
David Lyon ◽  
Rebecca Kirsch ◽  
...  

Abstract Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein–protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions as well as functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring-mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide, experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online, at https://string-db.org/.

