Analysis of Protein/Protein Interactions Through Biomedical Literature: Text Mining of Abstracts vs. Text Mining of Full Text Articles

Abstract. Motivation: Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models that need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on protein-protein interactions may generate such constraints. However, the absence of post-processing of the spotted residues reduced the usability of the constraints, as a significant number of the residues were not relevant to the binding of the specific proteins. Results: We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models, with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and the model is applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles, or both on abstracts, the performance of the two models is similar. Thus, in such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage. The DRNN advantage in the mixed scheme arises because SVM success is often determined by the similarity of data/text patterns in the training and testing sets, whereas the sentence structures in abstracts are, in general, different from those in full-text articles. Availability: The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04. Supplementary information: Supplementary data are available at Bioinformatics online.
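
The residue-spotting step mentioned in the abstract can be illustrated with a minimal sketch. It covers only the detection of residue mentions in a sentence, not the SVM or DRNN filtering, and it is not the authors' code: the regular expression, the function name `spot_residues`, and the example sentence are all assumptions made for illustration.

```python
import re

# Three-letter amino acid codes mapped to one-letter codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Hypothetical pattern (an assumption, not taken from the paper):
# a three-letter code, an optional hyphen or space, then a sequence position.
RESIDUE_PATTERN = re.compile(
    r"\b(" + "|".join(THREE_TO_ONE) + r")[- ]?(\d{1,4})\b"
)


def spot_residues(text):
    """Return (one_letter_code, position) pairs in order of appearance."""
    return [
        (THREE_TO_ONE[match.group(1)], int(match.group(2)))
        for match in RESIDUE_PATTERN.finditer(text)
    ]


# Made-up example sentence, for illustration only.
sentence = "Mutation of Arg123 and Tyr-45 abolished binding, whereas Gly 7 was dispensable."
print(spot_residues(sentence))  # [('R', 123), ('Y', 45), ('G', 7)]
```

A spotter of this kind returns every residue mention, relevant or not, which is why the abstract describes a subsequent classification step (interface or non-interface) to filter the results.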

From Biomedical Literature to Knowledge: Mining Protein-Protein Interactions

Studies in Computational Intelligence - Computational Intelligence in Biomedicine and Bioinformatics ◽

10.1007/978-3-540-70778-3_17 ◽

2009 ◽

pp. 397-421 ◽

Cited By ~ 2

Author(s):

Deyu Zhou ◽

Yulan He ◽

Chee Keong Kwoh

Keyword(s):

Protein Interactions ◽

Biomedical Literature ◽

Protein Protein Interactions ◽

Knowledge Mining

Forms and consequences of incompatibility

Mitonuclear Ecology ◽

10.1093/oso/9780198818250.003.0002 ◽

2019 ◽

pp. 20-48

Author(s):

Geoffrey E. Hill

Keyword(s):

Mitochondrial Dna ◽

Electron Transport ◽

Mitochondrial Dysfunction ◽

Protein Interactions ◽

Biomedical Literature ◽

Nuclear Genes ◽

Extensive Literature ◽

Protein Protein Interactions ◽

Growing Body ◽

Evolutionary Consequences

To understand the evolutionary consequences of poor coadaptation of mitochondrial and nuclear genes, it is necessary to consider in molecular detail the manifestations of mitochondrial dysfunction. Most considerations of mitochondrial dysfunction resulting from mitonuclear incompatibilities focus on protein–protein interactions in the electron transport system, but the interactions of mitochondrial and nuclear genes in enabling the transcription, translation, and replication of mitochondrial DNA can play an equally important role in mitonuclear coevolution and coadaptation. This chapter reviews the extensive literature on how mitochondrial dysfunction is the cause of many inherited human diseases and explains how this biomedical literature connects to a rapidly growing body of research on the evolution and maintenance of coadaptation of mitochondrial and nuclear genes among non-human eukaryotes. The goal of the chapter is to establish the fundamental importance of coadaptation between co-functioning mitochondrial and nuclear genes.

UniProt-Related Documents (UniReD): assisting wet lab biologists in their quest on finding novel counterparts in a protein network

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa005 ◽

2020 ◽

Vol 2 (1) ◽

Cited By ~ 1

Author(s):

Theodosios Theodosiou ◽

Nikolaos Papanikolaou ◽

Maria Savvaki ◽

Giulia Bonetto ◽

Stella Maxouri ◽

...

Keyword(s):

Protein Interactions ◽

Computational Prediction ◽

Biomedical Literature ◽

Protein Network ◽

Protein Protein Interactions ◽

High Coverage ◽

Depth Study ◽

Experimental Approaches ◽

Wet Lab ◽

User Friendly

Abstract: The in-depth study of protein–protein interactions (PPIs) is of key importance for understanding how cells operate. Therefore, in the past few years, many experimental as well as computational approaches have been developed for the identification and discovery of such interactions. Here, we present UniReD, a user-friendly computational prediction tool that analyses biomedical literature in order to extract known protein associations and suggest undocumented ones. As a proof of concept, we demonstrate its usefulness by experimentally validating six predicted interactions and by benchmarking it against public databases of experimentally validated PPIs, achieving high coverage. We believe that UniReD can become an important and intuitive resource for experimental biologists searching for novel associations within a protein network, and a useful tool to complement experimental approaches (e.g. mass spectrometry) by producing sorted lists of candidate proteins for further experimental validation. UniReD is available at http://bioinformatics.med.uoc.gr/unired/

Analysis of Protein Phosphorylation and Its Functional Impact on Protein–Protein Interactions via Text Mining of the Scientific Literature

Protein Bioinformatics - Methods in Molecular Biology ◽

10.1007/978-1-4939-6783-4_10 ◽

2017 ◽

pp. 213-232 ◽

Cited By ~ 3

Author(s):

Qinghua Wang ◽

Karen E. Ross ◽

Hongzhan Huang ◽

Jia Ren ◽

Gang Li ◽

...

Keyword(s):

Text Mining ◽

Protein Phosphorylation ◽

Protein Interactions ◽

Scientific Literature ◽

Protein Protein Interactions ◽

Functional Impact

HMNPPID—human malignant neoplasm protein–protein interaction database

Human Genomics ◽

10.1186/s40246-019-0223-5 ◽

2019 ◽

Vol 13 (S1) ◽

Author(s):

Qingqing Li ◽

Zhihao Yang ◽

Zhehuan Zhao ◽

Ling Luo ◽

Zhiheng Li ◽

...

Keyword(s):

Protein Interaction ◽

Protein Interactions ◽

Molecular Mechanisms ◽

Malignant Neoplasm ◽

Biomedical Literature ◽

Malignant Neoplasms ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Interaction Database ◽

Protein Interaction Database

Abstract. Background: Protein–protein interaction (PPI) information extraction from biomedical literature helps unveil the molecular mechanisms of biological processes. In particular, the PPIs associated with human malignant neoplasms can unveil the biology behind these neoplasms. However, no such PPI database is currently available. Results: In this work, a database of protein–protein interactions associated with 171 kinds of human malignant neoplasms, named HMNPPID, is constructed. In addition, a visualization program, named VisualPPI, is provided to facilitate the analysis of the PPI network for a specific neoplasm. Conclusions: HMNPPID can hopefully become an important resource for research on PPIs of human malignant neoplasms, since it provides readily available data for healthcare professionals. They no longer need to dig through a large amount of biomedical literature, which may accelerate research on the PPIs of malignant neoplasms.

Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2016.09.011 ◽

2016 ◽

Vol 65 ◽

pp. 37-44 ◽

Cited By ~ 11

Author(s):

Renu Vyas ◽

Sanket Bapat ◽

Esha Jain ◽

Muthukumarasamy Karthikeyan ◽

Sanjeev Tambe ◽

...

Keyword(s):

Diabetes Mellitus ◽

Support Vector Machine ◽

Network Analysis ◽

Text Mining ◽

Protein Interactions ◽

Support Vector ◽

Biomedical Text ◽

Biomedical Text Mining ◽

Protein Protein Interactions

Using biomedical literature mining to consolidate the set of known human protein-protein interactions

10.3115/1641484.1641491 ◽

2005 ◽

Cited By ~ 3

Author(s):

Arun Ramani ◽

Edward Marcotte ◽

Razvan Bunescu ◽

Raymond Mooney

Keyword(s):

Protein Interactions ◽

Biomedical Literature ◽

Human Protein ◽

Literature Mining ◽

Protein Protein Interactions ◽

Biomedical Literature Mining

The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets

Nucleic Acids Research ◽

10.1093/nar/gkaa1074 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D605-D612 ◽

Cited By ~ 2

Author(s):

Damian Szklarczyk ◽

Annika L Gable ◽

Katerina C Nastou ◽

David Lyon ◽

Rebecca Kirsch ◽

...

Keyword(s):

Text Mining ◽

Protein Interactions ◽

Functional Characterization ◽

Protein Networks ◽

Protein Protein Interactions ◽

Genomic Context ◽

Mining System ◽

String Database ◽

Physical Interactions

Abstract: Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein–protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions and functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online at https://string-db.org/.
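
The idea of scoring evidence from several sources and merging it into one association score can be sketched as follows. The abstract does not state how the scores are combined, so this example assumes a simple noisy-OR rule over independent evidence channels; the function name `combine_scores`, the channel names, and the score values are hypothetical and should not be read as STRING's actual scoring scheme.

```python
from math import prod


def combine_scores(channel_scores):
    """Noisy-OR combination of per-channel confidence scores in [0, 1].

    Assumes the channels are independent: the combined score is the
    probability that at least one channel supports the association.
    """
    scores = list(channel_scores)
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return 1.0 - prod(1.0 - score for score in scores)


# Hypothetical per-channel scores for one protein pair (made-up values).
evidence = {
    "text_mining": 0.5,
    "experiments": 0.6,
    "coexpression": 0.2,
    "transfer": 0.0,
}
print(round(combine_scores(evidence.values()), 3))  # 0.84
```

Under this assumption, a channel with a score of 0 leaves the combined score unchanged, and several moderately confident channels together yield a higher score than any single one.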
