New advances in extracting and learning from protein–protein interactions within unstructured biomedical text data

2019 ◽  
Vol 3 (4) ◽  
pp. 357-369
Author(s):  
J. Harry Caufield ◽  
Peipei Ping

Abstract Protein–protein interactions, or PPIs, constitute a basic unit of our understanding of protein function. Though substantial effort has been made to organize PPI knowledge into structured databases, maintenance of these resources requires careful manual curation. Even then, many PPIs remain uncurated within unstructured text data. Extracting PPIs from experimental research supports assembly of PPI networks and highlights relationships crucial to elucidating protein functions. Isolating specific protein–protein relationships from numerous documents is technically demanding by both manual and automated means. Recent advances in the design of these methods have leveraged emerging computational developments and have demonstrated impressive results on test datasets. In this review, we discuss recent developments in PPI extraction from unstructured biomedical text. We explore the historical context of these developments, recent strategies for integrating and comparing PPI data, and their application to advancing the understanding of protein function. Finally, we describe the challenges facing the application of PPI mining to the text concerning protein families, using the multifunctional 14-3-3 protein family as an example.

2020 ◽  
Author(s):  
A. Khanteymoori ◽  
M. B. Ghajehlo ◽  
S. Behrouzinia ◽  
M. H. Olyaee

Abstract Protein function prediction based on protein-protein interactions (PPIs) is one of the most important challenges of the post-genomic era. Because determining protein function by experimental techniques can be costly, function prediction has become an important challenge for computational biology and bioinformatics. Some researchers use graph-based (or network-based) methods that exploit PPI networks to annotate un-annotated proteins. The aim of this study is to increase the accuracy of protein function prediction using two proposed methods. To predict protein functions, we propose Protein Function Prediction based on Clique Analysis (ProCbA) and Protein Function Prediction on Neighborhood Counting using functional aggregation (ProNC-FA). Both ProCbA and ProNC-FA can predict the functions of unknown proteins. In addition, in ProNC-FA, which does not introduce a new algorithm, we try to address the incomplete and noisy nature of PPI data in order to achieve a network with complete functional aggregation. The experimental results on MIPS data and the 17 different explained datasets validate the encouraging performance and the strength of both ProCbA and ProNC-FA on function prediction. The analysis of the experimental results (Section IV) shows that both ProCbA and ProNC-FA are generally able to outperform all the other methods.
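
The neighborhood-counting idea named in the abstract above can be illustrated with a minimal sketch. This is a generic neighbor-vote baseline, not the ProNC-FA method itself (the abstract does not specify its functional aggregation step); the function name `predict_functions`, the toy proteins, and the annotations are all hypothetical.

```python
from collections import Counter


def predict_functions(protein, interactions, annotations, top_k=2):
    """Rank candidate functions for `protein` by how many of its
    direct interaction partners are annotated with each function.

    interactions: iterable of (protein_a, protein_b) undirected pairs
    annotations:  dict mapping protein -> set of function labels
    """
    # Collect direct neighbors of the query protein (ignore self-loops).
    neighbors = set()
    for a, b in interactions:
        if a == b:
            continue
        if a == protein:
            neighbors.add(b)
        elif b == protein:
            neighbors.add(a)

    # Each annotated neighbor casts one vote per function it carries.
    votes = Counter()
    for neighbor in neighbors:
        votes.update(annotations.get(neighbor, ()))

    # Sort by vote count (descending), then by label for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:top_k]]


# Toy network (hypothetical data): P1 is the un-annotated protein.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
annotations = {
    "P2": {"kinase activity", "ATP binding"},
    "P3": {"kinase activity"},
    "P4": {"DNA binding"},
}

print(predict_functions("P1", interactions, annotations))
```

In this toy network two of the three neighbors of P1 carry "kinase activity", so that label ranks first. Noisy or missing edges change the vote directly, which is the kind of data problem the abstract says ProNC-FA tries to address.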


2020 ◽  
Author(s):  
Warith Eddine DJEDDI ◽  
Sadok BEN YAHIA ◽  
Engelbert MEPHU NGUIFO

Abstract Background: One of the challenges of the post-genomic era is to provide accurate function annotations for orphan and unannotated protein sequences. With the recent availability of huge protein-protein interaction networks for many model species, computational methods are greatly needed to elucidate protein function. In this respect, most computational approaches integrate diverse kinds of functional interactions to unveil protein functions, transferring annotations across species on the basis of similar sequence, 2D/3D structure, amino acid motifs or phylogenetic profiles. Results: In this work, we introduce a new approach called TANA for inferring protein functions. The main originality of the approach lies in predicting the functions of an unannotated protein by transferring annotations via a network alignment as well as from the direct interaction neighborhood within the PPI networks. In doing so, we are able to discover the functions of proteins that could not be easily described by sequence homology. We assess the performance of our method using the standard metrics established by the CAFA and highlight a sharp, significant improvement over other competitive methods, in particular for predicting molecular functions. Conclusions: This research is one of the first attempts to combine sequence-based and multiple-network-alignment-based function prediction approaches. We have been able to assess the accuracy of the prediction using pairwise and multiple alignment of the PPI networks for the compared species. Therefore, we recommend using different strategies (i.e., pairwise, multiple, with/without neighborhood networks), especially in situations where the functions of the protein are not known in advance.


2019 ◽  
Author(s):  
Hassan Kané ◽  
Mohamed Coulibali ◽  
Ali Abdalla ◽  
Pelkins Ajanoh

Abstract Computational methods that infer the function of proteins are key to understanding life at the molecular level. In recent years, representation learning has emerged as a powerful paradigm to discover new patterns among entities as varied as images, words, speech, and molecules. In typical representation learning, there is only one source of data or one level of abstraction at which the learned representation occurs. However, proteins can be described by their primary, secondary, tertiary, and quaternary structure or even as nodes in protein-protein interaction networks. Given that protein function is an emergent property of all these levels of interactions, in this work we learn joint representations from both amino acid sequence and multilayer networks representing tissue-specific protein-protein interactions. We show that simple machine learning models trained using these hybrid representations outperform existing network-based methods on the task of tissue-specific protein function prediction on 13 out of 13 tissues. Furthermore, these representations outperform existing ones by 14% on average.


2017 ◽  
Author(s):  
Pin-San Xu ◽  
Jun Luo ◽  
Tong-Yi Dou

Most biological processes within a cell are carried out by protein-protein interaction (PPI) networks, or so-called interactomics. Therefore, identification of PPIs is crucial to elucidating protein functions and further understanding of various cellular biological processes. Currently, a series of high-throughput experimental technologies for detecting PPIs have been presented. However, the time-consuming and labor-intensive nature of these methods has forced people to turn to computational approaches for PPI prediction. Herein, we developed a new predictor which uses a stacking algorithm with information extraction by wavelet transform. When applied to the Saccharomyces cerevisiae PPI dataset, the proposed method achieved a prediction accuracy of 83.35% with a sensitivity of 92.95% at a specificity of 65.41%. An independent data set of 2726 Helicobacter pylori PPIs was also used to evaluate this prediction model, and the prediction accuracy was 80.39%, which is better than that of most existing methods.
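
The abstract above reports accuracy, sensitivity and specificity together. As a reminder of how the three relate, here is a minimal sketch using the standard confusion-matrix definitions; the helper name `binary_metrics` and the toy labels are illustrative assumptions and do not come from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity for binary labels
    (1 = interacting pair, 0 = non-interacting pair)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined ratios (empty class) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on true interactions
        "specificity": ratio(tn, tn + fp),  # recall on non-interactions
    }


# Toy labels (hypothetical): every true interaction is recovered,
# but one of the two non-interacting pairs is wrongly predicted.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The toy case has the same pattern as the reported yeast result: sensitivity is high while specificity is noticeably lower, so accuracy alone hides how often non-interacting pairs are predicted as interactions.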


2014 ◽  
pp. S155-S164 ◽  
Author(s):  
V. OBSILOVA ◽  
M. KOPECKA ◽  
D. KOSEK ◽  
M. KACIROVA ◽  
S. KYLAROVA ◽  
...  

Many aspects of protein function regulation require specific protein-protein interactions to carry out the exact biochemical and cellular functions. The highly conserved members of the 14-3-3 protein family mediate such interactions and, through binding to hundreds of other proteins, provide a multitude of regulatory functions, thus playing key roles in many cellular processes. The 14-3-3 protein binding can affect the function of the target protein in many ways, including the modulation of its enzyme activity, its subcellular localization, its structure and stability, or its molecular interactions. In this minireview, we focus on mechanisms of the 14-3-3 protein-dependent regulation of three important 14-3-3 binding partners: yeast neutral trehalase Nth1, regulator of G-protein signaling 3 (RGS3), and phosducin.


2018 ◽  
Author(s):  
Sun Sook Chung ◽  
Anna Laddach ◽  
N. Shaun B. Thomas ◽  
Franca Fraternali

Abstract Recent advances in biotechnologies for genomics and proteomics have expanded our understanding of biological components which play crucial roles in complex mechanisms related to cancer. However, it is still challenging to extract from the available knowledge reliable targets to use in a translational setting. The reasons for this are manifold, but essentially distilling real biological signal from heterogeneous “big data” collections is the major hurdle. Here, we aim to establish an in-silico pipeline to explore mutations and their effects on protein-protein interactions, with a focus on acute myeloid leukaemia (AML), one of the most common blood cancers with the highest mortality rate. Our method, based on cyclic interactions of a small number of proteins topologically linked in the network (short loop network motifs), highlights specific protein-protein interactions (PPIs) and their functions in AML when compared with other leukaemias. We also developed a new property named ‘short loop commonality’ to measure indirect PPIs occurring via common short loop interactions. This new method detects “modules” of PPI networks (PPINs) enriched with common biological functions which have proteins that contain mutation hotspots. We further perform 3D structural modelling to extract atomistic details, which shows that such hotspots map to PPI interfaces as well as active sites. Thus, our study proposes a framework for the macroscopic and microscopic investigation of PPINs, their relation to cancers, and highlights important functional modules in the network to be exploited in targeted drug screening.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Sun Sook Chung ◽  
Joseph C F Ng ◽  
Anna Laddach ◽  
N Shaun B Thomas ◽  
Franca Fraternali

Abstract Direct drug targeting of mutated proteins in cancer is not always possible and efficacy can be nullified by compensating protein–protein interactions (PPIs). Here, we establish an in silico pipeline to identify specific PPI sub-networks containing mutated proteins as potential targets, which we apply to mutation data of four different leukaemias. Our method is based on extracting cyclic interactions of a small number of proteins topologically and functionally linked in the Protein–Protein Interaction Network (PPIN), which we call short loop network motifs (SLM). We uncover a new property of PPINs named ‘short loop commonality’ to measure indirect PPIs occurring via common SLM interactions. This detects ‘modules’ of PPI networks enriched with annotated biological functions of proteins containing mutation hotspots, exemplified by FLT3 and other receptor tyrosine kinase proteins. We further identify functional dependency or mutual exclusivity of short loop commonality pairs in large-scale cellular CRISPR–Cas9 knockout screening data. Our pipeline provides a new strategy for identifying new therapeutic targets for drug discovery.
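
To make "cyclic interactions of a small number of proteins" concrete, the sketch below enumerates three-protein loops (triangles) in an undirected PPI edge list. Restricting loops to exactly three proteins is an assumption made for brevity. The abstract does not define how 'short loop commonality' is computed, so it is not reproduced here. The function names and the toy network are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, skipping self-interactions."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def three_protein_loops(edges):
    """Return every set of three proteins that all interact pairwise,
    as sorted tuples in a sorted list."""
    adjacency = build_adjacency(edges)
    loops = set()
    for a, partners in adjacency.items():
        # Two partners of `a` that also interact with each other
        # close a loop of length three.
        for b, c in combinations(sorted(partners), 2):
            if c in adjacency[b]:
                loops.add(tuple(sorted((a, b, c))))
    return sorted(loops)


# Toy network (hypothetical): two loops that share protein C.
edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("C", "D"), ("D", "E"), ("C", "E"),
    ("E", "F"),
]
loops = three_protein_loops(edges)
print(loops)
```

Here protein C sits in both loops while F sits in none. A motif-based analysis can highlight proteins like C that edge counts alone would not distinguish.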


2018 ◽  
Vol 25 (1) ◽  
pp. 5-21 ◽  
Author(s):  
Ylenia Cau ◽  
Daniela Valensin ◽  
Mattia Mori ◽  
Sara Draghi ◽  
Maurizio Botta

14-3-3 is a class of proteins able to interact with a multitude of targets by establishing protein-protein interactions (PPIs). They are found in all eukaryotes, with a conserved secondary structure and high sequence homology among species. 14-3-3 proteins are involved in many physiological and pathological cellular processes, either by triggering or interfering with the activity of specific protein partners. In recent years, the scientific community has collected much evidence on the role played by the seven human 14-3-3 isoforms in cancer and neurodegenerative diseases. Indeed, these proteins regulate the molecular mechanisms associated with these diseases by interacting with (i) oncogenic proteins, (ii) pro-apoptotic proteins and (iii) proteins involved in Parkinson's and Alzheimer's diseases. The discovery of small molecule modulators of 14-3-3 PPIs could facilitate a complete understanding of the physiological role of these proteins, and might offer valuable therapeutic approaches for these critical pathological states.

