Identifying novel protein phenotype annotations by hybridizing protein–protein interactions and protein sequence similarities

2016 ◽  
Vol 291 (2) ◽  
pp. 913-934 ◽  
Author(s):  
Lei Chen ◽  
Yu-Hang Zhang ◽  
Tao Huang ◽  
Yu-Dong Cai
2019 ◽  
Vol 20 (S16) ◽  
Author(s):  
Da Zhang ◽  
Mansur Kabuka

Abstract Background Protein-protein interactions(PPIs) engage in dynamic pathological and biological procedures constantly in our life. Thus, it is crucial to comprehend the PPIs thoroughly such that we are able to illuminate the disease occurrence, achieve the optimal drug-target therapeutic effect and describe the protein complex structures. However, compared to the protein sequences obtainable from various species and organisms, the number of revealed protein-protein interactions is relatively limited. To address this dilemma, lots of research endeavor have investigated in it to facilitate the discovery of novel PPIs. Among these methods, PPI prediction techniques that merely rely on protein sequence data are more widespread than other methods which require extensive biological domain knowledge. Results In this paper, we propose a multi-modal deep representation learning structure by incorporating protein physicochemical features with the graph topological features from the PPI networks. Specifically, our method not only bears in mind the protein sequence information but also discerns the topological representations for each protein node in the PPI networks. In our paper, we construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study the PPI predictions. Following by that, we utilize the supervised deep neural networks to identify the PPIs and classify the protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which signifies that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods. Conclusion To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining the PPI networks.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yang Li ◽  
Zheng Wang ◽  
Li-Ping Li ◽  
Zhu-Hong You ◽  
Wen-Zhun Huang ◽  
...  

AbstractVarious biochemical functions of organisms are performed by protein–protein interactions (PPIs). Therefore, recognition of protein–protein interactions is very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPIs data, it requires expensive cost of both time and labor, and leave a risk of high false positive rate. In order to formulate a more ingenious solution, biology community is looking for computational methods to quickly and efficiently discover massive protein interaction data. In this paper, we propose a computational method for predicting PPIs based on a fresh idea of combining orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize a protein as a fixed length feature vector by applying OLPP to PSSMs. Finally, we train an RoF classifier for the purpose of identifying non-interacting and interacting protein pairs. The proposed method yielded a significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on Yeast and Human datasets. Our experiment show the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.


2014 ◽  
Vol 12 (06) ◽  
pp. 1442008 ◽  
Author(s):  
Jung-Hsien Chiang ◽  
Jiun-Huang Ju

Protein–protein interactions (PPIs) are involved in the majority of biological processes. Identification of PPIs is therefore one of the key aims of biological research. Although there are many databases of PPIs, many other unidentified PPIs could be buried in the biomedical literature. Therefore, automated identification of PPIs from biomedical literature repositories could be used to discover otherwise hidden interactions. Search engines, such as Google, have been successfully applied to measure the relatedness among words. Inspired by such approaches, we propose a novel method to identify PPIs through semantic similarity measures among protein mentions. We define six semantic similarity measures as features based on the page counts retrieved from the MEDLINE database. A machine learning classifier, Random Forest, is trained using the above features. The proposed approach achieve an averaged micro-F of 71.28% and an averaged macro-F of 64.03% over five PPI corpora, an improvement over the results of using only the conventional co-occurrence feature (averaged micro-F of 68.79% and an averaged macro-F of 60.49%). A relation-word reinforcement further improves the averaged micro-F to 71.3% and averaged macro-F to 65.12%. Comparing the results of the current work with other studies on the AIMed corpus (ranging from 77.58% to 85.1% in micro-F, 62.18% to 76.27% in macro-F), we show that the proposed approach achieves micro-F of 81.88% and macro-F of 64.01% without the use of sophisticated feature extraction. Finally, we manually examine the newly discovered PPI pairs based on a literature review, and the results suggest that our approach could extract novel protein–protein interactions.


Cells ◽  
2019 ◽  
Vol 8 (2) ◽  
pp. 122 ◽  
Author(s):  
Yanbin Wang ◽  
Zhu-Hong You ◽  
Shan Yang ◽  
Xiao Li ◽  
Tong-Hai Jiang ◽  
...  

Many life activities and key functions in organisms are maintained by different types of protein–protein interactions (PPIs). In order to accelerate the discovery of PPIs for different species, many computational methods have been developed. Unfortunately, even though computational methods are constantly evolving, efficient methods for predicting PPIs from protein sequence information have not been found for many years due to limiting factors including both methodology and technology. Inspired by the similarity of biological sequences and languages, developing a biological language processing technology may provide a brand new theoretical perspective and feasible method for the study of biological sequences. In this paper, a pure biological language processing model is proposed for predicting protein–protein interactions only using a protein sequence. The model was constructed based on a feature representation method for biological sequences called bio-to-vector (Bio2Vec) and a convolution neural network (CNN). The Bio2Vec obtains protein sequence features by using a “bio-word” segmentation system and a word representation model used for learning the distributed representation for each “bio-word”. The Bio2Vec supplies a frame that allows researchers to consider the context information and implicit semantic information of a bio sequence. A remarkable improvement in PPIs prediction performance has been observed by using the proposed model compared with state-of-the-art methods. The presentation of this approach marks the start of “bio language processing technology,” which could cause a technological revolution and could be applied to improve the quality of predictions in other problems.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Zhu-Hong You ◽  
Jianqiang Li ◽  
Xin Gao ◽  
Zhou He ◽  
Lin Zhu ◽  
...  

Proteins and their interactions lie at the heart of most underlying biological processes. Consequently, correct detection of protein-protein interactions (PPIs) is of fundamental importance to understand the molecular mechanisms in biological systems. Although the convenience brought by high-throughput experiment in technological advances makes it possible to detect a large amount of PPIs, the data generated through these methods is unreliable and may not be completely inclusive of all possible PPIs. Targeting at this problem, this study develops a novel computational approach to effectively detect the protein interactions. This approach is proposed based on a novel matrix-based representation of protein sequence combined with the algorithm of support vector machine (SVM), which fully considers the sequence order and dipeptide information of the protein primary sequence. When performed on yeast PPIs datasets, the proposed method can reach 90.06% prediction accuracy with 94.37% specificity at the sensitivity of 85.74%, indicating that this predictor is a useful tool to predict PPIs. Achieved results also demonstrate that our approach can be a helpful supplement for the interactions that have been detected experimentally.


2018 ◽  
Author(s):  
Kalyani B. Karunakaran ◽  
Naveena Yanamala ◽  
Gregory Boyce ◽  
Madhavi K. Ganapathiraju

AbstractMalignant pleural mesothelioma (MPM) is an aggressive cancer of the thorax with a median survival of one year. We constructed an ‘MPM interactome’ with over 300 computationally predicted PPIs and over 1300 known PPIs of 62 literature-curated genes whose activity affects MPM. Known PPIs of the 62 MPM associated genes were derived from BioGRID and HPRD databases. Novel PPIs were predicted by applying the HiPPIP algorithm, which computes features of protein pairs such as cellular localization, molecular function, biological process membership, genomic location of the gene, gene expression in microarray experiments, protein domains and tissue membership, and classifies the pairwise features asinteractingornon-interactingbased on a random forest model. To our satisfaction, the interactome is significantly enriched with genes differentially expressed in MPM tumors compared with normal pleura, and with other thoracic tumors. The interactome is also significantly enriched with genes whose high expression has been correlated with unfavorable prognosis in lung cancer, and with genes differentially expressed on crocidolite exposure. 28 of the interactors of MPM proteins are targets of 147 FDA-approved drugs. By comparing differential expression profiles induced by drug to profiles induced by MPM, potentially repurposable drugs are identified from this drug list. Development of PPIs of disease-specific set of genes is a powerful approach with high translational impact – the interactome is a vehicle to piece together an integrated view on how genes associated with MPM through various high throughput studies are functionally linked, leading to clinically translatable results such as clinical trials with repurposed drugs. The PPIs are made available on a webserver, calledWiki-Pi MPMathttp://severus.dbmi.pitt.edu/wiki-MPMwith advanced search capabilities.One Sentence SummaryMesothelioma Interactome with 367 novel protein-protein interactions may shed light on the mechanisms of cancer genesis and progression


Sign in / Sign up

Export Citation Format

Share Document