A novel method to predict protein-protein interactions based on the information of protein sequence

Abstract. Objective: The objective of this work is to develop a novel network-level method for exploring the disrupted pathways associated with periodontitis (PD). Methods: First, the differentially expressed genes (DEGs) between PD patients and cognitively normal subjects were identified using the LIMMA package. Then, the protein-protein interactions (PPIs) in each pathway were explored with the Empirical Bayesian (EB) co-expression program. Specifically, we constructed a random model, took its 100th weight value as the threshold for disrupted PPI pathways, and computed the weight value of each pathway. Pathways whose weight value exceeded the threshold were treated as disrupted. Pathway enrichment analysis of the DEGs was carried out with the Expression Analysis Systematic Explorer (EASE) test. Finally, the two methods were compared, and the one yielding the richer and more significant set of pathways was selected. Results: The LIMMA analysis yielded 524 DEGs in total. The threshold for disrupted PPI pathways was 0.115222, and 258 pathways had a weight value above it. By comparison, the 524 DEGs were enriched in 4 pathways under EASE = 0.1. Conclusion: We propose a novel network method for inferring the disrupted pathways in PD. These disrupted pathways might serve as underlying biomarkers for PD treatment.
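
The thresholding step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the random model is built or how weights are computed, so the null weights here are simulated, and the helper names `null_threshold` and `disrupted_pathways` are hypothetical.

```python
import random


def null_threshold(null_weights, rank=100):
    """Return the rank-th largest weight observed under a randomized (null) model."""
    if len(null_weights) < rank:
        raise ValueError("need at least `rank` null weights")
    return sorted(null_weights, reverse=True)[rank - 1]


def disrupted_pathways(pathway_weights, threshold):
    """Keep only pathways whose weight is strictly greater than the threshold."""
    return {name: w for name, w in pathway_weights.items() if w > threshold}


# Assumption: simulated null weights stand in for the paper's random model.
rng = random.Random(0)
null_weights = [rng.random() * 0.2 for _ in range(10_000)]
threshold = null_threshold(null_weights, rank=100)

# Hypothetical observed pathway weights.
observed = {"pathway_A": 0.25, "pathway_B": 0.01}
flagged = disrupted_pathways(observed, threshold)
```

Taking a fixed rank from the null distribution, rather than a fixed numeric cutoff, ties the threshold to how extreme a weight must be to stand out from chance.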

Multimodal deep representation learning for protein interaction identification and protein family classification

BMC Bioinformatics ◽

10.1186/s12859-019-3084-y ◽

2019 ◽

Vol 20 (S16) ◽

Cited By ~ 4

Author(s):

Da Zhang ◽

Mansur Kabuka

Keyword(s):

Protein Interactions ◽

Protein Sequence ◽

Representation Learning ◽

Superior Performance ◽

Sequence Information ◽

Protein Protein Interactions ◽

Learning Framework ◽

Topological Features ◽

PPI Networks ◽

PPI Prediction

Abstract. Background: Protein-protein interactions (PPIs) are involved in dynamic pathological and biological processes throughout life. A thorough understanding of PPIs is therefore crucial for elucidating disease occurrence, achieving optimal drug-target therapeutic effects, and describing protein complex structures. However, compared with the protein sequences available from various species and organisms, the number of known protein-protein interactions is relatively limited. To address this gap, considerable research effort has gone into facilitating the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely only on protein sequence data are more widespread than methods that require extensive biological domain knowledge. Results: In this paper, we propose a multi-modal deep representation learning structure that combines protein physicochemical features with graph topological features from the PPI networks. Specifically, our method takes into account not only the protein sequence information but also the topological representation of each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study PPI prediction. We then use supervised deep neural networks to identify PPIs and classify protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods. Conclusion: To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining PPI networks.
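
To make the "CBOW model based on generated metapaths" idea concrete, the sketch below generates a metapath-guided random walk on a toy typed graph and turns it into (context, target) pairs of the kind a CBOW model is trained on. Everything specific here is an assumption for illustration: the protein/family node types, the toy graph, the window size, and the helper names `metapath_walk` and `cbow_pairs` are not taken from the paper.

```python
import random


def metapath_walk(graph, node_type, start, metapath, length, rng):
    """Random walk whose i-th node must have type metapath[i % len(metapath)]."""
    if node_type[start] != metapath[0]:
        raise ValueError("start node must match the first metapath type")
    walk = [start]
    for step in range(1, length):
        wanted = metapath[step % len(metapath)]
        candidates = [n for n in graph[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


def cbow_pairs(walk, window=2):
    """Build (context nodes, target node) pairs: CBOW predicts the target from its context."""
    pairs = []
    for i, target in enumerate(walk):
        context = walk[max(0, i - window):i] + walk[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


# Hypothetical toy graph: proteins (P*) linked to protein families (F*).
graph = {
    "P1": ["F1"], "P2": ["F1", "F2"], "P3": ["F2"],
    "F1": ["P1", "P2"], "F2": ["P2", "P3"],
}
node_type = {n: "protein" if n.startswith("P") else "family" for n in graph}

walk = metapath_walk(graph, node_type, "P1", ("protein", "family"), 5, random.Random(0))
pairs = cbow_pairs(walk, window=1)
```

In a full pipeline, many such walks would be fed to a word-embedding trainer so that each node receives a topological embedding; that training step is omitted here.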

Robust and accurate prediction of protein–protein interactions by exploiting evolutionary information

Scientific Reports ◽

10.1038/s41598-021-96265-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yang Li ◽

Zheng Wang ◽

Li-Ping Li ◽

Zhu-Hong You ◽

Wen-Zhun Huang ◽

...

Keyword(s):

Protein Interactions ◽

Protein Sequence ◽

Large Scale ◽

False Positive Rate ◽

Computational Method ◽

Evolutionary Information ◽

Local Alignment ◽

Protein Interaction Data ◽

Sequence Information ◽

Protein Protein Interactions

Abstract. Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Recognizing protein–protein interactions is therefore very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction, and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries the risk of a high false positive rate. In search of a better solution, the biology community is looking for computational methods that can quickly and efficiently discover large volumes of protein interaction data. In this paper, we propose a computational method for predicting PPIs that combines orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). We then characterize each protein as a fixed-length feature vector by applying OLPP to the PSSMs. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets, respectively. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.
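
The pipeline in the abstract above needs each variable-length PSSM turned into a fixed-length vector before a pair of proteins can be classified. The sketch below shows only that shape-fixing step, and it deliberately swaps in plain column averaging as a stand-in for OLPP, which is a different and more involved technique. The toy matrices and the helper names `pssm_to_fixed_vector` and `pair_features` are hypothetical.

```python
def pssm_to_fixed_vector(pssm):
    """Average each PSSM column over all sequence positions.

    A PSSM has one row per residue and 20 columns (one per amino acid),
    so the result has 20 entries regardless of protein length.
    NOTE: column averaging is a simple stand-in here, not OLPP.
    """
    if not pssm:
        raise ValueError("PSSM must have at least one row")
    n_cols = len(pssm[0])
    return [sum(row[j] for row in pssm) / len(pssm) for j in range(n_cols)]


def pair_features(pssm_a, pssm_b):
    """Concatenate the two per-protein vectors into one feature vector for the pair."""
    return pssm_to_fixed_vector(pssm_a) + pssm_to_fixed_vector(pssm_b)


# Hypothetical toy PSSMs for proteins of length 2 and 3.
pssm_a = [[1] * 20, [3] * 20]
pssm_b = [[0] * 20, [1] * 20, [2] * 20]
features = pair_features(pssm_a, pssm_b)
```

The resulting 40-entry vector is the kind of input a pair classifier such as a rotation forest would consume; training the classifier is outside the scope of this sketch.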

A novel method to map and compare protein-protein interactions in spherical viral capsids

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.22088 ◽

2008 ◽

Vol 73 (3) ◽

pp. 644-655 ◽

Cited By ~ 13

Author(s):

Mauricio Carrillo-Tripp ◽

Charles L. Brooks ◽

Vijay S. Reddy

Keyword(s):

Protein Interactions ◽

Protein Protein Interactions ◽

Viral Capsids ◽

Novel Method

ChiPPI: a novel method for mapping chimeric protein–protein interactions uncovers selection principles of protein fusion events in cancer

Nucleic Acids Research ◽

10.1093/nar/gkx423 ◽

2017 ◽

Vol 45 (12) ◽

pp. 7094-7105 ◽

Cited By ~ 20

Author(s):

Milana Frenkel-Morgenstern ◽

Alessandro Gorohovski ◽

Somnath Tagore ◽

Vaishnovi Sekar ◽

Miguel Vazquez ◽

...

Keyword(s):

Protein Interactions ◽

Chimeric Protein ◽

Protein Protein Interactions ◽

Protein Fusion ◽

Selection Principles ◽

Novel Method

Predicting Protein-Protein Interactions from Protein Sequence Information Using Dual-Tree Complex Wavelet Transform

Intelligent Computing Theories and Application - Lecture Notes in Computer Science ◽

10.1007/978-3-030-60802-6_13 ◽

2020 ◽

pp. 132-142

Author(s):

Jie Pan ◽

Zhu-Hong You ◽

Chang-Qing Yu ◽

Li-Ping Li ◽

Xin-ke Zhan

Keyword(s):

Wavelet Transform ◽

Protein Interactions ◽

Protein Sequence ◽

Sequence Information ◽

Protein Protein Interactions ◽

Complex Wavelet Transform ◽

Complex Wavelet

Predicting Protein-Protein Interactions from Protein Sequence Using Locality Preserving Projections and Rotation Forest

Intelligent Computing Theories and Application - Lecture Notes in Computer Science ◽

10.1007/978-3-030-60802-6_12 ◽

2020 ◽

pp. 121-131

Author(s):

Xinke Zhan ◽

Zhuhong You ◽

Changqing Yu ◽

Jie Pan ◽

Ruiyang Li

Keyword(s):

Protein Interactions ◽

Protein Sequence ◽

Protein Protein Interactions ◽

Rotation Forest ◽

Locality Preserving Projections ◽

Locality Preserving

Discovering novel protein–protein interactions by measuring the protein semantic similarity from the biomedical literature

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720014420086 ◽

2014 ◽

Vol 12 (06) ◽

pp. 1442008 ◽

Cited By ~ 4

Author(s):

Jung-Hsien Chiang ◽

Jiun-Huang Ju

Keyword(s):

Semantic Similarity ◽

Protein Interactions ◽

Similarity Measures ◽

Biomedical Literature ◽

Biological Research ◽

Protein Protein Interactions ◽

Automated Identification ◽

Learning Classifier ◽

Novel Method ◽

Novel Protein

Protein–protein interactions (PPIs) are involved in the majority of biological processes. Identification of PPIs is therefore one of the key aims of biological research. Although there are many databases of PPIs, many other unidentified PPIs could be buried in the biomedical literature. Automated identification of PPIs from biomedical literature repositories could therefore be used to discover otherwise hidden interactions. Search engines, such as Google, have been successfully applied to measure the relatedness among words. Inspired by such approaches, we propose a novel method to identify PPIs through semantic similarity measures among protein mentions. We define six semantic similarity measures as features based on the page counts retrieved from the MEDLINE database. A machine learning classifier, Random Forest, is trained using these features. The proposed approach achieves an averaged micro-F of 71.28% and an averaged macro-F of 64.03% over five PPI corpora, an improvement over the results obtained with only the conventional co-occurrence feature (averaged micro-F of 68.79% and averaged macro-F of 60.49%). A relation-word reinforcement further improves the averaged micro-F to 71.3% and the averaged macro-F to 65.12%. Comparing the results of the current work with other studies on the AIMed corpus (ranging from 77.58% to 85.1% in micro-F and from 62.18% to 76.27% in macro-F), we show that the proposed approach achieves a micro-F of 81.88% and a macro-F of 64.01% without the use of sophisticated feature extraction. Finally, we manually examine the newly discovered PPI pairs based on a literature review, and the results suggest that our approach could extract novel protein–protein interactions.
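
As an illustration of how page counts can be turned into similarity features, the sketch below computes three commonly used count-based measures (Jaccard, Dice, and pointwise mutual information). The abstract does not list its six measures, so these three are assumptions chosen for illustration, and the counts are invented rather than retrieved from MEDLINE.

```python
import math


def web_jaccard(n_a, n_b, n_ab):
    """Jaccard coefficient from page counts: hits for both / hits for either."""
    union = n_a + n_b - n_ab
    return n_ab / union if union > 0 else 0.0


def web_dice(n_a, n_b, n_ab):
    """Dice coefficient from page counts."""
    total = n_a + n_b
    return 2 * n_ab / total if total > 0 else 0.0


def web_pmi(n_a, n_b, n_ab, n_total):
    """Pointwise mutual information (base 2); 0.0 when any count is zero."""
    if min(n_a, n_b, n_ab) == 0:
        return 0.0
    p_a, p_b, p_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return math.log2(p_ab / (p_a * p_b))


# Hypothetical counts: documents mentioning protein A, protein B, both, and the corpus size.
n_a, n_b, n_ab, n_total = 100, 50, 25, 1000
feature_vector = [
    web_jaccard(n_a, n_b, n_ab),
    web_dice(n_a, n_b, n_ab),
    web_pmi(n_a, n_b, n_ab, n_total),
]
```

A vector like `feature_vector`, computed for each candidate protein pair, is the kind of input a Random Forest classifier could be trained on.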

A High Efficient Biological Language Model for Predicting Protein–Protein Interactions

Cells ◽

10.3390/cells8020122 ◽

2019 ◽

Vol 8 (2) ◽

pp. 122 ◽

Cited By ~ 26

Author(s):

Yanbin Wang ◽

Zhu-Hong You ◽

Shan Yang ◽

Xiao Li ◽

Tong-Hai Jiang ◽

...

Keyword(s):

Computational Methods ◽

Language Processing ◽

Protein Interactions ◽

Protein Sequence ◽

Processing Technology ◽

Limiting Factors ◽

Sequence Information ◽

Biological Sequences ◽

Protein Protein Interactions ◽

High Efficient

Many life activities and key functions in organisms are maintained by different types of protein–protein interactions (PPIs). In order to accelerate the discovery of PPIs for different species, many computational methods have been developed. Unfortunately, even though computational methods are constantly evolving, efficient methods for predicting PPIs from protein sequence information have not been found for many years due to limiting factors in both methodology and technology. Inspired by the similarity between biological sequences and languages, developing a biological language processing technology may provide a new theoretical perspective and a feasible method for the study of biological sequences. In this paper, a pure biological language processing model is proposed for predicting protein–protein interactions using only protein sequences. The model was constructed based on a feature representation method for biological sequences called bio-to-vector (Bio2Vec) and a convolutional neural network (CNN). Bio2Vec obtains protein sequence features by using a “bio-word” segmentation system and a word representation model that learns the distributed representation of each “bio-word”. Bio2Vec supplies a framework that allows researchers to consider the context information and implicit semantic information of a biological sequence. A remarkable improvement in PPI prediction performance has been observed with the proposed model compared with state-of-the-art methods. The presentation of this approach marks the start of “bio language processing technology,” which could cause a technological revolution and could be applied to improve the quality of predictions in other problems.
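
The "bio-word" idea can be illustrated by splitting a protein sequence into short tokens and mapping them to integer ids, the form in which they could be passed to an embedding layer and a CNN. The abstract does not describe the actual segmentation system, so overlapping k-mers are used here purely as an assumption, and the helper names `segment`, `build_vocab`, and `encode` are hypothetical.

```python
def segment(sequence, k=3):
    """Split a sequence into overlapping k-mers (an assumed stand-in for bio-words)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(token_lists):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, length, pad=-1):
    """Map tokens to ids, dropping unknown tokens, then truncate or pad to `length`."""
    ids = [vocab[t] for t in tokens if t in vocab][:length]
    return ids + [pad] * (length - len(ids))


# Hypothetical short peptide used only to show the data flow.
tokens = segment("MKVLA", k=3)
vocab = build_vocab([tokens])
encoded = encode(tokens, vocab, length=5)
```

Fixed-length id sequences like `encoded` are what make it possible to batch proteins of different lengths for a convolutional model.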
