scholarly journals A Deep Learning and XGBoost-Based Method for Predicting Protein-Protein Interaction Sites

2021 ◽  
Vol 12 ◽  
Author(s):  
Pan Wang ◽  
Guiyang Zhang ◽  
Zu-Guo Yu ◽  
Guohua Huang

Knowledge about protein-protein interactions is beneficial in understanding cellular mechanisms. Protein-protein interactions are usually determined according to their protein-protein interaction sites. Due to the limitations of current techniques, it is still a challenging task to detect protein-protein interaction sites. In this article, we presented a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model served as a feature extractor to remove redundant information from protein sequences. The Extreme Gradient Boosting algorithm was used to construct a classifier for predicting protein-protein interaction sites. The DeepPPISP-XGB achieved the following results: area under the receiver operating characteristic curve of 0.681, a recall of 0.624, and area under the precision-recall curve of 0.339, being competitive with the state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.

Author(s):  
Min Zeng ◽  
Fuhao Zhang ◽  
Fang-Xiang Wu ◽  
Yaohang Li ◽  
Jianxin Wang ◽  
...  

Abstract Motivation Protein–protein interactions (PPIs) play important roles in many biological processes. Conventional biological experiments for identifying PPI sites are costly and time-consuming. Thus, many computational approaches have been proposed to predict PPI sites. Existing computational methods usually use local contextual features to predict PPI sites. Actually, global features of protein sequences are critical for PPI site prediction. Results A new end-to-end deep learning framework, named DeepPPISP, through combining local contextual and global sequence features, is proposed for PPI site prediction. For local contextual features, we use a sliding window to capture features of neighbors of a target amino acid as in previous studies. For global sequence features, a text convolutional neural network is applied to extract features from the whole protein sequence. Then the local contextual and global sequence features are combined to predict PPI sites. By integrating local contextual and global sequence features, DeepPPISP achieves the state-of-the-art performance, which is better than the other competing methods. In order to investigate if global sequence features are helpful in our deep learning model, we remove or change some components in DeepPPISP. Detailed analyses show that global sequence features play important roles in DeepPPISP. Availability and implementation The DeepPPISP web server is available at http://bioinformatics.csu.edu.cn/PPISP/. The source code can be obtained from https://github.com/CSUBioGroup/DeepPPISP. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 21 (7) ◽  
pp. 2274 ◽  
Author(s):  
Aijun Deng ◽  
Huan Zhang ◽  
Wenyan Wang ◽  
Jun Zhang ◽  
Dingdong Fan ◽  
...  

The study of protein-protein interaction is of great biological significance, and the prediction of protein-protein interaction sites can promote the understanding of cell biological activity and will be helpful for drug development. However, uneven distribution between interaction and non-interaction sites is common because only a small number of protein interactions have been confirmed by experimental techniques, which greatly affects the predictive capability of computational methods. In this work, two imbalanced data processing strategies based on XGBoost algorithm were proposed to re-balance the original dataset from inherent relationship between positive and negative samples for the prediction of protein-protein interaction sites. Herein, a feature extraction method was applied to represent the protein interaction sites based on evolutionary conservatism of proteins, and the influence of overlapping regions of positive and negative samples was considered in prediction performance. Our method showed good prediction performance, such as prediction accuracy of 0.807 and MCC of 0.614, on an original dataset with 10,455 surface residues but only 2297 interface residues. Experimental results demonstrated the effectiveness of our XGBoost-based method.


Author(s):  
Yiwei Li ◽  
Lucian Ilie

AbstractMotivationProteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein-protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods.ResultsWe propose DELPHI (DEep Learning Prediction of Highly probable protein Interaction sites), a new sequence-based deep learning suite for PPI binding sites prediction. DELPHI has an ensemble structure with data augmentation and it employs novel features in addition to existing ones. We comprehensively compare DELPHI to nine state-of-the-art programs on five datasets and show that it is more accurate.AvailabilityThe trained model, source code for training, predicting, and data processing are freely available at https://github.com/lucian-ilie/DELPHI. All datasets used in this study can be downloaded at http://www.csd.uwo.ca/~ilie/DELPHI/[email protected]


2021 ◽  
Vol 12 ◽  
Author(s):  
Minli Tang ◽  
Longxin Wu ◽  
Xinyu Yu ◽  
Zhaoqi Chu ◽  
Shuting Jin ◽  
...  

Proteins are the basic substances that undertake human life activities, and they often perform their biological functions through interactions with other biological macromolecules, such as cell transmission and signal transduction. Predicting the interaction sites between proteins can deepen the understanding of the principle of protein interactions, but traditional experimental methods are time-consuming and labor-intensive. In this study, a new hierarchical attention network structure, named HANPPIS, by adding six effective features of protein sequence, position-specific scoring matrix (PSSM), secondary structure, pre-training vector, hydrophilic, and amino acid position, is proposed to predict protein–protein interaction (PPI) sites. The experiment proved that our model has obtained very effective results, which was better than the existing advanced calculation methods. More importantly, we used the double-layer attention mechanism to improve the interpretability of the model and to a certain extent solved the problem of the “black box” of deep neural networks, which can be used as a reference for location positioning on the biological level.


Author(s):  
Yu-Miao Zhang ◽  
Jun Wang ◽  
Tao Wu

In this study, the Agrobacterium infection medium, infection duration, detergent, and cell density were optimized. The sorghum-based infection medium (SbIM), 10-20 min infection time, addition of 0.01% Silwet L-77, and Agrobacterium optical density at 600 nm (OD600), improved the competence of onion epidermal cells to support Agrobacterium infection at >90% efficiency. Cyclin-dependent kinase D-2 (CDKD-2) and cytochrome c-type biogenesis protein (CYCH), protein-protein interactions were localized. The optimized procedure is a quick and efficient system for examining protein subcellular localization and protein-protein interaction.


2020 ◽  
Vol 20 (10) ◽  
pp. 855-882
Author(s):  
Olivia Slater ◽  
Bethany Miller ◽  
Maria Kontoyianni

Drug discovery has focused on the paradigm “one drug, one target” for a long time. However, small molecules can act at multiple macromolecular targets, which serves as the basis for drug repurposing. In an effort to expand the target space, and given advances in X-ray crystallography, protein-protein interactions have become an emerging focus area of drug discovery enterprises. Proteins interact with other biomolecules and it is this intricate network of interactions that determines the behavior of the system and its biological processes. In this review, we briefly discuss networks in disease, followed by computational methods for protein-protein complex prediction. Computational methodologies and techniques employed towards objectives such as protein-protein docking, protein-protein interactions, and interface predictions are described extensively. Docking aims at producing a complex between proteins, while interface predictions identify a subset of residues on one protein that could interact with a partner, and protein-protein interaction sites address whether two proteins interact. In addition, approaches to predict hot spots and binding sites are presented along with a representative example of our internal project on the chemokine CXC receptor 3 B-isoform and predictive modeling with IP10 and PF4.


Author(s):  
Mohammad Hamim Zajuli Al Faroby ◽  
Mohammad Isa Irawan ◽  
Ni Nyoman Tri Puspaningsih

Protein Interaction Analysis (PPI) can be used to identify proteins that have a supporting function on the main protein, especially in the synthesis process. Insulin is synthesized by proteins that have the same molecular function covering different but mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) gives certain characteristics to each protein. This study purpose to predict proteins that interact with insulin using the centrality method as a feature extractor and extreme gradient boosting as a classification algorithm. Characteristics using the centralized method produces  features as a central function of protein. Classification results are measured using measurements, precision, recall and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of  and a ROC score of . The prediction model produced by XGBoost has capabilities above the average of other machine learning methods.


Sign in / Sign up

Export Citation Format

Share Document