Deep learning with feature embedding for compound-protein interaction prediction

Abstract: Accurately identifying compound-protein interactions in silico can deepen our understanding of the mechanisms of drug action and significantly facilitate the drug discovery and development process. Traditional similarity-based computational models for compound-protein interaction prediction rarely exploit the latent features in the large-scale unlabelled compound and protein data that are currently available, and are often limited to relatively small-scale datasets. We propose a new scheme that combines feature embedding (a representation learning technique) with deep learning for predicting compound-protein interactions. Our method automatically learns low-dimensional, implicit but expressive features for compounds and proteins from massive amounts of unlabelled data. Combining effective feature embedding with powerful deep learning techniques, our method provides a general computational pipeline for accurate compound-protein interaction prediction, even when the interaction knowledge of compounds and proteins is entirely unknown. Evaluations on current large-scale databases of measured compound-protein affinities, such as ChEMBL and BindingDB, as well as on known drug-target interactions from DrugBank, demonstrate the superior prediction performance of our method and suggest that it can offer a useful tool for drug development and drug repositioning.

Yuel: Compound-Protein Interaction Prediction with High Generalizability

10.1101/2021.07.06.451043 ◽

2021 ◽

Author(s):

Jian Wang ◽

Nikolay V Dokholyan

Keyword(s):

Deep Learning ◽

Virtual Screening ◽

Small Molecules ◽

Protein Interaction ◽

Protein Interactions ◽

Data Sets ◽

Interaction Prediction ◽

Learning Techniques ◽

Protein Interaction Prediction ◽

High Prediction

In recent years, numerous structure-free deep-learning-based neural networks have emerged that aim to predict compound-protein interactions for drug virtual screening. Although these methods show high prediction accuracy in their own tests, we find that they do not generalize to interactions between unknown proteins and unknown small molecules, which hinders the use of state-of-the-art deep learning techniques in virtual screening. In our work, we develop a compound-protein interaction predictor, YueL, which can predict compound-protein interactions with high generalizability. In comprehensive tests on various data sets, we find that YueL is able to predict interactions between unknown compounds and unknown proteins. We anticipate that our work can motivate broad application of deep learning techniques for drug virtual screening to supersede traditional docking and cheminformatics methods.

A deep-learning framework for multi-level peptide–protein interaction prediction

Nature Communications ◽

10.1038/s41467-021-25772-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Yipin Lei ◽

Shuya Li ◽

Ziyi Liu ◽

Fangping Wan ◽

Tingzhong Tian ◽

...

Keyword(s):

Deep Learning ◽

Protein Interaction ◽

Protein Interactions ◽

Comprehensive Evaluation ◽

Interaction Prediction ◽

Learning Framework ◽

Binding Residue ◽

Protein Interaction Prediction ◽

Binding Residues ◽

Multi Level

Abstract: Peptide-protein interactions are involved in various fundamental cellular functions and their identification is crucial for designing efficacious peptide therapeutics. Recently, a number of computational methods have been developed to predict peptide-protein interactions. However, most of the existing prediction approaches heavily depend on high-resolution structure data. Here, we present a deep learning framework for multi-level peptide-protein interaction prediction, called CAMP, including binary peptide-protein interaction prediction and corresponding peptide binding residue identification. Comprehensive evaluation demonstrated that CAMP can successfully capture the binary interactions between peptides and proteins and identify the binding residues along the peptides involved in the interactions. In addition, CAMP outperformed other state-of-the-art methods on binary peptide-protein interaction prediction. CAMP can serve as a useful tool in peptide-protein interaction prediction and identification of important binding residues in the peptides, which can thus facilitate the peptide drug discovery process.

Boosting compound-protein interaction prediction by deep learning

Methods ◽

10.1016/j.ymeth.2016.06.024 ◽

2016 ◽

Vol 110 ◽

pp. 64-72 ◽

Cited By ~ 65

Author(s):

Kai Tian ◽

Mingyu Shao ◽

Yang Wang ◽

Jihong Guan ◽

Shuigeng Zhou

Keyword(s):

Deep Learning ◽

Protein Interaction ◽

Interaction Prediction ◽

Protein Interaction Prediction

LPI-ETSLP: lncRNA–protein interaction prediction using eigenvalue transformation-based semi-supervised link prediction

Molecular BioSystems ◽

10.1039/c7mb00290d ◽

2017 ◽

Vol 13 (9) ◽

pp. 1781-1787 ◽

Cited By ~ 36

Author(s):

Huan Hu ◽

Chunyu Zhu ◽

Haixin Ai ◽

Li Zhang ◽

Jian Zhao ◽

...

Keyword(s):

Protein Interaction ◽

Protein Interactions ◽

Link Prediction ◽

Interaction Prediction ◽

Cellular Processes ◽

Protein Interaction Prediction

RNA–protein interactions are essential for understanding many important cellular processes.

RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks

10.1101/2021.08.13.456309 ◽

2021 ◽

Author(s):

Joseph Szymborski ◽

Amin Emad

Keyword(s):

Protein Interaction ◽

Protein Interactions ◽

Information Leakage ◽

Supplementary Information ◽

Protein Protein Interactions ◽

Training Time ◽

Interaction Prediction ◽

Protein Protein Interaction ◽

Protein Interaction Prediction ◽

Time To Learn

Motivation: Computational methods for the prediction of protein-protein interactions, while important tools for researchers, are plagued by challenges in generalising to unseen proteins. Datasets used for modelling protein-protein predictions are particularly predisposed to information leakage and sampling biases. Results: In this study, we introduce RAPPPID, a method for the Regularised Automatic Prediction of Protein-Protein Interactions using Deep Learning. RAPPPID is a twin AWD-LSTM network which employs multiple regularisation methods during training time to learn generalised weights. Testing on stringent interaction datasets composed of proteins not seen during training, RAPPPID outperforms state-of-the-art methods. Further experiments show that RAPPPID's performance holds regardless of the particular proteins in the testing set and its performance is higher for biologically supported edges. This study serves to demonstrate that appropriate regularisation is an important component of overcoming the challenges of creating models for protein-protein interaction prediction that generalise to unseen proteins. Availability and Implementation: Code and datasets are freely available at https://github.com/jszym/rapppid. Supplementary Information: Online-only supplementary data is available at the journal's website.
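The idea of a stringent test set "composed of proteins not seen during training" can be illustrated with a minimal sketch. This is not the RAPPPID authors' procedure; the function name `split_by_unseen_proteins`, the `test_fraction` parameter, and the choice to discard mixed pairs are all assumptions made for illustration.

```python
import random


def split_by_unseen_proteins(pairs, test_fraction=0.2, seed=0):
    """Split interaction pairs so that no test protein appears in training.

    Hypothetical helper: proteins (not pairs) are assigned to the test side,
    and any pair mixing a training protein with a test protein is dropped.
    """
    proteins = sorted({p for pair in pairs for p in pair})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])

    train, test = [], []
    for a, b in pairs:
        if a in test_proteins and b in test_proteins:
            test.append((a, b))
        elif a not in test_proteins and b not in test_proteins:
            train.append((a, b))
        # Mixed pairs are discarded to avoid leaking test proteins into training.
    return train, test, test_proteins
```

Splitting on proteins rather than on pairs usually discards some data, but it is one simple way to reduce the information leakage the abstract describes.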

A multitask transfer learning framework for the prediction of virus-human protein–protein interactions

BMC Bioinformatics ◽

10.1186/s12859-021-04484-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Thi Ngan Dong ◽

Graham Brogden ◽

Gisa Gerold ◽

Megha Khosla

Keyword(s):

Transfer Learning ◽

Protein Interaction ◽

Protein Interactions ◽

Protein Sequences ◽

Human Protein ◽

Interaction Patterns ◽

Protein Protein Interactions ◽

Interaction Prediction ◽

Protein Protein Interaction ◽

Protein Interaction Prediction

Abstract. Background: Viral infections cause significant morbidity and mortality worldwide. Understanding the interaction patterns between a particular virus and human proteins plays a crucial role in unveiling the underlying mechanism of viral infection and pathogenesis. This could further help in the prevention and treatment of virus-related diseases. However, predicting protein–protein interactions between a new virus and human cells is extremely challenging because data on virus-human interactions are scarce and most viruses mutate rapidly. Results: We developed a multitask transfer learning approach that exploits the information of around 24 million protein sequences and the interaction patterns from the human interactome to counter the problem of small training datasets. Instead of using hand-crafted protein features, we utilize statistically rich protein representations learned by a deep language modeling approach from a massive source of protein sequences. We also employ an additional objective that aims to maximize the probability of observing human protein–protein interactions. This additional task objective acts as a regularizer and also allows domain knowledge to be incorporated into the virus-human protein–protein interaction prediction model. Conclusions: Our approach achieved competitive results on 13 benchmark datasets and in the case study for the SARS-CoV-2 virus receptor. Experimental results show that our proposed model works effectively for both virus-human and bacteria-human protein–protein interaction prediction tasks. We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/multitask-transfer.
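The role of an additional task objective as a regularizer can be sketched as a weighted sum of two losses. The sketch below assumes both tasks are scored with binary cross-entropy and combined with a hypothetical weight `aux_weight`; the abstract does not state the actual loss form or weighting, so this illustrates the general multitask pattern rather than the authors' model.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def multitask_loss(vh_probs, vh_labels, hh_probs, hh_labels, aux_weight=0.5):
    """Main virus-human loss plus a weighted auxiliary human-human loss.

    aux_weight is an assumed hyperparameter; the auxiliary term plays the
    role of a regularizer on the shared representation.
    """
    main = binary_cross_entropy(vh_probs, vh_labels)
    aux = binary_cross_entropy(hh_probs, hh_labels)
    return main + aux_weight * aux
```

Setting `aux_weight` to zero recovers single-task training, which makes the contribution of the auxiliary objective easy to ablate.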

Amino-Acid Residue Association Models for Large Scale Protein-Protein Interaction Prediction

In Silico Biology ◽

10.3233/isb-2009-0397 ◽

2009 ◽

Vol 9 (4) ◽

pp. 179-194 ◽

Cited By ~ 2

Author(s):

Raghuraj Rao ◽

Kyaw Tun ◽

Yuko Makita ◽

Samavedham Lakshminarayanan ◽

Pawan K. Dhar

Keyword(s):

Amino Acid ◽

Amino Acid Residue ◽

Protein Interaction ◽

Large Scale ◽

Association Models ◽

Interaction Prediction ◽

Protein Protein Interaction ◽

Protein Interaction Prediction

Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/506 ◽

2021 ◽

Author(s):

Guofeng Lv ◽

Zhiqiang Hu ◽

Yanguang Bi ◽

Shaoting Zhang

Keyword(s):

Neural Network ◽

Protein Interaction ◽

Protein Interactions ◽

Poor Performance ◽

Evaluation Framework ◽

Interaction Prediction ◽

Protein Protein Interaction ◽

Protein Interaction Prediction ◽

Real World Datasets ◽

Novel Protein

The study of multi-type Protein-Protein Interaction (PPI) is fundamental for understanding biological processes from a systematic perspective and revealing disease mechanisms. Existing methods suffer from significant performance degradation when tested on unseen datasets. In this paper, we investigate the problem and find that it is mainly attributable to poor performance on inter-novel-protein interaction prediction. However, current evaluations overlook inter-novel-protein interactions and thus fail to give an instructive assessment. We therefore propose to address the problem from the perspectives of both evaluation and methodology. Firstly, we design a new evaluation framework that fully respects inter-novel-protein interactions and gives consistent assessment across datasets. Secondly, we argue that correlations between proteins must provide useful information for the analysis of novel proteins, and based on this, we propose a graph neural network based method (GNN-PPI) for better inter-novel-protein interaction prediction. Experimental results on real-world datasets of different scales demonstrate that GNN-PPI significantly outperforms state-of-the-art PPI prediction methods, especially for inter-novel-protein interaction prediction.
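An evaluation that respects inter-novel-protein interactions needs to know, for each test pair, how many of its proteins were absent from training. The following is a minimal sketch of that bookkeeping, not the paper's evaluation framework; the function name and the group labels `both_seen`, `one_novel`, and `both_novel` are hypothetical.

```python
def group_test_pairs_by_novelty(train_pairs, test_pairs):
    """Group test pairs by how many of their proteins are unseen in training."""
    seen = {p for pair in train_pairs for p in pair}
    labels = ("both_seen", "one_novel", "both_novel")
    groups = {label: [] for label in labels}
    for a, b in test_pairs:
        # Booleans add as integers: 0, 1, or 2 novel proteins in the pair.
        n_novel = (a not in seen) + (b not in seen)
        groups[labels[n_novel]].append((a, b))
    return groups
```

Reporting a metric separately for each group shows whether an aggregate score is dominated by pairs whose proteins were already seen during training.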
