Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks

Author(s):  
Ananthan Nambiar ◽  
Simon Liu ◽  
Mark Hopkins ◽  
Maeve Heflin ◽  
Sergei Maslov ◽  
...  

Abstract: The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method is comparable to existing state-of-the-art approaches for protein family classification, while being much more general than other architectures. Further, our method outperforms all other approaches for protein interaction prediction. These results offer a promising framework for fine-tuning the pre-trained sequence representations for other protein prediction tasks.

2020 ◽  
Vol 36 (16) ◽  
pp. 4406-4414 ◽  
Author(s):  
Lifan Chen ◽  
Xiaoqin Tan ◽  
Dingyan Wang ◽  
Feisheng Zhong ◽  
Xiaohong Liu ◽  
...  

Abstract Motivation: Identifying compound–protein interaction (CPI) is a crucial task in drug discovery and chemogenomics studies. Proteins without a three-dimensional structure account for a large part of potential biological targets, which requires developing methods that predict CPI using only protein sequence information. However, sequence-based CPI models may face some specific pitfalls, including using inappropriate datasets, hidden ligand bias and splitting datasets inappropriately, resulting in overestimation of their prediction performance. Results: To address these issues, we constructed new datasets specific for CPI prediction, proposed a novel transformer neural network named TransformerCPI, and introduced a more rigorous label reversal experiment to test whether a model learns true interaction features. TransformerCPI achieved much improved performance on the new experiments, and it can be deconvolved to highlight important interacting regions of protein sequences and compound atoms, which may provide chemical biology studies with useful guidance for further ligand structural optimization. Availability and implementation: https://github.com/lifanchen-simm/transformerCPI.
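The abstract names a label reversal experiment without defining it. One plausible reading, assumed here and not confirmed by the text above, is that each compound appears with only one label in the training set and only the opposite label in the test set, so a model that memorizes ligands instead of learning interactions fails on the test set. The function name `label_reversal_split` and the `(compound, protein, label)` record layout are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def label_reversal_split(records):
    """Split (compound, protein, label) records so that every compound
    has one label in train and only the opposite label in test.

    Assumption: labels are 0 (no interaction) or 1 (interaction).
    Compounds seen with only one label cannot be reversed and are dropped.
    """
    by_compound = defaultdict(lambda: {0: [], 1: []})
    for compound, protein, label in records:
        by_compound[compound][label].append((compound, protein, label))

    train, test = [], []
    # Keep only compounds that have both positive and negative records.
    eligible = sorted(c for c, groups in by_compound.items()
                      if groups[0] and groups[1])
    for index, compound in enumerate(eligible):
        # Alternate which class goes to train so both classes stay represented.
        train_label = index % 2
        train.extend(by_compound[compound][train_label])
        test.extend(by_compound[compound][1 - train_label])
    return train, test
```

Alternating the training class by compound index is a simplification made to keep the sketch deterministic; a real split would likely randomize the assignment and balance the class sizes.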


2019 ◽  
Vol 35 (14) ◽  
pp. i305-i314 ◽  
Author(s):  
Muhao Chen ◽  
Chelsea J -T Ju ◽  
Guangyu Zhou ◽  
Xuelu Chen ◽  
Tianran Zhang ◽  
...  

Abstract Motivation: Sequence-based protein–protein interaction (PPI) prediction represents a fundamental computational biology problem. To address this problem, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical algorithms are learned to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage on the PPI information. Results: We present an end-to-end framework, PIPR (Protein–Protein Interaction Prediction Based on Siamese Residual RCNN), for PPI predictions using only the protein sequences. PIPR incorporates a deep residual recurrent convolutional neural network in the Siamese architecture, which leverages both robust local features and contextualized information, which are significant for capturing the mutual influence of protein sequences. PIPR relieves the data pre-processing efforts that are required by other systems, and generalizes well to different application scenarios. Experimental evaluations show that PIPR outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows a promising performance on more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short. Availability and implementation: The implementation is available at https://github.com/muhaochen/seq_ppi.git. Supplementary information: Supplementary data are available at Bioinformatics online.
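The Siamese idea in the abstract above can be illustrated with a toy sketch: the same encoder is applied to both sequences, and the two encodings are then combined into one pair representation. This is not the PIPR model. The residual RCNN encoder is replaced here by a trivial k-mer counter, and the element-wise product used to combine the encodings is an assumption; the names `encode`, `combine`, and `siamese_score` are hypothetical.

```python
from collections import Counter


def encode(seq, k=3):
    """Shared encoder: count overlapping k-mers.

    Toy stand-in for a learned sequence encoder; both inputs of the
    pair go through this same function, which is the Siamese property.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def combine(enc_a, enc_b):
    """Element-wise product of two encodings over their shared k-mers."""
    return {kmer: enc_a[kmer] * enc_b[kmer]
            for kmer in enc_a.keys() & enc_b.keys()}


def siamese_score(seq_a, seq_b, k=3):
    """Score a pair by summing the combined representation.

    Because the encoder is shared and the product is symmetric,
    swapping the two sequences does not change the score.
    """
    return sum(combine(encode(seq_a, k), encode(seq_b, k)).values())
```

For example, `siamese_score("ACDEF", "CDEFG")` counts the two shared 3-mers (`CDE` and `DEF`), and the result is the same with the arguments swapped. In a learned model the final sum would be replaced by a trained classifier over the combined representation.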


2018 ◽  
Author(s):  
Muhao Chen ◽  
Chelsea Jui-Ting Ju ◽  
Guangyu Zhou ◽  
Tianran Zhang ◽  
Xuelu Chen ◽  
...  

Sequence-based protein-protein interaction (PPI) prediction represents a fundamental computational biology problem. To address this problem, extensive research efforts have been made to extract predefined features from the sequences. Based on these features, statistical algorithms are learned to classify the PPIs. However, such explicit features are usually costly to extract, and typically have limited coverage on the PPI information. Hence, we present an end-to-end framework, Lasagna, for PPI predictions using only the primary sequences of a protein pair. Lasagna incorporates a deep residual recurrent convolutional neural network in the Siamese learning architecture, which leverages both robust local features and contextualized information that are significant for capturing the mutual influence of protein sequences. Our framework relieves the data pre-processing efforts that are required by other systems, and generalizes well to different application scenarios. Experimental evaluations show that Lasagna outperforms various state-of-the-art systems on the binary PPI prediction problem. Moreover, it shows a promising performance on more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.


2010 ◽  
Vol 20 (1) ◽  
pp. 37-45
Author(s):  
Mohammad Shoyaib ◽  
M. Abdullah-Al-Wadud ◽  
Syed Murtuza Baker ◽  
Mohammad Nurul Islam ◽  
Oksam Chae

An improved computational approach which implements a protein-protein interaction prediction system based on the sequence information of a protein has been presented. A Support Vector Machine (SVM) is trained with this sequence information to predict the interactions. This interaction prediction technique exhibits 79.81% accuracy over a wide range of data, which is a significant improvement over other conventional computational protein-protein interaction prediction methods.

Key words: Protein-protein interaction, Amino acid sequence, Computational approach

DOI: 10.3329/ptcb.v20i1.5963

Plant Tissue Cult. & Biotech. 20(1): 37-45, 2010 (June)
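The abstract above does not say how the sequence information is encoded before it reaches the SVM. As a minimal sketch under one hypothetical encoding (amino-acid composition, chosen here purely for illustration), a protein pair can be turned into a fixed-length numeric vector as follows. The names `composition` and `pair_features` are assumptions, not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in the sequence.

    Non-standard letters are ignored; an empty or fully non-standard
    sequence maps to a vector of zeros.
    """
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Concatenate the two composition vectors into one 40-dimensional
    feature vector for the protein pair (hypothetical encoding)."""
    return composition(seq_a) + composition(seq_b)
```

Vectors of this form, paired with interaction labels, could be passed to any SVM implementation. Note that simple concatenation makes the features order-dependent, so a pair (A, B) and its swap (B, A) yield different vectors.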

