TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning

Author(s):  
Daniel Mulnaes ◽  
Pegah Golchin ◽  
Filip Koenig ◽  
Holger Gohlke
Author(s):  
Xiaoyong Pan ◽  
Jasper Zuallaert ◽  
Xi Wang ◽  
Hong-Bin Shen ◽  
Elda Posada Campos ◽  
...  

Abstract
Motivation: Genetically engineering food crops involves introducing proteins from other species into crop plants or modifying existing proteins with gene-editing techniques. In addition, newly synthesized proteins can be used as therapeutic protein drugs against disease. For both research and safety-regulation purposes, assessing the potential toxicity of newly introduced or synthesized proteins is highly important.
Results: In this study, we present ToxDL, a deep learning-based approach for in silico prediction of protein toxicity from sequence alone. ToxDL consists of (i) a convolutional neural network module designed to handle variable-length input sequences, (ii) a domain2vec module that generates protein domain embeddings and (iii) an output module that classifies proteins as toxic or non-toxic using the outputs of the two preceding modules. Independent test results on animal proteins and cross-species transferability results on bacterial proteins indicate that ToxDL outperforms traditional homology-based approaches and state-of-the-art machine-learning techniques. Furthermore, visualizations based on saliency maps verify that the network learns known toxic motifs. The saliency maps also allow directed in silico modification of a sequence, making it possible to alter its predicted toxicity.
Availability and implementation: ToxDL is freely available at http://www.csbio.sjtu.edu.cn/bioinf/ToxDL/. The source code can be found at https://github.com/xypan1232/ToxDL.
Supplementary information: Supplementary data are available at Bioinformatics online.
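The abstract says ToxDL's convolutional module handles variable-length sequences but does not say how. One common technique, assumed here for illustration only, is to slide filters over a one-hot encoding and then take a global maximum over all positions, which yields one value per filter regardless of sequence length. The filters below are random and untrained, biases are omitted, and the names `one_hot` and `conv_global_max` are hypothetical; this is a sketch of the technique, not ToxDL's code.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a protein sequence as one 20-dim one-hot row per residue."""
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in INDEX:  # non-standard residues (e.g. 'X') stay all-zero
            row[INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv_global_max(seq, filters):
    """Slide each filter (width x 20 weights, no bias) along the one-hot
    sequence, apply ReLU, then take the maximum over all positions.
    The output length equals len(filters), whatever len(seq) is."""
    x = one_hot(seq)
    features = []
    for w in filters:
        width = len(w)
        best = 0.0  # ReLU floor; also the result when seq is shorter than the filter
        for start in range(len(x) - width + 1):
            act = sum(w[t][a] * x[start + t][a]
                      for t in range(width)
                      for a in range(len(AMINO_ACIDS)))
            best = max(best, act)
        features.append(best)
    return features


# Hypothetical, untrained filters: 8 filters of width 5 with random weights.
rng = random.Random(0)
FILTERS = [[[rng.gauss(0.0, 1.0) for _ in AMINO_ACIDS] for _ in range(5)]
           for _ in range(8)]

short_vec = conv_global_max("MKTAYIAKQR", FILTERS)
long_vec = conv_global_max("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", FILTERS)
print(len(short_vec), len(long_vec))  # 8 8
```

Both sequences map to 8-dimensional vectors, so a fixed-size downstream classifier (in ToxDL's case, an output module that also receives domain embeddings) can consume proteins of any length.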


PLoS ONE ◽  
2015 ◽  
Vol 10 (10) ◽  
pp. e0141541 ◽  
Author(s):  
Zhidong Xue ◽  
Richard Jang ◽  
Brandon Govindarajoo ◽  
Yichu Huang ◽  
Yan Wang

2008 ◽  
Vol 9 (S1) ◽  
Author(s):  
Paul D Yoo ◽  
Abdur R Sikder ◽  
Bing Bing Zhou ◽  
Albert Y Zomaya

2008 ◽  
Vol 7 (2) ◽  
pp. 172-181 ◽  
Author(s):  
Paul D. Yoo ◽  
Abdur R. Sikder ◽  
Javid Taheri ◽  
Bing Bing Zhou ◽  
Albert Y. Zomaya

2019 ◽  
Vol 35 (24) ◽  
pp. 5128-5136 ◽  
Author(s):  
Qiang Shi ◽  
Weiya Chen ◽  
Siqi Huang ◽  
Fanglin Jin ◽  
Yinghao Dong ◽  
...  

Abstract
Motivation: Accurate delineation of protein domain boundaries plays an important role in protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundaries, these approaches often ignore long-range interactions among residues, which have been shown to improve prediction performance. However, simultaneously modeling local and global interactions to further improve domain boundary prediction remains a challenging problem.
Results: This article presents a hybrid deep learning method that combines convolutional neural networks (CNNs) and gated recurrent units (GRUs) for domain boundary prediction. It not only captures local and non-local interactions but also fuses these features for prediction. Additionally, we adopt a balanced Random Forest for classification to cope with the severe class imbalance of the samples and the high dimensionality of the deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect DNN-Dom to be useful for assisting protein structure and function prediction.
Availability and implementation: The method is available as the DNN-Dom Server at http://isyslab.info/DNN-Dom/.
Supplementary information: Supplementary data are available at Bioinformatics online.
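The balanced Random Forest mentioned above addresses class imbalance (boundary residues are far rarer than non-boundary residues) by changing how each tree's training set is drawn: every class contributes the same number of bootstrap samples, equal to the size of the minority class. The sketch below shows only that sampling step; the labels and the `balanced_bootstrap` helper are hypothetical, and tree fitting and per-split feature subsampling are omitted. It is not DNN-Dom's implementation.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Draw training indices for ONE tree of a balanced random forest.

    Each class contributes n_min indices drawn with replacement, where n_min
    is the size of the smallest class, so every tree sees a 1:1 class ratio.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choice(idx) for _ in range(n_min))
    return sample


# Hypothetical per-residue labels: 1 = domain boundary (rare), 0 = non-boundary.
labels = [1] * 20 + [0] * 980
rng = random.Random(42)
tree_samples = [balanced_bootstrap(labels, rng) for _ in range(100)]
print(Counter(labels[i] for i in tree_samples[0]))  # 20 samples of each class
```

Each tree still sees different majority-class residues, so the ensemble as a whole uses much of the majority class while no single tree is dominated by it. In Python, the imbalanced-learn package provides a ready-made `BalancedRandomForestClassifier` built on this idea.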


2020 ◽  
Vol 36 (12) ◽  
pp. 3749-3757 ◽  
Author(s):  
Wei Zheng ◽  
Xiaogen Zhou ◽  
Qiqige Wuyun ◽  
Robin Pearce ◽  
Yang Li ◽  
...  

Abstract
Motivation: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments separated along the sequence.
Results: We developed a new algorithm, FUpred, which predicts protein domain boundaries using contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to locate domain boundaries by maximizing the number of intra-domain contacts while minimizing the number of inter-domain contacts in the contact maps. FUpred was tested on a large-scale dataset of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than that of the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than those of the best control method. The results demonstrate a new avenue for accurately detecting domain composition from sequence alone, especially for discontinuous, multi-domain proteins.
Availability and implementation: https://zhanglab.ccmb.med.umich.edu/FUpred.
Supplementary information: Supplementary data are available at Bioinformatics online.
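The core idea above (place boundaries so that contacts fall inside domains rather than between them) can be illustrated with a deliberately simplified sketch. Assumptions: a binary, symmetric contact map; a single cut producing two continuous domains; a hypothetical score of intra-domain minus inter-domain contacts; and a minimum domain length `min_len`. This is not FUpred's published scoring function, and it does not handle the discontinuous domains that FUpred targets.

```python
def best_two_domain_split(cmap, min_len):
    """Return (k, intra, inter) for the cut k that splits residues into
    [0, k) and [k, n), maximizing intra-domain minus inter-domain contacts,
    with each domain at least min_len residues long."""
    n = len(cmap)
    # List each contact once (i < j) from the symmetric 0/1 contact map.
    contacts = [(i, j) for i in range(n) for j in range(i + 1, n) if cmap[i][j]]
    best = None
    for k in range(min_len, n - min_len + 1):
        inter = sum(1 for i, j in contacts if i < k <= j)  # contact crosses the cut
        intra = len(contacts) - inter
        score = intra - inter
        if best is None or score > best[0]:
            best = (score, k, intra, inter)
    if best is None:
        raise ValueError("sequence too short for two domains of min_len")
    return best[1], best[2], best[3]


def toy_contact_map(n, domains, extra_contacts):
    """Build a symmetric 0/1 contact map where every residue pair inside
    each [start, end) domain is in contact, plus a few extra contacts."""
    cmap = [[0] * n for _ in range(n)]
    for start, end in domains:
        for i in range(start, end):
            for j in range(start, end):
                if i != j:
                    cmap[i][j] = 1
    for i, j in extra_contacts:
        cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy 10-residue protein: two 5-residue domains joined by one contact (4, 5).
cmap = toy_contact_map(10, [(0, 5), (5, 10)], [(4, 5)])
print(best_two_domain_split(cmap, min_len=3))  # (5, 20, 1)
```

Because the total number of contacts is fixed, maximizing intra minus inter is equivalent to minimizing inter-domain contacts; the `min_len` constraint is what prevents the trivial cut at either end of the chain. Real predicted contact maps are probabilities and would need thresholding or weighting first.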


2013 ◽  
Vol 7 (1) ◽  
pp. 104-109 ◽
Author(s):  
Jiaxin Wang ◽  
Jiafeng Wang ◽  
Wei Du ◽  
Chong Yu ◽  
Yanchun Liang

Author(s):  
Piyali Chatterjee ◽  
Subhadip Basu ◽  
Julian Zubek ◽  
Mahantapas Kundu ◽  
Mita Nasipuri ◽  
...  

2009 ◽  
Vol 3 ◽  
pp. 1-8 ◽  
Author(s):  
Svetlana Kirillova ◽  
Suresh Kumar ◽  
Oliviero Carugo
