DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network

2019 ◽  
Vol 35 (24) ◽  
pp. 5128-5136 ◽  
Author(s):  
Qiang Shi ◽  
Weiya Chen ◽  
Siqi Huang ◽  
Fanglin Jin ◽  
Yinghao Dong ◽  
...  

Abstract Motivation Accurate delineation of protein domain boundaries plays an important role in protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundaries, these approaches often ignore long-range interactions among residues, which have been shown to improve prediction performance. However, how to model local and global interactions simultaneously to further improve domain boundary prediction remains a challenging problem. Results This article employs a hybrid deep learning method that combines a convolutional neural network with gated recurrent unit models for domain boundary prediction. It not only captures the local and non-local interactions, but also fuses these features for prediction. Additionally, we adopt a balanced Random Forest for classification to deal with the high imbalance of samples and the high dimensionality of the deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect that DNN-Dom can be useful for assisting protein structure and function prediction. Availability and implementation The method is available as DNN-Dom Server at http://isyslab.info/DNN-Dom/. Supplementary information Supplementary data are available at Bioinformatics online.
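
The abstract mentions a balanced Random Forest as the way to handle the strong imbalance between boundary and non-boundary residues. A minimal sketch of the sampling idea commonly associated with that technique follows: each tree is trained on a bootstrap sample containing equally many examples from both classes. The function name `balanced_bootstrap` and the 0/1 label encoding are assumptions made for illustration; this is not the DNN-Dom implementation.

```python
import random
from typing import List, Sequence


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Return sample indices with equal counts of label 1 and label 0.

    Hypothetical helper: labels are assumed to be 1 (boundary) or 0 (non-boundary).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Identify which class is rarer; its size fixes the per-class sample count.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    k = len(minority)
    # Draw k indices with replacement from each class.
    sample = [rng.choice(minority) for _ in range(k)]
    sample += [rng.choice(majority) for _ in range(k)]
    return sample


# Toy data: 3 boundary residues among 100.
toy_labels = [1] * 3 + [0] * 97
toy_sample = balanced_bootstrap(toy_labels, random.Random(0))
```

Each tree in the forest would receive its own sample of this kind, so no single tree is dominated by the majority class.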

2018 ◽  
Vol 35 (14) ◽  
pp. 2411-2417 ◽  
Author(s):  
Seung Hwan Hong ◽  
Keehyoung Joo ◽  
Jooyoung Lee

Abstract Motivation Domain boundary prediction is one of the most important problems in the study of protein structure and function. Many sequence-based domain boundary prediction methods are either template-based or machine learning (ML) based. ML-based methods often perform poorly due to their use of only local (i.e. short-range) features. These conventional features such as sequence profiles, secondary structures and solvent accessibilities are typically restricted to be within 20 residues of the domain boundary candidate. Results To address the performance of ML-based methods, we developed a new protein domain boundary prediction method (ConDo) that utilizes novel long-range features such as coevolutionary information in addition to the aforementioned local window features as inputs for ML. Toward this purpose, two types of coevolutionary information were extracted from multiple sequence alignment using direct coupling analysis: (i) partially aligned sequences, and (ii) correlated mutation information. Both the partially aligned sequence information and the modularity of residue–residue couplings possess long-range correlation information. Availability and implementation https://github.com/gicsaw/ConDo.git Supplementary information Supplementary data are available at Bioinformatics online.
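
The local window features described above can be illustrated with a short sketch: per-residue feature vectors within a fixed distance of a candidate position are concatenated, and positions that fall off either end of the sequence are zero-padded. The function name `window_features`, the zero-padding choice and the data layout are assumptions for illustration, not ConDo's actual code; only the 20-residue limit comes from the abstract.

```python
from typing import List, Sequence


def window_features(
    profile: Sequence[Sequence[float]], center: int, half_width: int = 20
) -> List[float]:
    """Concatenate per-residue features within half_width residues of center.

    profile[i] is the feature vector of residue i (hypothetical layout).
    Positions outside the sequence are zero-padded (an assumed convention).
    """
    dim = len(profile[0])
    out: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(profile):
            out.extend(profile[pos])
        else:
            out.extend([0.0] * dim)
    return out


# Toy profile with one feature per residue.
toy_profile = [[1.0], [2.0], [3.0]]
toy_window = window_features(toy_profile, center=0, half_width=1)
```

A window like this only sees residues near the candidate, which is the limitation the long-range coevolutionary features are meant to address.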


2020 ◽  
Vol 36 (12) ◽  
pp. 3749-3757 ◽  
Author(s):  
Wei Zheng ◽  
Xiaogen Zhou ◽  
Qiqige Wuyun ◽  
Robin Pearce ◽  
Yang Li ◽  
...  

Abstract Motivation Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. Results We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins. Availability and implementation https://zhanglab.ccmb.med.umich.edu/FUpred. Supplementary information Supplementary data are available at Bioinformatics online.
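
The core idea stated above, choosing a boundary that keeps contacts inside domains and few contacts between them, can be sketched for the simplest case of one cut in a binary contact map. The score used here (inter-domain contacts divided by intra-domain contacts plus one) is a hypothetical stand-in chosen for illustration; it is not the scoring function used by FUpred, and the names `best_single_cut` and `toy_contact_map` are likewise assumptions.

```python
from typing import List, Tuple


def best_single_cut(contacts: List[List[int]], min_len: int = 2) -> Tuple[int, float]:
    """Return (cut, score); residues [0, cut) and [cut, n) form the two domains.

    Lower scores mean fewer inter-domain contacts relative to intra-domain ones.
    """
    n = len(contacts)
    best_cut, best_score = -1, float("inf")
    for cut in range(min_len, n - min_len + 1):
        inter = sum(contacts[i][j] for i in range(cut) for j in range(cut, n))
        intra1 = sum(contacts[i][j] for i in range(cut) for j in range(i + 1, cut))
        intra2 = sum(contacts[i][j] for i in range(cut, n) for j in range(i + 1, n))
        # Hypothetical score, not the FUpred formula.
        score = inter / (intra1 + intra2 + 1)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


def toy_contact_map() -> List[List[int]]:
    """Two fully connected blocks of 4 residues joined by one contact (3, 4)."""
    n = 8
    m = [[0] * n for _ in range(n)]
    for block in (range(0, 4), range(4, 8)):
        for i in block:
            for j in block:
                if i != j:
                    m[i][j] = 1
    m[3][4] = m[4][3] = 1
    return m


cut, score = best_single_cut(toy_contact_map())
```

On the toy map the cut falls between the two blocks. Handling discontinuous domains, which the abstract highlights, would require searching over more than one cut point and is outside this sketch.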


2019 ◽  
Vol 36 (1) ◽  
pp. 104-111
Author(s):  
Shuichiro Makigaki ◽  
Takashi Ishida

Abstract Motivation Template-based modeling, the process of predicting the tertiary structure of a protein by using homologous protein structures, is useful if good templates can be found. Although modern homology detection methods can find remote homologs with high sensitivity, the accuracy of template-based models generated from homology-detection-based alignments is often lower than that from ideal alignments. Results In this study, we propose a new method that generates pairwise sequence alignments for more accurate template-based modeling. The proposed method trains a machine learning model using the structural alignment of known homologs. It is difficult to directly predict sequence alignments using machine learning. Thus, when calculating sequence alignments, instead of a fixed substitution matrix, this method dynamically predicts a substitution score from the trained model. We evaluate our method by carefully splitting the training and test datasets and comparing the predicted structure’s accuracy with that of state-of-the-art methods. Our method generates more accurate tertiary structure models than those produced from alignments obtained by other methods. Availability and implementation https://github.com/shuichiro-makigaki/exmachina. Supplementary information Supplementary data are available at Bioinformatics online.
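
The idea of replacing a fixed substitution matrix with a score predicted per residue pair can be sketched with a standard global alignment (Needleman-Wunsch with a linear gap penalty) that takes the scoring function as a parameter. The names `global_align` and `identity_scorer`, the linear gap penalty, and the match/mismatch stand-in scorer are assumptions for illustration; in the described method the score would come from the trained model, which is not reproduced here.

```python
from typing import Callable, Tuple


def global_align(
    a: str, b: str, score_fn: Callable[[int, int], float], gap: float = -1.0
) -> Tuple[float, str, str]:
    """Global alignment where score_fn(i, j) scores pairing a[i] with b[j].

    score_fn must be deterministic, since the traceback re-evaluates it.
    """
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score_fn(i - 1, j - 1),
                dp[i - 1][j] + gap,
                dp[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score_fn(i - 1, j - 1):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_scorer(a: str, b: str) -> Callable[[int, int], float]:
    """Stand-in for a learned model: +1 for identical residues, -1 otherwise."""
    return lambda i, j: 1.0 if a[i] == b[j] else -1.0


total, row_a, row_b = global_align("ACGT", "AGT", identity_scorer("ACGT", "AGT"))
```

Because `score_fn` receives positions rather than residue letters, a model-backed scorer could take the sequence context around each position into account, which a fixed substitution matrix cannot.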


Author(s):  
Scott M. Woodley ◽  
Graeme M. Day ◽  
R. Catlow

We review the current techniques used in the prediction of crystal structures and their surfaces and of the structures of nanoparticles. The main classes of search algorithm and energy function are summarized, and we discuss the growing role of methods based on machine learning. We illustrate the current status of the field with examples taken from metallic, inorganic and organic systems. This article is part of a discussion meeting issue ‘Dynamic in situ microscopy relating structure and function’.


2020 ◽  
Vol 36 (11) ◽  
pp. 3385-3392
Author(s):  
Zi-Lin Liu ◽  
Jing-Hao Hu ◽  
Fan Jiang ◽  
Yun-Dong Wu

Abstract Motivation High-throughput sequencing discovers many naturally occurring disulfide-rich peptides or cystine-rich peptides (CRPs) with diversified bioactivities. However, their structure information, which is very important to peptide drug discovery, is still very limited. Results We have developed a CRP-specific structure prediction method called Cystine-Rich peptide Structure Prediction (CRiSP), based on a customized template database with cystine-specific sequence alignment and three machine-learning predictors. The modeling accuracy is significantly better than several popular general-purpose structure modeling methods, and our CRiSP can provide useful model quality estimations. Availability and implementation The CRiSP server is freely available on the website at http://wulab.com.cn/CRISP. Supplementary information Supplementary data are available at Bioinformatics online.


2008 ◽  
Vol 9 (S1) ◽  
Author(s):  
Paul D Yoo ◽  
Abdur R Sikder ◽  
Bing Bing Zhou ◽  
Albert Y Zomaya

2008 ◽  
Vol 7 (2) ◽  
pp. 172-181 ◽  
Author(s):  
Paul D. Yoo ◽  
Abdur R. Sikder ◽  
Javid Taheri ◽  
Bing Bing Zhou ◽  
Albert Y. Zomaya
