ConDo: protein domain boundary prediction using coevolutionary information

Seung Hwan Hong; Keehyoung Joo; Jooyoung Lee

doi:10.1093/bioinformatics/bty973


Bioinformatics ◽

10.1093/bioinformatics/bty973 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2411-2417 ◽

Cited By ~ 2

Author(s):

Seung Hwan Hong ◽

Keehyoung Joo ◽

Jooyoung Lee

Keyword(s):

Long Range ◽

Domain Boundary ◽

Prediction Method ◽

Protein Domain ◽

Supplementary Information ◽

Sequence Information ◽

Multiple Sequence ◽

Domain Boundary Prediction ◽

Correlation Information ◽

And Function

Abstract Motivation Domain boundary prediction is one of the most important problems in the study of protein structure and function. Many sequence-based domain boundary prediction methods are either template-based or machine learning (ML) based. ML-based methods often perform poorly due to their use of only local (i.e. short-range) features. These conventional features such as sequence profiles, secondary structures and solvent accessibilities are typically restricted to be within 20 residues of the domain boundary candidate. Results To address the performance of ML-based methods, we developed a new protein domain boundary prediction method (ConDo) that utilizes novel long-range features such as coevolutionary information in addition to the aforementioned local window features as inputs for ML. Toward this purpose, two types of coevolutionary information were extracted from multiple sequence alignment using direct coupling analysis: (i) partially aligned sequences, and (ii) correlated mutation information. Both the partially aligned sequence information and the modularity of residue–residue couplings possess long-range correlation information. Availability and implementation https://github.com/gicsaw/ConDo.git Supplementary information Supplementary data are available at Bioinformatics online.
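To make the notion of correlated mutation information concrete, the following is a minimal sketch that scores covariation between two alignment columns with plain mutual information. This is a deliberately simpler stand-in, not the direct coupling analysis that ConDo uses, and the function name `mutual_information` and the toy alignment are assumptions made for illustration only.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences. This is a simple
    covariation score, NOT direct coupling analysis.
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        # Ratio of the joint frequency to the product of the marginals.
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 1 always change together,
# column 2 varies independently of column 0, column 3 is fully conserved.
toy_msa = ["ACDA", "ACEA", "GTDA", "GTEA"]

print(mutual_information(toy_msa, 0, 1))  # covarying pair, equals log(2)
print(mutual_information(toy_msa, 0, 2))  # independent pair, equals 0
```

A high score for a pair of columns that are far apart in sequence is the kind of long-range signal that a 20-residue local window cannot capture.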


DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network

Bioinformatics ◽

10.1093/bioinformatics/btz464 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5128-5136 ◽

Cited By ~ 3

Author(s):

Qiang Shi ◽

Weiya Chen ◽

Siqi Huang ◽

Fanglin Jin ◽

Yinghao Dong ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Structure Prediction ◽

Domain Boundary ◽

Protein Domain ◽

Supplementary Information ◽

High Dimensions ◽

Long Range Interactions ◽

Domain Boundary Prediction ◽

And Function

Abstract Motivation Accurate delineation of protein domain boundaries plays an important role in protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundaries, these approaches often ignore long-range interactions among residues, which have been shown to improve prediction performance. However, how to simultaneously model local and global interactions to further improve domain boundary prediction is still a challenging problem. Results This article employs a hybrid deep learning method that combines convolutional neural network and gated recurrent unit models for domain boundary prediction. It not only captures the local and non-local interactions, but also fuses these features for prediction. Additionally, we adopt balanced Random Forest for classification to deal with the high imbalance of samples and the high dimensionality of deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect that DNN-Dom can be useful for assisting protein structure and function prediction. Availability and implementation The method is available as DNN-Dom Server at http://isyslab.info/DNN-Dom/. Supplementary information Supplementary data are available at Bioinformatics online.
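The sample imbalance mentioned in this abstract arises because boundary residues are far rarer than non-boundary residues. A balanced random forest handles this by giving each tree a bootstrap sample with equal numbers from both classes. The sketch below shows only that sampling step; the function name `balanced_bootstrap` and the toy labels are hypothetical and are not taken from the DNN-Dom implementation.

```python
import random


def balanced_bootstrap(labels, rng):
    """Return sample indices for one tree of a balanced random forest.

    Draws, with replacement, as many examples from each class as there are
    members of the smaller class. Labels are 1 (boundary) or 0 (non-boundary).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos), len(neg))
    sample = [rng.choice(pos) for _ in range(k)]
    sample += [rng.choice(neg) for _ in range(k)]
    rng.shuffle(sample)
    return sample


# Hypothetical toy data: 3 boundary residues among 100 positions.
toy_labels = [1] * 3 + [0] * 97
indices = balanced_bootstrap(toy_labels, random.Random(0))
print(len(indices), sum(toy_labels[i] for i in indices))  # 6 3
```

Each tree then sees a 50/50 class mix, so the majority class cannot dominate the split criteria.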


FUpred: detecting protein domains through deep-learning-based contact map prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa217 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3749-3757 ◽

Cited By ~ 1

Author(s):

Wei Zheng ◽

Xiaogen Zhou ◽

Qiqige Wuyun ◽

Robin Pearce ◽

Yang Li ◽

...

Keyword(s):

Large Scale ◽

Control Method ◽

Domain Boundary ◽

Protein Domains ◽

Protein Domain ◽

Supplementary Information ◽

Contact Maps ◽

Core Idea ◽

Matthew’s Correlation Coefficient ◽

And Function

Abstract Motivation Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. Results We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthew’s correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins. Availability and implementation https://zhanglab.ccmb.med.umich.edu/FUpred. Supplementary information Supplementary data are available at Bioinformatics online.
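The core idea described in this abstract, keeping contacts inside domains while minimizing contacts across them, can be illustrated with a small sketch for the simplest case of one continuous cut. The score used here (the fraction of cross-cut residue pairs that are in contact) is an assumption chosen for illustration and is not FUpred's actual scoring function. The names `inter_domain_density` and `best_cut` are likewise hypothetical.

```python
def inter_domain_density(contacts, length, cut):
    """Fraction of possible cross-cut residue pairs that are in contact.

    `contacts` is a set of (i, j) pairs with i < j. Residues [0, cut) form
    one putative domain and residues [cut, length) the other.
    """
    inter = sum(1 for i, j in contacts if i < cut <= j)
    return inter / (cut * (length - cut))


def best_cut(contacts, length):
    """Return the cut position with the lowest inter-domain contact density."""
    return min(
        range(1, length),
        key=lambda c: inter_domain_density(contacts, length, c),
    )


# Hypothetical toy contact map: two compact 3-residue blocks joined by a
# single contact between residues 2 and 3.
toy_contacts = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
print(best_cut(toy_contacts, 6))  # 3
```

Discontinuous domains, which the abstract highlights as the hard case, would need more than one cut and are outside the scope of this sketch.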


Improved general regression network for protein domain boundary prediction

BMC Bioinformatics ◽

10.1186/1471-2105-9-s1-s12 ◽

2008 ◽

Vol 9 (S1) ◽

Cited By ~ 10

Author(s):

Paul D Yoo ◽

Abdur R Sikder ◽

Bing Bing Zhou ◽

Albert Y Zomaya

Keyword(s):

Domain Boundary ◽

Protein Domain ◽

Domain Boundary Prediction ◽

General Regression


DomNet: Protein Domain Boundary Prediction Using Enhanced General Regression Network and New Profiles

IEEE Transactions on NanoBioscience ◽

10.1109/tnb.2008.2000747 ◽

2008 ◽

Vol 7 (2) ◽

pp. 172-181 ◽

Cited By ~ 23

Author(s):

Paul D. Yoo ◽

Abdur R. Sikder ◽

Javid Taheri ◽

Bing Bing Zhou ◽

Albert Y. Zomaya

Keyword(s):

Domain Boundary ◽

Protein Domain ◽

Domain Boundary Prediction ◽

General Regression


Protein Domain Boundary Prediction Using Multiple Protein Properties

Journal of Bionanoscience ◽

10.1166/jbns.2013.1095 ◽

2013 ◽

Vol 7 (1) ◽

pp. 104-109

Author(s):

Jiaxin Wang ◽

Jiafeng Wang ◽

Wei Du ◽

Chong Yu ◽

Yanchun Liang

Keyword(s):

Domain Boundary ◽

Protein Domain ◽

Multiple Protein ◽

Protein Properties ◽

Domain Boundary Prediction


PDP-RF: Protein Domain Boundary Prediction Using Random Forest Classifier

Lecture Notes in Computer Science - Pattern Recognition and Machine Intelligence ◽

10.1007/978-3-319-19941-2_42 ◽

2015 ◽

pp. 441-450 ◽

Cited By ~ 2

Author(s):

Piyali Chatterjee ◽

Subhadip Basu ◽

Julian Zubek ◽

Mahantapas Kundu ◽

Mita Nasipuri ◽

...

Keyword(s):

Random Forest ◽

Domain Boundary ◽

Random Forest Classifier ◽

Protein Domain ◽

Domain Boundary Prediction


Phylogenetic Assignment of the Fungicolous Hypoxylon invadens (Ascomycota, Xylariales) and Investigation of its Secondary Metabolites

Microorganisms ◽

10.3390/microorganisms8091397 ◽

2020 ◽

Vol 8 (9) ◽

pp. 1397 ◽

Cited By ~ 1

Author(s):

Kevin Becker ◽

Christopher Lambert ◽

Jörg Wieschhaus ◽

Marc Stadler

Keyword(s):

Secondary Metabolites ◽

RNA Polymerase II ◽

High Resolution Mass Spectrometry ◽

Large Subunit ◽

Submerged Cultivation ◽

Preparative Chromatography ◽

Sequence Information ◽

Internal Transcribed Spacer Region ◽

Multiple Sequence ◽

And Function

The ascomycete Hypoxylon invadens was described in 2014 as a fungicolous species growing on a member of its own genus, H. fragiforme, which is considered a rare lifestyle in the Hypoxylaceae. This renders H. invadens an interesting target in our efforts to find new bioactive secondary metabolites from members of the Xylariales. So far, only volatile organic compounds have been reported from H. invadens, but no investigation of non-volatile compounds had been conducted. Furthermore, a phylogenetic assignment following recent trends in fungal taxonomy via a multiple sequence alignment seemed practical. A culture of H. invadens was thus subjected to submerged cultivation to investigate the produced secondary metabolites, followed by isolation via preparative chromatography and subsequent structure elucidation by means of nuclear magnetic resonance (NMR) spectroscopy and high-resolution mass spectrometry (HR-MS). This approach led to the identification of the known flaviolin (1) and 3,3-biflaviolin (2) as the main components, which had never been reported from the order Xylariales before. Assessment of their antimicrobial and cytotoxic effects via a panel of commonly used microorganisms and cell lines in our laboratory did not yield any effects of relevance. Concurrently, genomic DNA from the fungus was used to construct a multigene phylogeny using ribosomal sequence information from the internal transcribed spacer region (ITS), the 28S large subunit of ribosomal DNA (LSU), and proteinogenic nucleotide sequences from the second largest subunit of the DNA-directed RNA polymerase II (RPB2) and β-tubulin (TUB2) genes. A placement in a newly formed clade with H. trugodes was strongly supported in a maximum-likelihood (ML) phylogeny using sequences derived from well characterized strains, but the exact position of said clade remains unclear. Both the chemical and the phylogenetic results suggest further inquiries into the lifestyle of this unique fungus to get a better understanding of both its ecological role and the function of its secondary metabolites, hitherto unique to the Xylariales.


Sequence-based protein domain boundary prediction using BP neural network with various property profiles

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.21745 ◽

2008 ◽

Vol 71 (1) ◽

pp. 300-307 ◽

Cited By ~ 7

Author(s):

Lei Ye ◽

Ting Liu ◽

Zhaohui Wu ◽

Ruhong Zhou

Keyword(s):

Neural Network ◽

BP Neural Network ◽

Domain Boundary ◽

Protein Domain ◽

Domain Boundary Prediction


Delineation of modular proteins: Domain boundary prediction from sequence information

Briefings in Bioinformatics ◽

10.1093/bib/5.2.179 ◽

2004 ◽

Vol 5 (2) ◽

pp. 179-192 ◽

Cited By ~ 18

Author(s):

L. Kong

Keyword(s):

Domain Boundary ◽

Sequence Information ◽

Modular Proteins ◽

Domain Boundary Prediction


DomBpred: protein domain boundary predictor using inter-residue distance and domain-residue level clustering

10.1101/2021.11.19.469204 ◽

2021 ◽

Author(s):

Zhongze Yu ◽

Chunxiang Peng ◽

Jun Liu ◽

Biao Zhang ◽

Xiaogen Zhou ◽

...

Keyword(s):

Clustering Algorithm ◽

State Of The Art ◽

Domain Boundary ◽

Residue Level ◽

Single Domain ◽

Protein Domain ◽

Test Set ◽

Cut Points ◽

Domain Boundary Prediction ◽

Domain Protein

Domain boundary prediction is one of the most important problems in the study of protein structure and function, especially for large proteins. At present, most domain boundary prediction methods have low accuracy and limitations in dealing with multi-domain proteins. In this study, we develop a sequence-based protein domain boundary predictor, named DomBpred. In DomBpred, the input sequence is first classified as either a single-domain protein or a multi-domain protein through a designed effective sequence metric based on a constructed single-domain sequence library. For a multi-domain protein, a domain-residue level clustering algorithm inspired by the Ising model is proposed to cluster the spatially close residues according to inter-residue distance. The unclassified residues and the residues at the edge of each cluster are then tuned by the secondary structure to form potential cut points. Finally, a domain boundary scoring function is proposed to recursively evaluate the potential cut points to generate the domain boundary. DomBpred is tested on the large-scale test set of FUpred comprising 2549 proteins. Experimental results show that DomBpred performs better than the state-of-the-art methods in classifying whether protein sequences are composed of single or multiple domains, and the Matthew's correlation coefficient is 0.882. Moreover, on 849 multi-domain proteins, the domain boundary distance and normalised domain overlap scores of DomBpred are 0.523 and 0.824, respectively, which are 5.0% and 4.2% higher than those of the best comparison method. Comparison with other methods on the given test set shows that DomBpred outperforms most state-of-the-art sequence-based methods and even achieves better results than the top-level template-based method.
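The Matthews correlation coefficient reported for the single-domain versus multi-domain classification is computed from the four confusion-matrix counts using the standard formula. The sketch below assumes that multi-domain proteins are treated as the positive class, which the abstract does not state, and the example counts are invented for illustration rather than taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention for
    the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, with "multi-domain" assumed to be the positive class.
print(round(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10), 3))
```

The score ranges from -1 to 1, with 1 for perfect agreement and 0 for a classifier no better than chance, which makes it more informative than accuracy when single-domain and multi-domain proteins are unevenly represented.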
