Searching combinatorial optimality using graph-based homology information

Summary Traditional approaches to homology detection rely on finding sufficient similarity between protein sequences. Studies have shown that non-sequence sources of biological information, such as secondary or tertiary molecular structure, can reveal certain types of biological knowledge when sequence-based approaches fail. Motivated by these findings, we hypothesize that protein-protein interaction (PPI) network topology and protein sequence might give insight into different slices of biological information. Since proteins aggregate to perform a function rather than act in isolation, analyzing the complex wiring around a protein in a PPI network could give deeper insight into the protein’s role in the inner workings of the cell than analyzing the sequences of individual genes. Hence, we believe that much information could be lost by focusing on sequence information alone. We examine whether, and to what extent, the information about homologous proteins captured by PPI network topology differs from the information captured by their sequences. We measure how similar the topology around homologous proteins in a PPI network is and show that such proteins have statistically significantly higher network similarity than nonhomologous proteins. We compare these network similarity trends of homologous proteins with the trends in their sequence identity and find that network similarities uncover almost as much homology as sequence identities do. Although neither method, network topology nor sequence identity, seems to capture homology information in its entirety, we demonstrate that the two might give insight into somewhat different types of biological information, as the overlap of the homology information they uncover is relatively low.
Therefore, we conclude that similarities of proteins’ topological neighborhoods in a PPI network could be used as a complementary method to sequence-based approaches for identifying homologs, as well as for analyzing evolutionary distance and functional divergence of homologous proteins.
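
The comparison of network similarity with sequence identity can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the network similarity measure, so the sketch swaps in a simple Jaccard index over interaction partners, and the PPI network, sequence identities, and both detection thresholds (0.5 and 0.40) are hypothetical. A homologous pair counts as "uncovered" by a method when its score reaches that method's threshold, and the overlap between the two methods is the Jaccard index of their sets of uncovered pairs.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical toy PPI network (undirected, stored as a symmetric adjacency dict).
PPI: Dict[str, Set[str]] = {
    "P1": {"P3", "P4", "P5"},
    "P2": {"P3", "P4", "P5"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2", "P6"},
    "P5": {"P1", "P2"},
    "P6": {"P4"},
}

# Hypothetical known homologous pairs with made-up pairwise sequence identities.
HOMOLOG_IDENTITY: Dict[Pair, float] = {
    ("P1", "P2"): 0.22,
    ("P3", "P6"): 0.55,
    ("P4", "P5"): 0.48,
}


def neighborhood_similarity(graph: Dict[str, Set[str]], a: str, b: str) -> float:
    """Jaccard index of the interaction partners of a and b.

    Stand-in measure only; the pair itself is excluded from both neighbor sets.
    """
    na = graph.get(a, set()) - {b}
    nb = graph.get(b, set()) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def uncovered(pairs, score: Callable[[Pair], float], threshold: float) -> Set[Pair]:
    """Homologous pairs whose score reaches the (hypothetical) threshold."""
    return {p for p in pairs if score(p) >= threshold}


def overlap(x: Set[Pair], y: Set[Pair]) -> float:
    """Jaccard overlap between the pairs uncovered by two methods."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


net_hits = uncovered(HOMOLOG_IDENTITY, lambda p: neighborhood_similarity(PPI, *p), 0.5)
seq_hits = uncovered(HOMOLOG_IDENTITY, lambda p: HOMOLOG_IDENTITY[p], 0.40)
shared = overlap(net_hits, seq_hits)

print(sorted(net_hits), sorted(seq_hits), round(shared, 3))
```

With these made-up numbers each method uncovers two of the three homologous pairs, but they agree only on ("P4", "P5"), so the overlap is 1/3.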

SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models

BMC Bioinformatics · 10.1186/1471-2105-15-229 · 2014 · Vol 15 (1) · Cited By ~ 23

Author(s): Ian Reid, Nicholas O’Toole, Omar Zabaneh, Reza Nourzadeh, Mahmoud Dahdouli, ...

Keyword(s): Ab Initio, Accurate Prediction, RNA-Seq, Homology Information, Fungal Genes, Ab Initio Models

FunGeCo: a web-based tool for estimation of functional potential of bacterial genomes and microbiomes using gene context information

Bioinformatics · 10.1093/bioinformatics/btz957 · 2019 · Vol 36 (8) · pp. 2575-2577

Author(s): Swadha Anand, Bhusan K Kuntal, Anwesha Mohapatra, Vineet Bhatt, Sharmila S Mande

Keyword(s): Supplementary Information, Valuable Resource, Bacterial Genomes, Web Based, Microbial Genomes, Functional Potential, Genomic Location, Homology Information, Functional Inference, Analysis Platform

Abstract Motivation The functional potential of genomes and metagenomes inferred using homology-based methods is often subject to certain limitations, especially for proteins whose homologs function in multiple pathways. Augmenting the homology information with the genomic location of the constituent genes can significantly improve the accuracy of the estimated functions. This can help distinguish the cognate homolog belonging to a candidate pathway from its other homologs that function in different pathways. Results In this article, we present a web-based analysis platform, ‘FunGeCo’, to enable gene-context-based functional inference for microbial genomes and metagenomes. It is expected to be a valuable resource that complements existing tools for understanding the functional potential of microbes residing in an environment. Availability and implementation https://web.rniapps.net/fungeco [Freely available for academic use]. Supplementary information Supplementary data are available at Bioinformatics online.
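
The idea of using genomic location to pick out the cognate homolog can be sketched as a neighborhood check along the genome. This is a hypothetical illustration, not FunGeCo's algorithm: it assumes each gene already carries a homology-based family label and a position in gene order on its contig, and it calls a homolog the cognate pathway member only if genes from at least `min_partners` other families of the same candidate pathway lie within `window` genes of it on the same contig. The `Gene` type, the `cognate_members` function, the family labels, and both parameter values are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Gene:
    gene_id: str
    contig: str
    index: int   # position of the gene in gene order along its contig
    family: str  # homology-based annotation (hypothetical family label)


def cognate_members(
    genes: Iterable[Gene], pathway_families: Set[str], window: int = 5, min_partners: int = 2
) -> Set[str]:
    """Ids of pathway-family genes that have genes from at least `min_partners`
    *other* families of the same pathway within `window` positions on their contig."""
    genes = list(genes)
    by_contig: Dict[str, List[Gene]] = {}
    for g in genes:
        by_contig.setdefault(g.contig, []).append(g)

    hits: Set[str] = set()
    for g in genes:
        if g.family not in pathway_families:
            continue
        partner_families = {
            n.family
            for n in by_contig[g.contig]
            if n is not g
            and abs(n.index - g.index) <= window
            and n.family in pathway_families
            and n.family != g.family
        }
        if len(partner_families) >= min_partners:
            hits.add(g.gene_id)
    return hits


# Hypothetical genome: a clustered A-B-C operon-like region plus a lone A homolog.
GENES = [
    Gene("g1", "contig1", 10, "A"),
    Gene("g2", "contig1", 11, "B"),
    Gene("g3", "contig1", 13, "C"),
    Gene("g4", "contig1", 14, "X"),
    Gene("g5", "contig2", 100, "A"),
    Gene("g6", "contig2", 101, "Y"),
]

print(sorted(cognate_members(GENES, {"A", "B", "C"})))
```

Here both g1 and g5 are homologs of family A, but only g1 sits next to the other pathway members, so only g1 is kept as the cognate member; g5 would be treated as a homolog acting in a different pathway.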

GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank

10.1101/145763 · 2017

Author(s): Ronghui You, Zihan Zhang, Yi Xiong, Fengzhu Sun, Hiroshi Mamitsuka, ...

Keyword(s): Protein Function, Large Scale, Protein Function Prediction, Learning To Rank, Classification Problem, Function Prediction, New Paradigm, Effective Manner, Homology Information, Significant Performance

Abstract Motivation: Gene Ontology (GO) has been widely used to annotate the functions of proteins and understand their biological roles. Currently, less than 1% of the more than 70 million proteins in UniProtKB have experimental GO annotations, which implies a strong need for automated function prediction (AFP) of proteins. AFP is a hard multi-label classification problem because a single protein can be annotated with a varying number of GO terms. Most of these proteins have only sequences as input information, which makes sequence-based AFP (SAFP: sequences are the only input) important. Furthermore, homology-based SAFP tools are competitive in AFP competitions, but they do not necessarily work well for so-called difficult proteins, which have <60% sequence identity to proteins with existing annotations. Thus, the vital and challenging problem now is to develop a method for SAFP, particularly for difficult proteins. Methods: The key to such a method is to extract not only homology information but also diverse, deep-rooted information/evidence from sequence inputs and integrate them into a predictor in an efficient and effective manner. We propose GOLabeler, which integrates five component classifiers trained from different features, including GO term frequency, sequence alignment, amino acid trigram, domains and motifs, and biophysical properties, in the framework of learning to rank (LTR), a new paradigm of machine learning that is especially powerful for multi-label classification. Results: Extensive empirical evaluation of GOLabeler on large-scale datasets revealed numerous favorable aspects, including a significant performance advantage over state-of-the-art AFP methods.
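
The learning-to-rank framing can be illustrated with a small pairwise ranker. The abstract does not name the LTR algorithm GOLabeler uses, so this sketch swaps in a simple linear model trained with pairwise perceptron updates; it is not GOLabeler's implementation. Each candidate GO term for a protein is described by a hypothetical vector of five scores, one per component classifier (GO term frequency, sequence alignment, amino acid trigram, domains and motifs, biophysical properties), and training pushes correct terms above incorrect ones for the same protein. All feature values and the term labels `term_a` and `term_b` are made up.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[Sequence[float], bool]  # (component classifier scores, is the term correct?)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(
    proteins: List[List[Candidate]], epochs: int = 20, lr: float = 0.1
) -> List[float]:
    """Learn weights w so that, within each protein, correct GO terms score
    higher than incorrect ones (pairwise perceptron: update on every
    misordered or tied pair)."""
    dim = len(proteins[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for candidates in proteins:
            pos = [f for f, ok in candidates if ok]
            neg = [f for f, ok in candidates if not ok]
            for p, n in product(pos, neg):
                diff = [pi - ni for pi, ni in zip(p, n)]
                if dot(w, diff) <= 0:
                    w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank_terms(w: Sequence[float], candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate GO terms for one protein by learned score, best first."""
    return sorted(candidates, key=lambda t: dot(w, candidates[t]), reverse=True)


# Hypothetical training data: two proteins, each with one correct and one incorrect term.
TRAIN = [
    [([0.9, 0.1, 0.2, 0.1, 0.3], False), ([0.2, 0.8, 0.7, 0.6, 0.4], True)],
    [([0.8, 0.2, 0.1, 0.2, 0.2], False), ([0.3, 0.9, 0.6, 0.7, 0.5], True)],
]

weights = train_pairwise_ranker(TRAIN)
ranking = rank_terms(
    weights,
    {"term_b": [0.95, 0.1, 0.1, 0.1, 0.2], "term_a": [0.1, 0.85, 0.6, 0.5, 0.4]},
)
print(ranking)
```

Ranking candidate terms per protein, rather than thresholding each term independently, lets the model handle proteins with different numbers of true GO terms: a caller can keep the top-k terms or all terms above a score cutoff.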

Shape-restrained modelling of protein-small molecule complexes with HADDOCK

10.1101/2021.06.10.447890 · 2021

Author(s): Panagiotis I Koukos, Manon F. Reau, Alexandre M.J.J. Bonvin

Keyword(s): Small Molecule, Structure Prediction, Focal Point, Computational Techniques, Efficient Manner, Heavy Atoms, Homologous Compounds, Protein Receptors, Homology Information, Molecule Docking

Small molecule docking remains one of the most valuable computational techniques for the structure prediction of protein-small molecule complexes. It allows us to study the interactions between compounds and the protein receptors they target at atomic detail, in a timely and efficient manner. Here we present a new protocol in HADDOCK, our integrative modelling platform, which incorporates homology information for both receptor and compounds. It makes use of HADDOCK's unique ability to integrate information in the simulation to drive it toward conformations which agree with the provided data. The focal point is the use of shape restraints derived from homologous compounds bound to the target receptors. We have developed two protocols: In the first, the shape is composed of fake atom beads based on the position of the heavy atoms of the homologous template compound, whereas in the second the shape is additionally annotated with pharmacophore data, for some or all beads. For both protocols, ambiguous distance restraints are subsequently defined between those beads and the heavy atoms of the ligand to be docked. We have benchmarked the performance of these protocols with a fully unbound version of the widely used DUD-E dataset. In this unbound docking scenario, our template/shape-based docking protocol reaches an overall success rate of 81% on 99 complexes, which is close to the best results reported for bound docking on the DUD-E dataset.
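
The ambiguous distance restraints between shape beads and ligand heavy atoms can be sketched with the r^-6 sum-averaged effective distance used for ambiguous restraints in HADDOCK: for one bead, d_eff = (Σ_j r_j^-6)^(-1/6) over all ligand heavy atoms j, so the restraint is satisfied as soon as any heavy atom comes close to the bead. This is a simplified sketch, not HADDOCK's code or restraint file format; the flat-bottom harmonic penalty, the 1.0 Å upper bound, the force constant, and all coordinates are assumptions for illustration.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float, float]


def effective_distance(bead: Point, atoms: Iterable[Point]) -> float:
    """r^-6 sum-averaged distance between one shape bead and all ligand heavy atoms.

    Never larger than the smallest individual distance, so one nearby atom suffices.
    """
    dists = [math.dist(bead, a) for a in atoms]
    if min(dists) == 0.0:  # avoid 0 ** -6 (division by zero) on exact overlap
        return 0.0
    return sum(d ** -6 for d in dists) ** (-1.0 / 6.0)


def shape_restraint_energy(
    beads: Sequence[Point], ligand_atoms: Sequence[Point], upper: float = 1.0, k: float = 1.0
) -> float:
    """Flat-bottom harmonic penalty (hypothetical form and parameters):
    zero while a bead's effective distance is <= upper, k * (d - upper)^2 beyond."""
    energy = 0.0
    for bead in beads:
        d = effective_distance(bead, ligand_atoms)
        if d > upper:
            energy += k * (d - upper) ** 2
    return energy


beads = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]      # beads from a hypothetical template compound
docked = [(0.2, 0.0, 0.0), (1.4, 0.0, 0.0)]     # ligand pose overlapping the shape
displaced = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]  # ligand pose away from the shape

print(shape_restraint_energy(beads, docked), shape_restraint_energy(beads, displaced) > 0)
```

Because the restraint is ambiguous, it does not dictate which ligand atom must occupy which bead; it only rewards poses that fill the template's shape.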

EXIA2: Web Server of Accurate and Rapid Protein Catalytic Residue Prediction

BioMed Research International · 10.1155/2014/807839 · 2014 · Vol 2014 · pp. 1-12 · Cited By ~ 1

Author(s): Chih-Hao Lu, Chin-Sheng Yu, Yu-Tung Chien, Shao-Wei Huang

Keyword(s): Prediction Method, Side Chain, Catalytic Residue, Chain Orientation, Catalytic Residues, Special Orientation, Homology Information, Benchmark Datasets, Side Chain Orientation, Better Than

We propose a method (EXIA2) for catalytic residue prediction based on protein structure, without needing homology information. The method is based on the special side chain orientation of catalytic residues: we found that the side chain of a catalytic residue usually points toward the center of the catalytic site. This orientation is usually observed in catalytic residues but not in noncatalytic residues, whose side chain orientations are usually random. When combined with PSI-BLAST sequence conservation, the method is shown to be the most accurate catalytic residue prediction method currently available. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures, with areas under the ROC curve (AUC) ranging from 0.934 to 0.968.
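
The side chain orientation criterion can be expressed as an angle. This minimal sketch, not EXIA2's scoring function, measures the angle between the vector from a residue's Cα atom to the centroid of its side-chain atoms and the vector from the Cα atom to the center of the catalytic site; a small angle means the side chain points toward the site. How the site center is located, which atoms define the side chain, and the 30° cutoff in the hypothetical `points_to_site` rule are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def orientation_angle(ca: Point, side_chain: Sequence[Point], site_center: Point) -> float:
    """Angle (degrees) between CA->side-chain centroid and CA->site center."""
    sc = centroid(side_chain)
    u = [sc[i] - ca[i] for i in range(3)]
    v = [site_center[i] - ca[i] for i in range(3)]
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


def points_to_site(angle_deg: float, cutoff_deg: float = 30.0) -> bool:
    """Hypothetical decision rule: treat small angles as 'pointing at the site'."""
    return angle_deg <= cutoff_deg


center = (5.0, 0.0, 0.0)  # hypothetical catalytic-site center
toward = orientation_angle((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (1.5, 0.1, 0.0)], center)
away = orientation_angle((0.0, 0.0, 0.0), [(-1.0, 0.0, 0.0)], center)
print(round(toward, 1), points_to_site(toward), round(away, 1), points_to_site(away))
```

For noncatalytic residues with random orientations, such angles would be spread over the whole 0° to 180° range, which is what makes a small angle informative.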
