Searching combinatorial optimality using graph-based homology information

2015 ◽  
Vol 26 (1-2) ◽  
pp. 103-120 ◽  
Author(s):  
Pedro Real ◽  
Helena Molina-Abril ◽  
Aldo Gonzalez-Lorenzo ◽  
Alexandra Bac ◽  
Jean-Luc Mari
2015 ◽  
Vol 11 (10) ◽  
pp. 5045-5051 ◽  
Author(s):  
Alfredo Iacoangeli ◽  
Paolo Marcatili ◽  
Anna Tramontano

2010 ◽  
Vol 7 (3) ◽  
pp. 275-289 ◽  
Author(s):  
Vesna Memišević ◽  
Tijana Milenković ◽  
Nataša Pržulj

Summary Traditional approaches for homology detection rely on finding sufficient similarities between protein sequences. Studies have demonstrated that non-sequence-based sources of biological information, such as secondary or tertiary molecular structure, can yield certain types of biological knowledge when sequence-based approaches fail. Motivated by these studies, we hypothesize that protein-protein interaction (PPI) network topology and protein sequence might give insights into different slices of biological information. Since proteins aggregate to perform a function instead of acting in isolation, analyzing the complex wiring around a protein in a PPI network could give deeper insights into the protein’s role in the inner workings of the cell than analyzing sequences of individual genes. Hence, we believe that one could lose much information by focusing on sequence information alone. We examine whether, and to what extent, the information about homologous proteins captured by PPI network topology differs from the information captured by their sequences. We measure how similar the topology around homologous proteins in a PPI network is and show that such proteins have statistically significantly higher network similarity than nonhomologous proteins. We compare these network similarity trends of homologous proteins with the trends in their sequence identity and find that network similarities uncover almost as much homology as sequence identities. Although neither of the two methods, network topology or sequence identity, seems to capture homology information in its entirety, we demonstrate that the two might give insights into somewhat different types of biological information, as the overlap of the homology information that they uncover is relatively low. Therefore, we conclude that similarities of proteins’ topological neighborhoods in a PPI network could be used as a complementary method to sequence-based approaches for identifying homologs, as well as for analyzing evolutionary distance and functional divergence of homologous proteins.
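To make the idea of comparing the topology around two proteins concrete, here is a minimal sketch in Python. The abstract does not say which network similarity measure was used, so everything below is an assumption chosen for illustration: the three-number local signature (degree, triangles, two-step paths), the log-scaled similarity score, and the toy protein names are all hypothetical and are not the authors' method.

```python
from itertools import combinations
from math import log


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) interaction pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_signature(adj, node):
    """Hypothetical local signature: (degree, triangles, two-step paths)."""
    nbrs = adj[node]
    degree = len(nbrs)
    # Triangles through the node: pairs of neighbors that also interact.
    triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # Paths node - x - y where y is neither the node nor a direct neighbor.
    two_paths = sum(len(adj[x] - nbrs - {node}) for x in nbrs)
    return (degree, triangles, two_paths)


def signature_similarity(sig_a, sig_b):
    """Similarity in (0, 1]; 1.0 means identical local signatures."""
    dists = [
        abs(log(a + 1) - log(b + 1)) / log(max(a, b) + 2)
        for a, b in zip(sig_a, sig_b)
    ]
    return 1.0 - sum(dists) / len(dists)


# Toy network with made-up protein names.
toy_edges = [
    ("P1", "A"), ("P1", "B"), ("A", "B"),   # P1 sits in a triangle
    ("P2", "C"), ("P2", "D"), ("C", "D"),   # P2 sits in a triangle too
    ("P3", "E"), ("E", "F"), ("E", "G"),    # P3 hangs off a hub
]
toy_graph = build_graph(toy_edges)
sig_p1 = local_signature(toy_graph, "P1")
sig_p2 = local_signature(toy_graph, "P2")
sig_p3 = local_signature(toy_graph, "P3")
print(signature_similarity(sig_p1, sig_p2))
print(signature_similarity(sig_p1, sig_p3))
```

In this toy network P1 and P2 have the same local wiring and score higher than P1 and P3, which is the kind of contrast the abstract reports between homologous and nonhomologous pairs. A real analysis would use a richer signature and a statistical test over many protein pairs.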


2014 ◽  
Vol 15 (1) ◽  
Author(s):  
Ian Reid ◽  
Nicholas O’Toole ◽  
Omar Zabaneh ◽  
Reza Nourzadeh ◽  
Mahmoud Dahdouli ◽  
...  

2019 ◽  
Vol 36 (8) ◽  
pp. 2575-2577
Author(s):  
Swadha Anand ◽  
Bhusan K Kuntal ◽  
Anwesha Mohapatra ◽  
Vineet Bhatt ◽  
Sharmila S Mande

Abstract Motivation: The functional potential of genomes and metagenomes inferred using homology-based methods is often subject to certain limitations, especially for proteins with homologs that function in multiple pathways. Augmenting the homology information with the genomic location of the constituent genes can significantly improve the accuracy of estimated functions. This can help distinguish the cognate homolog belonging to a candidate pathway from its other homologs that function in different pathways. Results: In this article, we present a web-based analysis platform, ‘FunGeCo’, to enable gene-context-based functional inference for microbial genomes and metagenomes. It is expected to be a valuable resource and to complement the existing tools for understanding the functional potential of microbes that reside in an environment. Availability and implementation: https://web.rniapps.net/fungeco [Freely available for academic use]. Supplementary information: Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Ronghui You ◽  
Zihan Zhang ◽  
Yi Xiong ◽  
Fengzhu Sun ◽  
Hiroshi Mamitsuka ◽  
...  

Abstract Motivation: Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently only <1% of more than 70 million proteins in UniProtKB have experimental GO annotations, implying a strong need for automated function prediction (AFP) of proteins. AFP is a hard multi-label classification problem because one protein can be associated with a varying number of GO terms. Most of these proteins have only sequences as input information, indicating the importance of sequence-based AFP (SAFP: sequences are the only input). Furthermore, homology-based SAFP tools are competitive in AFP competitions, but they do not necessarily work well for so-called difficult proteins, which have <60% sequence identity to proteins that are already annotated. Thus, the vital and challenging problem now is to develop a method for SAFP, particularly for difficult proteins. Methods: The key of this method is to extract not only homology information but also diverse, deep-rooted information/evidence from sequence inputs and integrate them into a predictor in an efficient and effective manner. We propose GOLabeler, which integrates five component classifiers, trained from different features (GO term frequency, sequence alignment, amino acid trigram, domains and motifs, and biophysical properties), in the framework of learning to rank (LTR), a new paradigm of machine learning that is especially powerful for multi-label classification. Results: The empirical results obtained by examining GOLabeler extensively and thoroughly on large-scale datasets revealed numerous favorable aspects of GOLabeler, including a significant performance advantage over state-of-the-art AFP methods. Contact: [email protected]


2021 ◽  
Author(s):  
Panagiotis I Koukos ◽  
Manon F. Reau ◽  
Alexandre M.J.J. Bonvin

Small molecule docking remains one of the most valuable computational techniques for the structure prediction of protein-small molecule complexes. It allows us to study the interactions between compounds and the protein receptors they target at atomic detail, in a timely and efficient manner. Here we present a new protocol in HADDOCK, our integrative modelling platform, which incorporates homology information for both receptor and compounds. It makes use of HADDOCK's unique ability to integrate information in the simulation to drive it toward conformations which agree with the provided data. The focal point is the use of shape restraints derived from homologous compounds bound to the target receptors. We have developed two protocols: In the first, the shape is composed of fake atom beads based on the position of the heavy atoms of the homologous template compound, whereas in the second the shape is additionally annotated with pharmacophore data, for some or all beads. For both protocols, ambiguous distance restraints are subsequently defined between those beads and the heavy atoms of the ligand to be docked. We have benchmarked the performance of these protocols with a fully unbound version of the widely used DUD-E dataset. In this unbound docking scenario, our template/shape-based docking protocol reaches an overall success rate of 81% on 99 complexes, which is close to the best results reported for bound docking on the DUD-E dataset.
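As an illustration of how an ambiguous distance restraint between a shape bead and the heavy atoms of a ligand can be evaluated, the following Python sketch uses the r^-6 sum averaging commonly associated with ambiguous restraints. Whether the protocol described above uses exactly this form is an assumption here; the function names, the `upper_bound` parameter and its default value are hypothetical and not taken from the abstract.

```python
from math import dist


def effective_distance(bead, ligand_atoms):
    """Effective bead-to-ligand distance via r^-6 sum averaging.

    bead: (x, y, z) of one shape bead.
    ligand_atoms: list of (x, y, z) heavy-atom positions.
    The result is never larger than the shortest individual distance,
    so any single nearby atom is enough to satisfy the restraint.
    """
    distances = [dist(bead, atom) for atom in ligand_atoms]
    if min(distances) == 0.0:
        return 0.0  # an atom sits exactly on the bead
    inverse_sixth_sum = sum(d ** -6 for d in distances)
    return inverse_sixth_sum ** (-1.0 / 6.0)


def restraint_violation(bead, ligand_atoms, upper_bound=1.0):
    """How far the effective distance exceeds a (hypothetical) upper bound."""
    return max(0.0, effective_distance(bead, ligand_atoms) - upper_bound)


# Made-up coordinates: one bead and two ligand heavy atoms.
bead = (0.0, 0.0, 0.0)
ligand = [(2.0, 0.0, 0.0), (0.0, 6.0, 0.0)]
print(effective_distance(bead, ligand))
print(restraint_violation(bead, ligand))
```

Because the sum is dominated by the closest atom, the restraint does not prescribe which ligand atom must occupy a bead, which is what makes it ambiguous.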


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Chih-Hao Lu ◽  
Chin-Sheng Yu ◽  
Yu-Tung Chien ◽  
Shao-Wei Huang

We propose a method (EXIA2) of catalytic residue prediction based on protein structure without needing homology information. The method is based on the special side chain orientation of catalytic residues. We found that the side chain of catalytic residues usually points to the center of the catalytic site. This special orientation is usually observed in catalytic residues but not in noncatalytic residues, which usually have random side chain orientation. When combined with PSI-Blast sequence conservation, the method is shown to be the most accurate catalytic residue prediction method currently available. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures. The areas under the ROC curve (AUC) on these benchmark datasets range from 0.934 to 0.968.
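The observation that a side chain points toward the center of the site can be expressed as an angle, as in the minimal Python sketch below. The abstract does not give EXIA2's actual definition of orientation, so the choice of vectors (C-alpha to side chain centroid, C-alpha to site center) and all names are assumptions made for illustration only.

```python
from math import acos, degrees, sqrt


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    """Euclidean length of a 3D vector."""
    return sqrt(sum(x * x for x in v))


def orientation_angle(ca, side_chain_centroid, site_center):
    """Angle in degrees between the side chain and the site direction.

    0 means the side chain points straight at the site center,
    180 means it points directly away from it.
    """
    side_chain = _sub(side_chain_centroid, ca)
    to_center = _sub(site_center, ca)
    lengths = _norm(side_chain) * _norm(to_center)
    if lengths == 0.0:
        raise ValueError("degenerate geometry: zero-length vector")
    cosine = sum(x * y for x, y in zip(side_chain, to_center)) / lengths
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return degrees(acos(cosine))


# Made-up coordinates for two residues near a site centered at (5, 0, 0).
site = (5.0, 0.0, 0.0)
print(orientation_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), site))
print(orientation_angle((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), site))
```

Under this reading, residues with a small angle would be candidates for catalytic residues, while a roughly random spread of angles would be expected for noncatalytic ones.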

