Impact of the Continuous Evolution of Gene Ontology on the Performance of Similarity Measures for Scoring Confidence of Protein Interactions

2020 ◽  
Vol 1 (6) ◽  
Author(s):  
Madhusudan Paul ◽  
Ashish Anand ◽  
Saptarshi Pyne
2018 ◽  
Author(s):  
Madhusudan Paul ◽  
Ashish Anand

Abstract Large-scale protein-protein interaction (PPI) data have the potential to play a significant role in understanding cellular processes. However, the presence of a considerable fraction of false positives is a bottleneck in realizing this potential. There have been continuous efforts to use complementary resources to score the confidence of PPIs so that false-positive interactions receive a low confidence score. Gene Ontology (GO), a taxonomy of biological terms that represents the properties of gene products and their relations, has been widely used for this purpose. We use GO to introduce a new set of specificity measures: Relative Depth Specificity (RDS), Relative Node-based Specificity (RNS), and Relative Edge-based Specificity (RES), leading to a new family of similarity measures. We use these similarity measures to obtain a confidence score for each PPI. We evaluate the new measures on four different benchmarks. We show that all three measures are quite effective. Notably, RNS and RES distinguish true PPIs from false positives more effectively than the existing alternatives. RES also shows robust set-discriminating power and can be useful for protein functional clustering as well.
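
The general idea of scoring a PPI from ontology annotations can be illustrated with a minimal sketch. It does not implement RDS, RNS, or RES, whose definitions are not given in the abstract; it assumes a generic depth-based similarity (depth of the deepest common ancestor, scaled by the maximum depth) and a best-match score over annotation pairs. The toy ontology `PARENTS` and the function names are hypothetical.

```python
from itertools import product

# Hypothetical toy ontology: term -> set of parent terms (not real GO terms).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def depth(term):
    """Length of the longest path from the root to the term."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


MAX_DEPTH = max(depth(t) for t in PARENTS)


def term_similarity(t1, t2):
    """Depth of the deepest common ancestor, scaled to [0, 1]."""
    common = ancestors(t1) & ancestors(t2)
    return max(depth(t) for t in common) / MAX_DEPTH


def ppi_confidence(annotations_a, annotations_b):
    """Score a protein pair by its best-matching pair of annotation terms."""
    return max(
        term_similarity(a, b) for a, b in product(annotations_a, annotations_b)
    )
```

Under this assumed scheme, two proteins annotated only with terms from different branches of the ontology share just the root and score 0, which is how a likely false-positive interaction would receive a low confidence score.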


2011 ◽  
Vol 28 (1) ◽  
pp. 69-75 ◽  
Author(s):  
Stefan R. Maetschke ◽  
Martin Simonsen ◽  
Melissa J. Davis ◽  
Mark A. Ragan

Author(s):  
Rune Sætre ◽  
Amund Tveit ◽  
Martin Thorsen Ranang ◽  
Tonje S. Steigedal ◽  
Liv Thommesen ◽  
...  

2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Aaron Ayllon-Benitez ◽  
Romain Bourqui ◽  
Patricia Thébault ◽  
Fleur Mougin

Abstract The revolution in new sequencing technologies is leading to a new understanding of the relations between genotype and phenotype. To interpret and analyze data that are grouped according to a phenotype of interest, methods based on statistical enrichment have become a standard in biology. However, these methods synthesize the biological information by a priori selecting the over-represented terms and may suffer from focusing on the most studied genes, which represent a limited coverage of annotated genes within a gene set. Semantic similarity measures have shown great results in pairwise gene comparison by taking advantage of the underlying structure of the Gene Ontology. We developed GSAn, a novel gene set annotation method that uses semantic similarity measures to synthesize a priori Gene Ontology annotation terms. The originality of our approach is to identify the best compromise between the number of retained annotation terms, which has to be drastically reduced, and the number of related genes, which has to be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. Compared to enrichment analysis tools, GSAn has shown excellent results in terms of maximizing the gene coverage while minimizing the number of terms.
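
The trade-off between few terms and high gene coverage can be illustrated with a generic greedy set-cover heuristic. This is not GSAn's algorithm, which the abstract does not specify and which additionally relies on semantic similarity; the function name, term labels, and gene labels below are hypothetical.

```python
def greedy_term_selection(term_to_genes, gene_set):
    """Greedily pick terms, each covering the most still-uncovered genes.

    Returns the selected terms and the genes that no term could cover.
    """
    uncovered = set(gene_set)
    selected = []
    while uncovered:
        # Sorting the term names makes tie-breaking deterministic.
        best = max(
            sorted(term_to_genes),
            key=lambda t: len(term_to_genes[t] & uncovered),
        )
        gain = term_to_genes[best] & uncovered
        if not gain:
            break  # the remaining genes are not annotated by any term
        selected.append(best)
        uncovered -= gain
    return selected, uncovered


# Hypothetical annotations: term -> genes annotated with it.
annotations = {
    "t1": {"g1", "g2", "g3"},
    "t2": {"g3", "g4"},
    "t3": {"g1"},
}
terms, missed = greedy_term_selection(annotations, {"g1", "g2", "g3", "g4", "g5"})
```

In this toy input, two of the three terms cover every annotated gene, and the one gene that no term annotates is reported separately rather than forcing extra terms into the result.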


BMC Genomics ◽  
2009 ◽  
Vol 10 (1) ◽  
pp. 288 ◽  
Author(s):  
Stefanie De Bodt ◽  
Sebastian Proost ◽  
Klaas Vandepoele ◽  
Pierre Rouzé ◽  
Yves Van de Peer

Author(s):  
JAMES M. KELLER ◽  
JAMES C. BEZDEK ◽  
MIHAIL POPESCU ◽  
NIKHIL R. PAL ◽  
JOYCE A. MITCHELL ◽  
...  

The standard method for comparing gene products (proteins or RNA) is to compare their DNA or amino acid sequences. Additional information about some gene products may come from multiple sources, including the set of Gene Ontology (GO) annotations and the set of journal abstracts related to each gene product. Gene product similarity measures can be based on evaluating sets of descriptor terms found in the GO taxonomy, and/or the index term sets of the related documents (MeSH annotations). While our techniques can be applied to term sets from any taxonomy, we restrict our examples in this article to GO annotations. We investigate the use of linear order statistics (LOS) to build similarity relations on pairs of terms that are used in the GO as linguistic descriptors of genes and gene products. One of our objectives is to investigate the construction and utility of visual assessments of relational data (in this case, dissimilarity matrices) for discovering tendencies of groups of gene products to "cluster together". We use gene product data derived from a group of 194 gene products representing three protein families extracted from ENSEMBL. Our examples suggest that LOS similarity measures are more effective than traditional sequence-based similarity measures at capturing relationships between pairs of gene products in ENSEMBL families when annotation information is available. We show examples of how these similarity measures can assist in knowledge discovery and gene product family validation.

