Combining Semantic Similarity and GO Enrichment for Computation of Functional Similarity

2017 ◽  
Author(s):  
Wenting Liu ◽  
Jianjun Liu ◽  
Jagath C. Rajapakse

Abstract: Functional similarity between genes is widely used in many bioinformatics applications, including detecting molecular pathways, finding co-expressed genes, predicting protein-protein interactions, and prioritizing candidate genes. Methods for evaluating the functional similarity of genes are mostly based on the semantic similarity of gene ontology (GO) terms. Though there are hundreds of functional similarity measures available in the literature, none of them considers the enrichment of the GO terms by the querying gene pair. We propose a novel method to incorporate GO enrichment into the existing functional similarity measures. Our experiments show that the inclusion of gene enrichment significantly improves the performance of 44 widely used functional similarity measures, especially in the prediction of sequence homologies, gene expression correlations, and protein-protein interactions. Software availability: The software (Python code) and the evaluation on all benchmark datasets (R script) are available at https://gitlab.com/liuwt/EnrichFunSim.
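The abstract above does not say how enrichment is computed, so the following is only a minimal sketch of the standard hypergeometric enrichment test, not the authors' method. It asks how surprising it is that both genes of a query pair carry a given GO term. The function name and all parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value P(X >= hits).

    population: total number of genes considered (assumed background)
    annotated:  genes in the background annotated with the GO term
    sample:     size of the query set (2 for a gene pair)
    hits:       genes in the query set annotated with the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for hits, hits + 1, ..., upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers, purely illustrative: 10 background genes, 3 carry the term,
# and both genes of the query pair are annotated with it.
p_value = hypergeom_enrichment_pvalue(population=10, annotated=3, sample=2, hits=2)
```

A small p-value means the shared term is specific to the pair rather than common in the background, which is one plausible way such a signal could be used to weight a semantic similarity score.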

BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Xiaoshi Zhong ◽  
Rama Kaalia ◽  
Jagath C. Rajapakse

Abstract. Background: Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins. Most previous research exploited information content to estimate the semantic similarity between GO terms; more recently, some research has exploited word embeddings to learn vector representations for GO terms from a large-scale corpus. In this paper, we propose a novel method, named GO2Vec, that exploits graph embeddings to learn vector representations for GO terms from the GO graph. GO2Vec combines the information from both the GO graph and GO annotations, and its learned vectors can be applied to a variety of bioinformatics applications, such as calculating functional similarity between proteins and predicting protein-protein interactions. Results: We conducted two kinds of experiments to evaluate the quality of GO2Vec: (1) functional similarity between proteins on the Collaborative Evaluation of GO-based Semantic Similarity Measures (CESSM) dataset and (2) prediction of protein-protein interactions on the Yeast and Human datasets from the STRING database. Experimental results demonstrate the effectiveness of GO2Vec over the information content-based measures and the word embedding-based measures. Conclusion: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GO and GOA graphs. Our results also demonstrate that GO annotations provide useful information for computing the similarity between GO terms and between proteins.
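Once GO terms have vector representations, term similarity and protein similarity can be computed from them. The sketch below assumes cosine similarity between term vectors and a best-match average over the two proteins' term sets; the abstract does not state which aggregation GO2Vec uses, so both choices are assumptions. The term names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match_average(terms_a, terms_b, vectors):
    """Average, over every term of both proteins, of its best match in the other protein."""
    if not terms_a or not terms_b:
        return 0.0
    best_for_a = [max(cosine(vectors[s], vectors[t]) for t in terms_b) for s in terms_a]
    best_for_b = [max(cosine(vectors[t], vectors[s]) for s in terms_a) for t in terms_b]
    return (sum(best_for_a) + sum(best_for_b)) / (len(best_for_a) + len(best_for_b))


# Hypothetical 2-dimensional embeddings for three placeholder terms.
term_vectors = {
    "term_a": [1.0, 0.0],
    "term_b": [0.0, 1.0],
    "term_c": [1.0, 1.0],
}

# Two hypothetical proteins, each described by the terms annotated to it.
similarity = best_match_average(["term_a", "term_c"], ["term_b", "term_c"], term_vectors)
```

The best-match average is symmetric in its two arguments and returns 1.0 for identical term sets, which makes it a convenient baseline when comparing embedding methods.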


2014 ◽  
Vol 12 (06) ◽  
pp. 1442008 ◽  
Author(s):  
Jung-Hsien Chiang ◽  
Jiun-Huang Ju

Protein–protein interactions (PPIs) are involved in the majority of biological processes. Identification of PPIs is therefore one of the key aims of biological research. Although there are many databases of PPIs, many other unidentified PPIs could be buried in the biomedical literature. Therefore, automated identification of PPIs from biomedical literature repositories could be used to discover otherwise hidden interactions. Search engines, such as Google, have been successfully applied to measure the relatedness among words. Inspired by such approaches, we propose a novel method to identify PPIs through semantic similarity measures among protein mentions. We define six semantic similarity measures as features based on the page counts retrieved from the MEDLINE database. A machine learning classifier, Random Forest, is trained using these features. The proposed approach achieves an averaged micro-F of 71.28% and an averaged macro-F of 64.03% over five PPI corpora, an improvement over the results of using only the conventional co-occurrence feature (averaged micro-F of 68.79% and averaged macro-F of 60.49%). A relation-word reinforcement further improves the averaged micro-F to 71.3% and the averaged macro-F to 65.12%. Comparing the results of the current work with other studies on the AIMed corpus (ranging from 77.58% to 85.1% in micro-F and from 62.18% to 76.27% in macro-F), we show that the proposed approach achieves a micro-F of 81.88% and a macro-F of 64.01% without the use of sophisticated feature extraction. Finally, we manually examine the newly discovered PPI pairs based on a literature review, and the results suggest that our approach could extract novel protein–protein interactions.
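The abstract does not name its six page-count measures. As a rough illustration of the idea, the sketch below computes three common co-occurrence measures (Jaccard, Dice, and pointwise mutual information) from page counts. The choice of measures, the function names, and the toy counts are all assumptions and are not taken from the paper.

```python
import math


def jaccard(n_a, n_b, n_ab):
    """Pages mentioning both proteins over pages mentioning either."""
    union = n_a + n_b - n_ab
    return n_ab / union if union > 0 else 0.0


def dice(n_a, n_b, n_ab):
    """Twice the joint count over the sum of the individual counts."""
    total = n_a + n_b
    return 2 * n_ab / total if total > 0 else 0.0


def pmi(n_a, n_b, n_ab, n_total):
    """Pointwise mutual information in bits; 0.0 when any count is zero."""
    if min(n_a, n_b, n_ab) <= 0:
        return 0.0
    return math.log2((n_ab * n_total) / (n_a * n_b))


# Toy page counts, purely illustrative: protein A appears on 100 pages,
# protein B on 50, both together on 25, out of 1000 indexed pages.
features = [
    jaccard(100, 50, 25),
    dice(100, 50, 25),
    pmi(100, 50, 25, 1000),
]
```

A feature vector of this kind, one per candidate protein pair, is the sort of input a classifier such as Random Forest could be trained on.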


2016 ◽  
Vol 5 (4) ◽  
pp. 93-98
Author(s):  
Wen Sun ◽  
Lin Han ◽  
Wenmao Xu ◽  
Yazhen Sun

Abstract. Objective: The objective of this work is to find a novel method for exploring the disrupted pathways associated with periodontitis (PD) at the network level. Methods: First, the differentially expressed genes (DEGs) between PD patients and cognitively normal subjects were inferred using the LIMMA package. Then, the protein-protein interactions (PPIs) in each pathway were explored with the Empirical Bayesian (EB) co-expression program. Specifically, we constructed a random model, took the 100th weight value as the threshold for disrupted PPI pathways, and computed the weight value of each pathway. We then examined the disrupted pathways, namely those whose weight value exceeded the threshold. Pathway enrichment analyses of the DEGs were carried out with the Expression Analysis Systematic Explorer (EASE) test. Finally, the two methods were compared, and the better one was selected as the method that yielded more numerous and more significant pathways. Results: Using the LIMMA package, we identified 524 DEGs in all. We then determined 0.115222 as the threshold value for disrupted PPI pathways. With a weight value > 0.115222, there were 258 disrupted PPI pathways. Additionally, the 524 DEGs were enriched in 4 pathways under EASE = 0.1. Conclusion: We proposed a novel network method for inferring the disrupted pathways in PD. The disrupted pathways might be underlying biomarkers for treatment associated with PD.


2017 ◽  
Vol 45 (12) ◽  
pp. 7094-7105 ◽  
Author(s):  
Milana Frenkel-Morgenstern ◽  
Alessandro Gorohovski ◽  
Somnath Tagore ◽  
Vaishnovi Sekar ◽  
Miguel Vazquez ◽  
...  

2020 ◽  
Vol 111 ◽  
pp. 103579
Author(s):  
Steven Cox ◽  
Xialan Dong ◽  
Ruhi Rai ◽  
Laura Christopherson ◽  
Weifan Zheng ◽  
...  
