Finding most informative common ancestor in cross-ontological semantic similarity assessment: An intrinsic information content-based approach

Abstract

Background: Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins. Most previous research exploited information content to estimate the semantic similarity between GO terms; more recently, some research has exploited word embeddings to learn vector representations for GO terms from a large-scale corpus. In this paper, we propose a novel method, named GO2Vec, that exploits graph embeddings to learn vector representations for GO terms from the GO graph. GO2Vec combines the information from both the GO graph and GO annotations, and its learned vectors can be applied to a variety of bioinformatics applications, such as calculating functional similarity between proteins and predicting protein-protein interactions.

Results: We conducted two kinds of experiments to evaluate the quality of GO2Vec: (1) functional similarity between proteins on the Collaborative Evaluation of GO-based Semantic Similarity Measures (CESSM) dataset and (2) prediction of protein-protein interactions on the Yeast and Human datasets from the STRING database. Experimental results demonstrate the effectiveness of GO2Vec over the information content-based measures and the word embedding-based measures.

Conclusion: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GO and GOA graphs. Our results also demonstrate that GO annotations provide useful information for computing the similarity between GO terms and between proteins.
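As a rough illustration of how learned term vectors could be turned into a protein-level functional similarity, the sketch below compares term vectors by cosine similarity and aggregates them with a best-match average. The abstract does not state which vector comparison or aggregation GO2Vec uses, so both choices are assumptions; the term identifiers and 3-dimensional vectors are made up for the example and are not GO2Vec output.

```python
import math

# Hypothetical term vectors (made-up identifiers, not real GO terms or learned embeddings).
TERM_VECTORS = {
    "T1": [1.0, 0.0, 0.0],
    "T2": [0.0, 1.0, 0.0],
    "T3": [1.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match_average(terms_a, terms_b):
    """Assumed aggregation: average, in both directions, of each term's best match."""

    def one_way(src, dst):
        best = [
            max(cosine(TERM_VECTORS[s], TERM_VECTORS[d]) for d in dst)
            for s in src
        ]
        return sum(best) / len(best)

    return (one_way(terms_a, terms_b) + one_way(terms_b, terms_a)) / 2


# Two hypothetical proteins described by their annotated terms.
protein_x = ["T1", "T2"]
protein_y = ["T3"]
print(round(best_match_average(protein_x, protein_y), 4))
```

Other aggregations (maximum, plain average over all pairs) would slot into the same place; the best-match average is used here only because it is a common choice for annotation sets of different sizes.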


Applying Semantic Similarity Measures Based on Information Content in the Evaluation of a Domain Ontology

2018 Seventeenth Mexican International Conference on Artificial Intelligence (MICAI) ◽

10.1109/micai46078.2018.00009 ◽

2018 ◽

Author(s):

Aimee Cecilia Hernandez Garcia ◽

Mireya Tovar Vidal ◽

Jose de Jesus Lavalle Martinez

Keyword(s):

Information Content ◽

Semantic Similarity ◽

Similarity Measures ◽

Domain Ontology


A New Model of Information Content for Semantic Similarity in WordNet

2008 Second International Conference on Future Generation Communication and Networking Symposia ◽

10.1109/fgcns.2008.16 ◽

2008 ◽

Cited By ~ 60

Author(s):

Zili Zhou ◽

Yanna Wang ◽

Junzhong Gu

Keyword(s):

Information Content ◽

Semantic Similarity ◽

New Model


Joint semantic similarity assessment with raw corpus and structured ontology for semantic-oriented service discovery

Personal and Ubiquitous Computing ◽

10.1007/s00779-016-0921-0 ◽

2016 ◽

Vol 20 (3) ◽

pp. 311-323 ◽

Cited By ~ 8

Author(s):

Wei Lu ◽

Yuanyuan Cai ◽

Xiaoping Che ◽

Yuxun Lu

Keyword(s):

Semantic Similarity ◽

Service Discovery ◽

Similarity Assessment


MeSH-based disambiguation method using an intrinsic information content measure of semantic similarity

Procedia Computer Science ◽

10.1016/j.procs.2017.08.169 ◽

2017 ◽

Vol 112 ◽

pp. 564-573 ◽

Cited By ~ 1

Author(s):

Imen Gabsi ◽

Hager Kammoun ◽

Sarra Brahmi ◽

Ikram Amous

Keyword(s):

Information Content ◽

Semantic Similarity


A Graph-Based Semantic Similarity Measure for the Gene Ontology

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720011005641 ◽

2011 ◽

Vol 09 (06) ◽

pp. 681-695 ◽

Cited By ~ 15

Author(s):

Marco A. Alvarez ◽

Changhui Yan

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Common Ancestor ◽

State Of The Art ◽

Sequence Similarity ◽

Similarity Score ◽

Gene Products ◽

Semantic Similarity Measure ◽

Similarity Algorithm ◽

Go Terms

Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases, such as Gene Ontology Annotation (GOA), that annotate gene products using GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive. SSA provides a reliable measure of semantic similarity that is independent of external databases of functional-annotation observations.
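The two structural ingredients named in the abstract, the shortest path between two terms and the depth of their common ancestor, can be sketched on a toy ontology as follows. This is a minimal illustration under stated assumptions, not the SSA implementation: the graph and term names are hypothetical, "nearest common ancestor" is interpreted here as the deepest shared ancestor, depth is taken as the longest path from the root, and `structural_score` is a stand-in combination rather than SSA's formula. The definition-based similarity score is not reproduced, since the abstract does not describe it.

```python
from collections import deque

# Hypothetical toy DAG: child -> list of parents. These are not real GO terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A"],
    "E": ["C", "B"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def depth(term):
    """Depth as the longest path from the root (an assumption; other definitions exist)."""
    if not PARENTS[term]:
        return 0
    return 1 + max(depth(p) for p in PARENTS[term])


def deepest_common_ancestor(t1, t2):
    """Shared ancestor with the greatest depth; ties are broken alphabetically."""
    common = ancestors(t1) & ancestors(t2)
    return max(sorted(common), key=depth)


def shortest_path_length(t1, t2):
    """Breadth-first search over the ontology with edges treated as undirected."""
    neighbours = {t: set(ps) for t, ps in PARENTS.items()}
    for child, ps in PARENTS.items():
        for p in ps:
            neighbours[p].add(child)
    dist = {t1: 0}
    queue = deque([t1])
    while queue:
        node = queue.popleft()
        if node == t2:
            return dist[node]
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("terms are not connected")


def structural_score(t1, t2):
    """Illustrative stand-in: a deeper common ancestor and a shorter path give a higher score."""
    if t1 == t2:
        return 1.0
    d = depth(deepest_common_ancestor(t1, t2))
    return d / (d + shortest_path_length(t1, t2))


print(deepest_common_ancestor("C", "D"), shortest_path_length("E", "D"))
```

On this toy graph, siblings under a shallow parent score lower than they would under a deep one, which is the qualitative behaviour the depth term is meant to capture.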
