A GRAPH-BASED SEMANTIC SIMILARITY MEASURE FOR THE GENE ONTOLOGY

Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases, such as Gene Ontology Annotation (GOA), that annotate gene products using GO terms. This dependency leads to limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive. SSA provides a reliable measure of semantic similarity that is independent of external databases of functional-annotation observations.
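
The three ingredients named in this abstract can be illustrated on a toy graph. The sketch below is not the SSA formula, which the abstract does not give; the ontology, the term names, and the token-overlap definition score are all hypothetical stand-ins, and the path is measured through the nearest common ancestor only.

```python
from collections import deque

# Hypothetical toy ontology (not real GO data): child -> list of parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}

def ancestors_with_distance(term):
    """Map the term and each of its ancestors to the minimum edge distance from the term."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist

def depth(term):
    """Shortest distance from the term up to a parentless (root) node."""
    dist = ancestors_with_distance(term)
    return min(d for t, d in dist.items() if not PARENTS[t])

def nearest_common_ancestor(a, b):
    """Return (ancestor, path length) for the common ancestor that minimises the path a -> ancestor -> b."""
    da, db = ancestors_with_distance(a), ancestors_with_distance(b)
    common = set(da) & set(db)
    # Ties are broken by name so the result is deterministic.
    nca = min(common, key=lambda t: (da[t] + db[t], t))
    return nca, da[nca] + db[nca]

def definition_overlap(def_a, def_b):
    """Jaccard overlap of definition tokens; an assumed stand-in for the paper's definition score."""
    wa, wb = set(def_a.lower().split()), set(def_b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 0.0
```

For the sibling terms `A1` and `A2`, the nearest common ancestor is `A` at depth 1 with a path length of 2; for `A1` and `B1` only the root is shared, so the path is longer and the ancestor depth is 0. How SSA weights these quantities against each other is left open here.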

An integrated information-based similarity measurement of gene ontology terms

Computer Science and Information Systems ◽

10.2298/csis141130053z ◽

2015 ◽

Vol 12 (4) ◽

pp. 1235-1253 ◽

Cited By ~ 1

Author(s):

Shu-Bo Zhang ◽

Jian-Huang Lai

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Semantic Information ◽

Gene Expression Dataset ◽

Similarity Measurement ◽

Depth Information ◽

Go Terms ◽

Validation Experiments ◽

Integrated Information ◽

Common Ancestors

Measuring the semantic similarity between pairs of terms in the Gene Ontology (GO) can help to compare genes that cannot be compared by other computational methods. In this study, we propose an integrated information-based similarity measurement (IISM) that calculates the semantic similarity between two GO terms by taking into account the multiple common ancestors that they share and aggregating the semantic information and depth information of the non-redundant common ancestors. Our method searches for non-redundant common ancestors efficiently. Validation experiments were conducted on both a gene expression dataset and a pathway dataset, and the experimental results suggest that our method outperforms some existing methods.
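
The idea of keeping only non-redundant common ancestors can be sketched as follows. The abstract does not define "non-redundant", so this sketch assumes one plausible reading: a common ancestor is redundant if it is itself an ancestor of another common ancestor. The toy graph and term names are hypothetical.

```python
# Hypothetical toy DAG (not real GO data): child -> list of parents.
PARENTS = {
    "root": [],
    "P": ["root"],
    "Q": ["root"],
    "X": ["P", "Q"],
    "Y": ["P", "Q"],
}

def ancestors(term):
    """All strict ancestors of a term, found by walking parent edges."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen

def most_specific_common_ancestors(a, b):
    """Common ancestors that are not ancestors of another common ancestor (assumed reading)."""
    common = ancestors(a) & ancestors(b)
    covered = set()
    for term in common:
        covered |= ancestors(term)
    return common - covered
```

Here `X` and `Y` share `P`, `Q`, and `root`, but `root` is already implied by `P` and `Q`, so only those two remain. Aggregating their semantic and depth information, as the abstract describes, would be a separate step that this sketch does not attempt.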

Word and sentence embedding tools to measure semantic similarity of Gene Ontology terms by their definitions

10.1101/103648 ◽

2017 ◽

Cited By ~ 1

Author(s):

Dat Duong ◽

Wasi Uddin Ahmad ◽

Eleazar Eskin ◽

Kai-Wei Chang ◽

Jingyi Jessica Li

Keyword(s):

Neural Network ◽

Gene Ontology ◽

Language Processing ◽

Classification Accuracy ◽

Dimensional Space ◽

Similarity Score ◽

Biological Functions ◽

Word Similarity ◽

True Protein ◽

Go Terms

Abstract The Gene Ontology (GO) database contains GO terms that describe biological functions of genes. Previous methods for comparing GO terms have relied on the fact that GO terms are organized into a tree structure. In this paradigm, the locations of two GO terms in the tree dictate their similarity score. In this paper, we introduce two new solutions for this problem by focusing instead on the definitions of the GO terms. We apply neural network based techniques from the natural language processing (NLP) domain. The first method does not rely on the GO tree, whereas the second indirectly depends on the GO tree. In our first approach, we compare two GO definitions by treating them as two unordered sets of words. The word similarity is estimated by a word embedding model that maps words into an N-dimensional space. In our second approach, we account for the word ordering within a sentence. We use a sentence encoder to embed GO definitions into vectors and estimate how likely one definition is to entail another. We validate our methods in two ways. In the first experiment, we test the model's ability to differentiate a true protein-protein network from a randomly generated network. In the second experiment, we test the model's ability to distinguish orthologs from randomly matched genes in human, mouse, and fly. In both experiments, a hybrid of the NLP and GO-tree based methods achieves the best classification accuracy. Availability: github.com/datduong/NLPMethods2CompareGOterms
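
The first approach, comparing two definitions as unordered word sets through word embeddings, might look roughly like the sketch below. The 2-dimensional vectors are invented for illustration, and the best-match averaging used to aggregate word similarities is an assumption; the abstract does not say how the authors combine them.

```python
import math

# Hypothetical 2-D embeddings; a real model would map words into an N-dimensional space.
EMBED = {
    "kinase": (1.0, 0.0),
    "phosphorylation": (0.8, 0.6),
    "membrane": (0.0, 1.0),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def directed_score(words_a, words_b):
    """Average, over words in A, of the best cosine match found in B."""
    best = [max(cosine(EMBED[w], EMBED[x]) for x in words_b) for w in words_a]
    return sum(best) / len(best)

def set_similarity(words_a, words_b):
    """Symmetric score: mean of the two directed best-match averages (assumed aggregation)."""
    return 0.5 * (directed_score(words_a, words_b) + directed_score(words_b, words_a))
```

Because each word is matched to its best counterpart regardless of position, reordering the words in either definition leaves the score unchanged, which is the sense in which the sets are unordered.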

Unifying Themes in Microbial Associations with Animal and Plant Hosts Described Using the Gene Ontology

Microbiology and Molecular Biology Reviews ◽

10.1128/mmbr.00017-10 ◽

2010 ◽

Vol 74 (4) ◽

pp. 479-503 ◽

Cited By ~ 37

Author(s):

Trudy Torto-Alalibo ◽

Candace W. Collmer ◽

Michelle Gwinn-Giglio ◽

Magdalen Lindeberg ◽

Shaowu Meng ◽

...

Keyword(s):

Gene Ontology ◽

Effector Proteins ◽

Biological Processes ◽

Gene Products ◽

Host Defenses ◽

Host Interactions ◽

Microbial Associations ◽

Bioinformatic Approaches ◽

Go Terms ◽

Diverse Plant

SUMMARY Microbes form intimate relationships with hosts (symbioses) that range from mutualism to parasitism. Common microbial mechanisms involved in a successful host association include adhesion, entry of the microbe or its effector proteins into the host cell, mitigation of host defenses, and nutrient acquisition. Genes associated with these microbial mechanisms are known for a broad range of symbioses, revealing both divergent and convergent strategies. Effective comparisons among these symbioses, however, are hampered by inconsistent descriptive terms in the literature for functionally similar genes. Bioinformatic approaches that use homology-based tools are limited to identifying functionally similar genes based on similarities in their sequences. An effective solution to these limitations is provided by the Gene Ontology (GO), which provides a standardized language to describe gene products from all organisms. The GO comprises three ontologies that enable one to describe the molecular function(s) of gene products, the biological processes to which they contribute, and their cellular locations. Beginning in 2004, the Plant-Associated Microbe Gene Ontology (PAMGO) interest group collaborated with the GO consortium to extend the GO to accommodate terms for describing gene products associated with microbe-host interactions. Currently, over 900 terms that describe biological processes common to diverse plant- and animal-associated microbes are incorporated into the GO database. Here we review some unifying themes common to diverse host-microbe associations and illustrate how the new GO terms facilitate a standardized description of the gene products involved. We also highlight areas where new terms need to be developed, an ongoing process that should involve the whole community.

Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2017.2695542 ◽

2018 ◽

Vol 15 (3) ◽

pp. 905-912 ◽

Cited By ~ 1

Author(s):

Najmul Ikram ◽

Muhammad Abdul Qadir ◽

Muhammad Tanvir Afzal

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Protein Sequence ◽

Sequence Similarity ◽

Protein Sequence Similarity

A GO-driven semantic similarity measure for quantifying the biological relatedness of gene products

Intelligent Decision Technologies ◽

10.3233/idt-2009-0059 ◽

2009 ◽

Vol 3 (4) ◽

pp. 239-248 ◽

Cited By ~ 1

Author(s):

Spiridon C. Denaxas ◽

Christos Tjortjis

Keyword(s):

Semantic Similarity ◽

Similarity Measure ◽

Gene Products ◽

Semantic Similarity Measure

An Integrated Platform Supporting Semantic Similarity Score Calculation and Reproducibility

10.21203/rs.3.rs-806346/v1 ◽

2021 ◽

Author(s):

Gaston K. Mazandu ◽

Kenneth Opap ◽

Funmilayo Makinde ◽

Victoria Nembaware ◽

Francis Agamah ◽

...

Keyword(s):

Gene Ontology ◽

Knowledge Sharing ◽

Semantic Similarity ◽

Automated Reasoning ◽

Large Scale ◽

Essential Role ◽

Similarity Score ◽

File Format ◽

Flexible Tool ◽

Integrated Platform

Abstract During the last decade, we witnessed an exponential rise of datasets from heterogeneous sources. Ontologies are playing an essential role in consistently describing domain concepts, data harmonization and integration to support large-scale integrative analysis and semantic interoperability in knowledge sharing. Several semantic similarity (SS) measures have been suggested to enable the integration of rich ontology structures into automated reasoning and inference. However, there is no tool that exhaustively implements these measures and existing tools are generally Gene Ontology specific, do not implement several models suggested in the WordNet context and are not equipped to properly deal with frequent ontology updates. We introduce a Python SS measure library (PySML), which tackles issues related to current SS tools, providing a portable and expandable tool to a broad computational audience. This empowers users to manipulate SS scores from several applications for any ontology version and file format. PySML is a flexible tool enabling the implementation of all existing semantic similarity models, resolving issues related to computation, reproducibility and re-usability of SS scores.

Improving the state-of-the-art in Thai semantic similarity using distributional semantics and ontological information

PLoS ONE ◽

10.1371/journal.pone.0246751 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0246751

Author(s):

Ponrudee Netisopakul ◽

Gerhard Wohlgenannt ◽

Aleksei Pulich ◽

Zar Zar Hlaing

Keyword(s):

Semantic Similarity ◽

Language Processing ◽

English Language ◽

State Of The Art ◽

Word Sense Disambiguation ◽

Similarity Score ◽

The State ◽

Word Sense ◽

Word Level ◽

High Fraction

Research into semantic similarity has a long history in lexical semantics, and it has applications in many natural language processing (NLP) tasks such as word sense disambiguation and machine translation. The task of calculating semantic similarity is usually presented in the form of datasets that contain word pairs and a human-assigned similarity score. Algorithms are then evaluated by their ability to approximate the gold standard similarity scores. Many such datasets, with different characteristics, have been created for the English language. Recently, four of those were transformed into Thai language versions, namely WordSim-353, SimLex-999, SemEval-2017-500, and R&G-65. Given those four datasets, in this work we aim to improve the previous baseline evaluations for Thai semantic similarity and to solve the challenges of unsegmented Asian languages (particularly the high fraction of out-of-vocabulary (OOV) dataset terms). To this end we apply and integrate different strategies to compute similarity, including traditional word-level embeddings, subword-unit embeddings, and ontological or hybrid sources like WordNet and ConceptNet. With our best model, which combines self-trained fastText subword embeddings with ConceptNet Numberbatch, we managed to raise the state of the art, measured with the harmonic mean of Pearson and Spearman ρ, by a large margin: from 0.356 to 0.688 for TH-WordSim-353, from 0.286 to 0.769 for TH-SemEval-500, from 0.397 to 0.717 for TH-SimLex-999, and from 0.505 to 0.901 for TWS-65.
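
The evaluation metric mentioned here, the harmonic mean of the Pearson and Spearman correlations between predicted and gold-standard scores, can be computed as in this minimal sketch. The function names are illustrative, Spearman is taken as Pearson on average ranks, and the harmonic mean is only meaningful when both correlations are positive.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def harmonic_mean(a, b):
    """Harmonic mean of two positive values."""
    return 2 * a * b / (a + b)

def combined_score(predicted, gold):
    """Harmonic mean of Pearson and Spearman between predictions and gold scores."""
    return harmonic_mean(pearson(predicted, gold), spearman(predicted, gold))
```

The harmonic mean penalises a model that does well on only one of the two correlations, since the lower value dominates the result.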

Toward semantic similarity measure between concepts in an ontology

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v14.i3.pp1356-1372 ◽

2019 ◽

Vol 14 (3) ◽

pp. 1356

Author(s):

Suwan Tongphu

Keyword(s):

Semantic Similarity ◽

Similarity Measure ◽

Description Logic ◽

Classical Problem ◽

Similarity Score ◽

New Method ◽

Snomed Ct ◽

Major Drawback ◽

Semantic Similarity Measure ◽

Concept Definition

Similarity measurement is a classical problem in Description Logic that aims at identifying the similarity between concepts in an ontology. Finding the hierarchy distance among concepts in an ontology is one popular technique. However, one major drawback of such a technique is that it usually ignores analysis of the concept definitions. This work introduces a new method for similarity measurement. The proposed system semantically analyzes the structures of two concept descriptions and then computes the similarity score based on the number of shared features. The efficiency of the proposed algorithm is measured by means of the satisfaction of desirable properties and intensive experiments on the SNOMED CT ontology.
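
A score based on the number of shared features, together with a check of two commonly desired properties, could be sketched as below. The abstract does not give the paper's formula or list its properties, so the Dice-style ratio, the chosen properties (symmetry and a score of 1 for identical descriptions), and the feature strings are all assumptions made for illustration.

```python
def shared_feature_similarity(features_a, features_b):
    """Dice-style ratio of shared features to total features (an assumed stand-in)."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical feature sets, not taken from any real ontology release.
concept_1 = {"site:heart", "morphology:inflammation"}
concept_2 = {"site:heart", "morphology:infarct"}

score = shared_feature_similarity(concept_1, concept_2)
is_symmetric = score == shared_feature_similarity(concept_2, concept_1)
identical_is_one = shared_feature_similarity(concept_1, concept_1) == 1.0
```

The two concepts share one of their two features each, so the score is 0.5. Unlike a hierarchy distance, this comparison depends only on what the concept descriptions contain.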

A Semantic Similarity Algorithm Based on the Nearest Common Ancestor Node

DEStech Transactions on Computer Science and Engineering ◽

10.12783/dtcse/wcne2017/19882 ◽

2018 ◽

Author(s):

Zhe ZHANG ◽

Yun-xiao ZU ◽

Bin HOU

Keyword(s):

Semantic Similarity ◽

Common Ancestor ◽

Similarity Algorithm

A New Semantic Similarity Measure Based On Ontology for Movie Rate Prediction

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c4442.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 6756-6762

Keyword(s):

Semantic Similarity ◽

Similarity Measure ◽

Experimental Evaluation ◽

Pearson Correlation ◽

Similarity Measures ◽

Similarity Score ◽

Cosine Similarity ◽

Semantic Similarity Measure ◽

Rate Prediction ◽

Target User

A recommendation algorithm consists of two important steps: 1) predicting rates, and 2) recommendation. Rate prediction is a cumulative function of the similarity score between two movies and the rate history of those movies by other users. There are various methods for rate prediction, such as the weighted sum method, regression, and deviation-based methods. All these methods rely on finding items similar to those previously viewed or rated by the target user, on the assumption that a user tends to give similar ratings to similar items. The similarities can be computed using various similarity measures such as Euclidean Distance, Cosine Similarity, Adjusted Cosine Similarity, Pearson Correlation, and Jaccard Similarity. All of these well-known approaches calculate the similarity score between two movies using simple rating-based data. Hence, such similarity measures cannot accurately model the rating behavior of a user. In this paper, we show that the accuracy of rate prediction can be enhanced by incorporating ontological domain knowledge in the similarity computation. This paper introduces a new ontological semantic similarity measure between two movies. For experimental evaluation, the performance of the proposed approach is compared with two existing approaches, 1) Adjusted Cosine Similarity (ACS) and 2) the Weighted Slope One (WSO) algorithm, in terms of two performance measures: 1) execution time and 2) Mean Absolute Error (MAE). The open-source MovieLens (ml-1m) dataset is used for experimental evaluation. As our results show, the ontological semantic similarity measure enhances the performance of rate prediction compared to the existing well-known approaches.
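
The weighted sum method and the MAE measure named in this abstract can be sketched as follows. This is the textbook item-based form, not necessarily the paper's exact variant; the similarity table, movie identifiers, and ratings are hypothetical, and any similarity measure (ontological or rating-based) could be plugged in as the `similarity` argument.

```python
# Hypothetical item-item similarities, keyed by unordered movie pair.
SIM = {
    frozenset(("m1", "m2")): 0.5,
    frozenset(("m1", "m3")): 1.0,
}

def lookup_similarity(item_a, item_b):
    """Symmetric lookup; pairs missing from the table count as 0.0."""
    return SIM.get(frozenset((item_a, item_b)), 0.0)

def predict_rating(target_item, user_ratings, similarity):
    """Weighted sum: the user's ratings of other items, weighted by similarity to the target."""
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        if item == target_item:
            continue
        s = similarity(target_item, item)
        numerator += s * rating
        denominator += abs(s)
    # No similar rated item means no prediction can be made.
    return numerator / denominator if denominator else None

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

prediction = predict_rating("m1", {"m2": 4, "m3": 1}, lookup_similarity)
```

With these toy values the prediction for `m1` is (0.5 × 4 + 1.0 × 1) / 1.5 = 2.0: the more similar movie `m3` pulls the estimate toward its low rating.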
