Semantic similarity measure in biomedical domain leverage Web Search Engine

Abstract

Background: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are used extensively in biomedical text mining and genomics, which has encouraged the development of semantic measures libraries based on these ontologies. However, current state-of-the-art semantic measures libraries have performance and scalability drawbacks that stem from ontology representations based on relational databases or naive in-memory graphs. Likewise, a recent reproducible survey on word similarity shows that a hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for real-time computation prevents the practical use of this measure, and of any other path-based semantic similarity measure, in applications.

Results: To bridge these two gaps, this work introduces an updated version of the HESML Java software library designed specifically for the biomedical domain. It implements the most efficient and scalable ontology representation reported in the literature, together with a new method for approximating Dijkstra’s algorithm on taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure.

Conclusions: We introduce a set of reproducible benchmarks showing that HESML outperforms the current state-of-the-art libraries by several orders of magnitude on the three aforementioned biomedical ontologies, and demonstrating the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the size of the common ancestor subgraph regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
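The idea of restricting a shortest-path search to an ancestor subgraph can be illustrated with a minimal Python sketch. This is not the HESML API or the authors' implementation: the function names, the toy taxonomy, and the unit edge weights are all assumptions made for illustration. The sketch runs Dijkstra's algorithm only over the nodes that are ancestors of either concept, so the search cost depends on the size of that subgraph and not on the size of the whole ontology. The result is an approximation because, in a taxonomy with multiple inheritance, the true shortest path may pass through nodes outside the ancestor set.

```python
import heapq

# Hypothetical toy taxonomy: each concept maps to its direct parents.
TAXONOMY = {
    "root": [],
    "disease": ["root"],
    "infection": ["disease"],
    "viral": ["infection"],
    "bacterial": ["infection"],
    "flu": ["viral"],
    "tb": ["bacterial"],
}

def ancestors(parents, node):
    """Return the set containing node and all of its ancestors."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def ancestor_restricted_path_length(parents, source, target):
    """Dijkstra over the subgraph induced by the ancestors of both concepts.

    Edges are treated as undirected with unit weight (an assumption).
    """
    allowed = ancestors(parents, source) | ancestors(parents, target)
    # Build an undirected adjacency map limited to the allowed nodes.
    adjacency = {node: set() for node in allowed}
    for child in allowed:
        for parent in parents.get(child, ()):
            adjacency[child].add(parent)
            adjacency[parent].add(child)

    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour in adjacency[node]:
            candidate = dist + 1.0
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return float("inf")
```

In the toy taxonomy, the path between "flu" and "tb" runs through their shared ancestor "infection".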

Measuring semantic similarity using web search engine

International Conference on Advanced Nanomaterials & Emerging Engineering Technologies ◽ 10.1109/icanmeet.2013.6609373 ◽ 2013

Author(s): Shanmugapriya ◽ K. Latha

Keyword(s): Semantic Similarity ◽ Search Engine ◽ Web Search ◽ Web Search Engine

A Web Search Engine-Based Approach to Measure Semantic Similarity between Words

IJARCCE ◽ 10.17148/ijarcce.2014.31026 ◽ 2014 ◽ pp. 8195-8199

Author(s): NARENDRA PRADHAN ◽ KAMLESH KUMAR PANDEY ◽ RAJESH SAHU

Keyword(s): Semantic Similarity ◽ Search Engine ◽ Web Search ◽ Web Search Engine

A Web Search Engine-Based Approach to Measure Semantic Similarity between Words

IEEE Transactions on Knowledge and Data Engineering ◽ 10.1109/tkde.2010.172 ◽ 2011 ◽ Vol 23 (7) ◽ pp. 977-990 ◽ Cited By ~ 118

Author(s): Danushka Bollegala ◽ Yutaka Matsuo ◽ Mitsuru Ishizuka

Keyword(s): Semantic Similarity ◽ Search Engine ◽ Web Search ◽ Web Search Engine

Hierarchical Matching of Traffic Information Services Using Semantic Similarity

Journal of Advanced Transportation ◽ 10.1155/2018/2041503 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-12

Author(s): Zongtao Duan ◽ Lei Tang ◽ Zhiliang Kou ◽ Yishui Zhu

Keyword(s): Semantic Similarity ◽ Similarity Measure ◽ Web Search ◽ Transportation Network ◽ Traffic Information ◽ Semantic Similarity Measure ◽ Service Matching ◽ Taxonomic Distance ◽ Service Clustering ◽ Two Stages

Service matching aims to find information similar to a given query and has numerous applications in web search. Although existing methods yield promising results, they are not applicable to transportation. In this paper, we propose a multilevel matching method based on semantic technology to efficiently search for the requested traffic information. Our approach is divided into two stages: service clustering, which prunes unpromising candidate services, and functional matching. The function-level similarity between services is computed by grouping the connections between the services into inheritance and noninheritance relationships. We also developed a three-layer framework with a semantic similarity measure that requires less time and space than existing methods, since the set of candidate services is significantly smaller than the whole transportation network. The OWL_TC4-based service set was used to verify the proposed approach. The accuracy of offline service clustering reached 93.80%, and the response time was reduced to 651 ms when the total number of candidate services was 1000. Moreover, across the different thresholds for the semantic similarity measure, the proposed mixed matching model achieved better recall and precision (up to 72.7% and 80%, respectively, for more than 1000 services) than the baseline models based on information theory and taxonomic distance. These experimental results confirmed the effectiveness and validity of service matching for responding quickly and accurately to user queries.
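The two-stage structure described above (prune by cluster, then match within the surviving candidates) can be sketched in Python as follows. This is only an illustration under stated assumptions: the cluster labels, service names, and the use of Jaccard similarity over term sets are hypothetical stand-ins, not the paper's clustering method or its inheritance-based similarity measure.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical offline clustering result: label -> list of (service, terms).
CLUSTERS = {
    "road": [
        ("congestion_report", {"traffic", "congestion", "road"}),
        ("road_closure", {"road", "closure"}),
    ],
    "transit": [
        ("bus_schedule", {"bus", "timetable"}),
        ("metro_status", {"metro", "delay"}),
    ],
}

def match_services(query_terms, clusters, threshold=0.3):
    """Return (service, score) pairs from the best cluster, best first."""

    # Stage 1: keep only the cluster whose pooled vocabulary best fits the query.
    def cluster_score(item):
        _, services = item
        vocabulary = set().union(*(terms for _, terms in services))
        return jaccard(query_terms, vocabulary)

    _, candidates = max(clusters.items(), key=cluster_score)

    # Stage 2: score each remaining candidate and drop those below the threshold.
    scored = [(name, jaccard(query_terms, terms)) for name, terms in candidates]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Because stage 2 only scores services in one cluster, the per-query cost depends on the cluster size and not on the total number of services, which is the motivation the abstract gives for pruning first.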