Comparable Evaluation of Contemporary Corpus-Based and Knowledge-Based Semantic Similarity Measures of Short Texts

Author(s):  
Bojan Furlan ◽  
Vladimir Sivački ◽  
Davor Jovanović ◽  
Boško Nikolić

This paper presents methods for measuring the semantic similarity of texts, where we evaluated different approaches based on existing similarity measures. On one side, word similarity was calculated by processing large text corpora; on the other, a commonsense knowledge base was used. Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g. abstracts of scientific documents, image captions, or product descriptions), where commonsense knowledge plays an important role, in this paper we focus on computing the similarity between two sentences or two short paragraphs by extending existing measures with information from the ConceptNet knowledge base. Since extensive research has also been done in the field of corpus-based semantic similarity, we evaluated existing solutions as well, introducing some modifications. Through experiments performed on a paraphrase data set, we demonstrate that some of the proposed approaches can improve the semantic similarity measurement of short texts.
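Many short-text measures of this kind build on word-to-word similarities aggregated over the two texts. Below is a minimal sketch of that general aggregation scheme, assuming a hypothetical `word_sim` function standing in for any corpus-based or ConceptNet-backed word measure; it is an illustration, not the paper's exact method.

```python
# Minimal sketch of the word-to-word aggregation scheme that many
# short-text similarity measures build on. `word_sim` is a hypothetical
# stand-in for any word-level measure (corpus-based or backed by a
# knowledge base such as ConceptNet); it is NOT the paper's exact method.

def word_sim(w1: str, w2: str) -> float:
    """Placeholder word similarity in [0, 1]; identical words score 1."""
    return 1.0 if w1 == w2 else 0.0  # swap in a real measure here

def directional_sim(words_a: list[str], words_b: list[str]) -> float:
    """Average, over words in A, of the best match found in B."""
    if not words_a or not words_b:
        return 0.0
    return sum(max(word_sim(a, b) for b in words_b) for a in words_a) / len(words_a)

def short_text_similarity(text_a: str, text_b: str) -> float:
    """Symmetric score: mean of both directional similarities."""
    words_a, words_b = text_a.lower().split(), text_b.lower().split()
    return 0.5 * (directional_sim(words_a, words_b) +
                  directional_sim(words_b, words_a))

print(short_text_similarity("a small dog", "a little dog"))  # 0.666... with the toy word_sim
```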

Author(s):  
Jorge Martinez-Gil

Semantic similarity measurement of biomedical nomenclature aims to determine the likeness between two biomedical expressions that use different lexicographies to represent the same real biomedical concept. Many semantic similarity measures have tried to address this issue, and many of them represent an incremental improvement over their predecessors. In this work, we present yet another incremental solution that is able to outperform existing approaches by using a sophisticated aggregation method based on fuzzy logic. Results show that our strategy consistently beats existing approaches on well-known biomedical benchmark data sets.
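For illustration, a minimal sketch of fuzzy-logic aggregation of several measure scores follows; the membership functions, rules, and outputs are invented for the example and are not the paper's actual rule base.

```python
# Hedged sketch of fuzzy-logic aggregation of several similarity scores.
# Membership functions and rules are illustrative assumptions only.

def mu_low(x: float) -> float:
    """Triangular membership: fully 'low' at 0, fading out by 0.6."""
    return max(0.0, min(1.0, (0.6 - x) / 0.6))

def mu_high(x: float) -> float:
    """Triangular membership: fully 'high' at 1, fading out below 0.4."""
    return max(0.0, min(1.0, (x - 0.4) / 0.6))

def fuzzy_aggregate(scores: list[float]) -> float:
    """Two toy Mamdani-style rules:
       IF all measures are high THEN similarity is high (output 1.0)
       IF any measure is low  THEN similarity is low  (output 0.0)
       Defuzzified as a firing-strength-weighted average."""
    fire_high = min(mu_high(s) for s in scores)  # AND = min
    fire_low = max(mu_low(s) for s in scores)    # OR  = max
    total = fire_high + fire_low
    return 0.5 if total == 0 else (fire_high * 1.0 + fire_low * 0.0) / total

print(fuzzy_aggregate([0.9, 0.8, 0.85]))  # measures agree on 'high' -> 1.0
print(fuzzy_aggregate([0.9, 0.2, 0.85]))  # one low score fires the low rule -> 0.0
```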


2017 ◽  
Author(s):  
Jorge Martinez-Gil

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is a key challenge in many computer-related fields. The problem is that traditional approaches to semantic similarity measurement are not suitable for all situations; for example, many of them fail to deal with terms not covered by synonym dictionaries, or cannot cope with acronyms, abbreviations, buzzwords, brand names, proper nouns, and so on. In this paper, we present and evaluate a collection of emerging techniques developed to avoid this problem. These techniques use various kinds of web intelligence to determine the degree of similarity between text expressions, implementing a range of paradigms including the study of co-occurrence, text snippet comparison, frequent pattern finding, and search log analysis. The goal is to substitute the traditional techniques where necessary.
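As an illustration of the co-occurrence paradigm, the sketch below computes a WebJaccard-style score from search hit counts; the counts are hypothetical placeholders for values a real system would obtain from a search engine API.

```python
# Illustrative sketch of one web-intelligence paradigm: co-occurrence
# statistics over web search hit counts (WebJaccard-style). The hit
# counts below are hypothetical placeholders; a real system would query
# a search engine API for them.

def web_jaccard(hits_a: int, hits_b: int, hits_ab: int) -> float:
    """Jaccard coefficient over page counts: |A and B| / |A or B|."""
    denom = hits_a + hits_b - hits_ab
    return 0.0 if denom == 0 else hits_ab / denom

# Hypothetical counts for pages containing "automobile", "car", and both:
print(web_jaccard(hits_a=120_000, hits_b=980_000, hits_ab=95_000))
```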


2017 ◽  
Author(s):  
Jorge Martinez-Gil

Semantic similarity measurement aims to determine the likeness between two text expressions that use different lexicographies to represent the same real object or idea. Many semantic similarity measures address this problem, but the best results have been achieved by aggregating a number of simple measures: after the individual similarity values have been calculated, the overall similarity for a pair of text expressions is computed with an aggregation function over those values, often a statistical one. In this work, we present CoTO (Consensus or Trade-Off), a solution based on fuzzy logic that is able to outperform these traditional approaches.
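For reference, the traditional statistical aggregation that CoTO is compared against can be sketched as follows; the individual measure scores are made up for illustration.

```python
# Sketch of traditional statistical aggregation: the overall score for a
# pair is a statistic over the individual measure outputs. The scores
# below are hypothetical outputs of four simple similarity measures.

import statistics

scores = [0.72, 0.65, 0.90, 0.58]

print("mean  :", statistics.mean(scores))
print("median:", statistics.median(scores))
print("max   :", max(scores))
```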


2017 ◽  
Author(s):  
Jorge Martinez-Gil ◽  
José F. Aldana Montes

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is an important challenge in the information integration field. The problem is that techniques for textual semantic similarity measurement often fail to deal with words not covered by synonym dictionaries. In this paper, we try to solve this problem by determining the semantic similarity of terms using the knowledge inherent in the search history logs of the Google search engine. To do this, we have designed and evaluated four algorithmic methods for measuring the semantic similarity between terms using their associated search history patterns: a) frequent co-occurrence of terms in search patterns, b) computation of the relationship between search patterns, c) outlier coincidence in search patterns, and d) forecasting comparisons. We show experimentally that some of these methods correlate well with human judgment on general-purpose benchmark datasets and significantly outperform existing methods on datasets containing terms that do not usually appear in dictionaries.
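As an illustration of method (b), the sketch below correlates two terms' search-interest time series with the Pearson coefficient; the weekly values are hypothetical stand-ins for real search-log data.

```python
# Sketch of computing the relationship between search patterns:
# correlate two terms' search-interest time series. The weekly interest
# values are hypothetical placeholders for real search-log data.

import statistics

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

flu = [10, 14, 35, 60, 55, 20, 12]       # hypothetical weekly interest
influenza = [8, 12, 30, 64, 50, 18, 10]  # hypothetical weekly interest
print(pearson(flu, influenza))           # near 1: strongly related terms
```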


2021 ◽  
Vol 54 (2) ◽  
pp. 1-37
Author(s):  
Dhivya Chandrasekaran ◽  
Vijay Mago

Estimating the semantic similarity between text data is one of the most challenging open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods, from traditional NLP techniques such as kernel-based methods to the most recent work on transformer-based models, categorizing them by their underlying principles as knowledge-based, corpus-based, deep neural network-based, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey gives new researchers a comprehensive view of existing systems from which to experiment and develop innovative ideas for addressing the semantic similarity problem.
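Across the corpus-based and transformer-based families surveyed here, the final comparison step is typically a cosine similarity between vector representations of the two texts. A minimal sketch with made-up embedding vectors (real ones would come from a trained model):

```python
# Cosine similarity between two text representations, the common core of
# corpus-based and neural similarity methods. Vectors are hypothetical.

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

vec_a = [0.12, 0.85, 0.33, 0.05]  # hypothetical sentence embedding
vec_b = [0.10, 0.80, 0.40, 0.02]  # hypothetical sentence embedding
print(cosine(vec_a, vec_b))       # high value: the sentences are similar
```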


2018 ◽  
Vol 14 (2) ◽  
pp. 16-36 ◽  
Author(s):  
Carlos Ramón Rangel ◽  
Junior Altamiranda ◽  
Mariela Cerrada ◽  
Jose Aguilar

Procedures for merging two ontologies are mostly concerned with enriching one of the input ontologies, i.e. the knowledge of the aligned concepts from one ontology is copied into the other. As a consequence, the resulting ontology extends the original knowledge of the base ontology, but the unaligned concepts of the other ontology are not considered in the new, extended ontology. On the other hand, there are expert-aided semi-automatic approaches for including the knowledge that is left out of the resulting merged ontology and for debugging possible concept redundancy. To include all the knowledge of the ontologies to be merged without redundancy, this article proposes an automatic approach for merging ontologies, based on semantic similarity measures and an exhaustive search over the closest concepts. The authors' approach was compared to other merging algorithms and obtained good results in terms of completeness, relationships, and properties, without creating redundancy.
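A hedged sketch of the merging idea follows: every concept of both ontologies is kept, but a concept from the second ontology is skipped when an exhaustive search of the merged ontology finds a sufficiently similar counterpart. The `concept_sim` placeholder and the threshold are illustrative assumptions, not the authors' exact procedure.

```python
# Hedged sketch: keep all concepts of both ontologies, but before copying
# a concept from the second ontology, exhaustively search the merged
# ontology for its closest concept and skip it if one is similar enough
# (redundancy check). `concept_sim` and the threshold are assumptions.

def concept_sim(c1: str, c2: str) -> float:
    """Placeholder semantic similarity in [0, 1]; swap in a real measure."""
    return 1.0 if c1.lower() == c2.lower() else 0.0

def merge(base: set[str], other: set[str], threshold: float = 0.8) -> set[str]:
    merged = set(base)
    for concept in other:
        # exhaustive search for the closest concept already merged
        closest = max((concept_sim(concept, m) for m in merged), default=0.0)
        if closest < threshold:  # no redundant counterpart found
            merged.add(concept)
    return merged

print(merge({"Car", "Engine"}, {"car", "Wheel"}))  # {'Car', 'Engine', 'Wheel'}
```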

