A Modification of the Leacock-Chodorow Measure of the Semantic Relatedness of Concepts

2020 ◽  
Vol 6 (351) ◽  
pp. 97-106
Author(s):  
Jerzy Korzeniewski

Measures of the semantic relatedness of concepts can be categorised into two types: knowledge-based methods and corpus-based methods. Knowledge-based techniques make use of man-made dictionaries, thesauruses and other artefacts as a source of knowledge. Corpus-based techniques assess the semantic similarity of two concepts using large corpora of text documents. Some researchers claim that knowledge-based measures outperform corpus-based ones, but it is more important to observe that the latter are heavily corpus-dependent. In this article, we propose a modification of the best WordNet-based method of assessing semantic relatedness, the Leacock-Chodorow measure. This measure has proven to be the best in several studies and has a very simple formula. We assess our proposal on two popular benchmark sets of pairs of concepts: the Rubenstein-Goodenough set of 65 pairs of concepts and the Finkelstein set of 353 pairs of terms. The results show that our proposal outperforms the traditional Leacock-Chodorow measure.
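The abstract does not spell out the modification itself, but the classical Leacock-Chodorow measure it builds on is sim(c1, c2) = -log(len(c1, c2) / (2D)), where len is the shortest path between two synsets in the WordNet taxonomy and D is the maximum taxonomy depth. A minimal sketch of the baseline measure using NLTK (not the authors' modified version):

```python
# Classical Leacock-Chodorow similarity: -log(len(c1, c2) / (2 * D)),
# where len is the shortest path between synsets and D is the maximum
# depth of the WordNet taxonomy for that part of speech.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')
boat = wn.synset('boat.n.01')

# NLTK ships the classical measure directly:
print(car.lch_similarity(boat))
```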

2005 ◽  
Vol 13 (9) ◽  
pp. 1105-1121 ◽  
Author(s):  
Wesley W. Chu ◽  
Zhenyu Liu ◽  
Wenlei Mao ◽  
Qinghua Zou

2021 ◽  
Vol 54 (2) ◽  
pp. 1-37
Author(s):  
Dhivya Chandrasekaran ◽  
Vijay Mago

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods, beginning with traditional NLP techniques such as kernel-based methods and continuing to the most recent work on transformer-based models, categorizing them by their underlying principles as knowledge-based, corpus-based, deep neural network-based, and hybrid methods. By discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems, enabling new researchers to experiment and develop innovative ideas to address the problem of semantic similarity.
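To make the survey's knowledge-based versus corpus-based distinction concrete, here is a hedged sketch contrasting a WordNet taxonomy measure with a cosine over distributional vectors (the vectors below are placeholders; a real corpus-based method would obtain them from a model such as word2vec, GloVe, or BERT):

```python
import numpy as np
from nltk.corpus import wordnet as wn

# Knowledge-based: shortest-path similarity over the WordNet taxonomy.
kb_sim = wn.synset('cat.n.01').path_similarity(wn.synset('dog.n.01'))

# Corpus-based: cosine similarity between distributional vectors.
# These vectors are illustrative placeholders only.
v_cat = np.array([0.2, 0.7, 0.1])
v_dog = np.array([0.3, 0.6, 0.2])
cb_sim = v_cat @ v_dog / (np.linalg.norm(v_cat) * np.linalg.norm(v_dog))

print(kb_sim, cb_sim)
```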


Author(s):  
Saravanakumar Kandasamy ◽  
Aswani Kumar Cherukuri

Quantifying the semantic similarity between concepts is an essential part of domains such as Natural Language Processing, Information Retrieval and Question Answering, where it helps systems better understand text and the relationships within it. Over the last few decades, many measures have been proposed that incorporate various corpus-based and knowledge-based resources. WordNet and Wikipedia are two such knowledge-based resources. WordNet's contribution to these domains is enormous due to its richness in defining a word and all of its relationships with other words. In this paper, we propose an approach to quantify the similarity between concepts that exploits the synsets and gloss definitions of different concepts in WordNet. Our method considers the gloss definitions, the contextual words that help define a word, the synsets of those contextual words, and the confidence of a word's occurrence in another word's definition when calculating similarity. An evaluation on several gold-standard benchmark datasets shows the efficiency of our system in comparison with existing taxonomical and definitional measures.
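The abstract does not give the exact scoring function; a Lesk-style gloss-overlap sketch conveys the general idea of comparing WordNet gloss definitions (the authors' confidence weighting and contextual-synset expansion are not reproduced here):

```python
# Crude Lesk-style gloss overlap: compare the dictionary definitions
# of the first-listed senses of two words (illustrative only).
from nltk.corpus import wordnet as wn

def gloss_overlap(word1, word2):
    s1, s2 = wn.synsets(word1)[0], wn.synsets(word2)[0]
    g1 = set(s1.definition().lower().split())
    g2 = set(s2.definition().lower().split())
    return len(g1 & g2) / max(len(g1 | g2), 1)  # Jaccard overlap of glosses

print(gloss_overlap('car', 'automobile'))
```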


Author(s):  
Hanane Ezzikouri ◽  
Mohammed Erritali ◽  
Mohamed Oukessou

Utterances in natural language are generally highly ambiguous, and a unique interpretation can usually be determined only by taking into account the context in which the utterance occurred. Automatically determining the correct sense of a polysemous word is a complicated problem, especially in multilingual corpora. This paper presents an application programming interface for several semantic relatedness/similarity metrics that measure the semantic similarity/distance between multilingual words and concepts, intended for later application to sentences and paragraphs in Cross-Language Plagiarism Detection (CLPD); it uses WordNet for the English-French and English-Arabic multilingual plagiarism cases.
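The paper's own API is not shown in the abstract; one way to sketch cross-lingual WordNet similarity is via NLTK's Open Multilingual WordNet, which maps French and Arabic lemmas onto shared synsets so any WordNet measure can compare them (the language codes and data packages below are assumptions about an NLTK-based setup, not the authors' interface):

```python
# Cross-lingual similarity via the Open Multilingual WordNet.
# Requires: nltk.download('wordnet'); nltk.download('omw-1.4')
from nltk.corpus import wordnet as wn

def crosslingual_sim(word1, lang1, word2, lang2):
    scores = [s1.path_similarity(s2)
              for s1 in wn.synsets(word1, lang=lang1)
              for s2 in wn.synsets(word2, lang=lang2)]
    return max((s for s in scores if s is not None), default=None)

print(crosslingual_sim('chien', 'fra', 'cat', 'eng'))  # French vs. English
```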


2018 ◽  
Vol 9 (2) ◽  
pp. 1-22 ◽  
Author(s):  
Rafiya Jan ◽  
Afaq Alam Khan

Social networks are considered the most abundant sources of affective information for sentiment and emotion classification. Emotion classification is the challenging task of classifying emotions into different types. Although emotions are universal, automatic emotion detection is considered a difficult task to perform. A great deal of research is being conducted on automatic emotion detection in textual data streams; however, very little attention is paid to capturing the semantic features of the text. In this article, the authors present a technique based on semantic relatedness for the automatic classification of emotion in text using distributional semantic models. The approach uses semantic similarity to measure the coherence between two emotionally related entities. Before classification, the data is pre-processed to remove irrelevant fields and inconsistencies and to improve performance. The proposed approach achieved an accuracy of 71.795%, which is competitive considering that no training or annotation of the data is performed.
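The authors' exact pipeline is not given in the abstract; a hedged sketch of the general idea, assigning a text the emotion whose seed words it is most distributionally related to, could look as follows (the pretrained-model name and seed lists are assumptions, not the article's configuration):

```python
# Sketch: label a text with the emotion whose seed words are most
# similar under a distributional semantic model.
import gensim.downloader as api

model = api.load('glove-wiki-gigaword-100')  # assumed model choice

EMOTION_SEEDS = {
    'joy':   ['happy', 'delighted', 'cheerful'],
    'anger': ['angry', 'furious', 'outraged'],
    'fear':  ['afraid', 'scared', 'terrified'],
}

def classify_emotion(tokens):
    tokens = [t for t in tokens if t in model]  # drop out-of-vocabulary words
    scores = {emo: model.n_similarity(tokens, seeds)
              for emo, seeds in EMOTION_SEEDS.items()}
    return max(scores, key=scores.get)

print(classify_emotion('i was so scared walking home'.split()))
```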


Author(s):  
Anna Lisa Gentile ◽  
Ziqi Zhang ◽  
Fabio Ciravegna

This chapter proposes a novel Semantic Relatedness (SR) measure that exploits diverse features extracted from a knowledge resource. Computing SR is a crucial technique for many complex Natural Language Processing (NLP) and Semantic Web tasks. Typically, semantic relatedness measures make use of only a limited number of features, without considering diverse feature sets or understanding the different contributions of features to the accuracy of a method. This chapter proposes a method based on a random graph walk model that naturally combines diverse features extracted from a knowledge resource in a balanced way for computing semantic relatedness. A set of experiments is carefully designed to investigate the effects of choosing different features and altering their weights on the accuracy of the system. Using the derived feature sets and feature weights, the proposed method is then evaluated against state-of-the-art semantic relatedness measures and shown to obtain higher accuracy on many benchmark datasets. Additionally, the authors demonstrate the usefulness of the proposed method in a practical NLP task, namely Named Entity Disambiguation.
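The chapter's exact walk model is not reproduced in the abstract; a personalized-PageRank sketch over a small feature graph illustrates the family of random-walk relatedness methods it belongs to (nodes, edge types, and weights below are illustrative assumptions):

```python
# Sketch of random-walk relatedness: build a graph whose weighted edges
# come from different feature types, run a personalized PageRank from
# one concept, and read off the probability mass landing on the other.
import networkx as nx

G = nx.Graph()
G.add_edge('bank', 'money', weight=0.9)    # e.g. a co-occurrence feature
G.add_edge('bank', 'river', weight=0.4)    # e.g. a gloss-word feature
G.add_edge('money', 'finance', weight=0.8) # e.g. a category feature

rank = nx.pagerank(G, personalization={'bank': 1.0}, weight='weight')
print(rank['finance'])  # relatedness of 'finance' to 'bank'
```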


2014 ◽  
Vol 875-877 ◽  
pp. 968-972
Author(s):  
Wei Yan ◽  
Cecilia Zanni-Merk ◽  
François Rousselot ◽  
Denis Cavallucci ◽  
Pierre Collet

A growing number of industries feel the need to formalize their innovation approaches. Modern innovation theories and methods use different knowledge sources for solving inventive design problems. These sources generally concern similar notions, but the level of detail of their descriptions can differ greatly. We are interested in finding semantic links among these sources and in developing an intelligent way of managing this knowledge, with the goal of assisting inventive design experts in their activities. This paper explores a short-text semantic similarity approach to searching for potential links among these sources. Such links could facilitate the retrieval of heuristic solutions to inventive problems for TRIZ users.
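The paper's specific measure is not detailed in the abstract; one common short-text scheme, aligning each word with its best WordNet match in the other text and averaging in both directions, can be sketched as follows (an assumed baseline, not the authors' exact method):

```python
# Short-text similarity sketch: align every word with its best-matching
# word in the other text (by WordNet path similarity) and average.
from nltk.corpus import wordnet as wn

def word_sim(w1, w2):
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def short_text_sim(text1, text2):
    t1, t2 = text1.lower().split(), text2.lower().split()
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(word_sim(a, b) for b in t2) for a in t1) / len(t1)
    best2 = sum(max(word_sim(b, a) for a in t1) for b in t2) / len(t2)
    return (best1 + best2) / 2  # symmetric average of both directions

print(short_text_sim('reduce object weight', 'decrease mass of the part'))
```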


2021 ◽  
Vol 11 (6) ◽  
pp. 2567
Author(s):  
Mohammed El-Razzaz ◽  
Mohamed Waleed Fakhr ◽  
Fahima A. Maghraby

Word Sense Disambiguation (WSD) aims to predict the correct sense of a word given its context. This problem is of extreme importance in Arabic, as written words can be highly ambiguous: 43% of diacritized words have multiple interpretations, and the percentage increases to 72% for non-diacritized words; nevertheless, most written Arabic text does not carry diacritical marks. Gloss-based WSD methods measure the semantic similarity or overlap between the context of a target word that needs to be disambiguated and the dictionary definition of that word (its gloss). Arabic gloss WSD suffers from a lack of context-gloss datasets. In this paper, we present an Arabic gloss-based WSD technique. We utilize the Bidirectional Encoder Representations from Transformers (BERT) model to build two models that can efficiently perform Arabic WSD. These models can be trained with few training samples since they rely on BERT models pretrained on a large Arabic corpus. Our experimental results show that our models outperform two of the most recent gloss-based WSD methods when tested on the same data used to evaluate our model. Additionally, our model achieves an F1-score of 89%, compared to the best reported F1-score of 85% for knowledge-based Arabic WSD. Another contribution of this paper is a context-gloss benchmark that may help overcome the lack of a standardized benchmark for Arabic gloss-based WSD.
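A hedged sketch of the general gloss-based pattern, scoring (context, gloss) pairs with a BERT cross-encoder via Hugging Face Transformers, is shown below; the Arabic checkpoint name is an assumption, and the classification head would first need fine-tuning on labeled context-gloss pairs, as gloss-based WSD methods require:

```python
# Sketch: score a (context, gloss) pair with a BERT cross-encoder.
# 'aubmindlab/bert-base-arabertv02' is an assumed Arabic checkpoint;
# the classification head must be fine-tuned on context-gloss pairs
# labeled correct/incorrect before these scores are meaningful.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = 'aubmindlab/bert-base-arabertv02'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

def gloss_score(context, gloss):
    inputs = tokenizer(context, gloss, return_tensors='pt', truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.softmax(dim=-1)[0, 1].item()  # P(gloss matches context)

# The sense whose gloss scores highest is predicted for the target word.
```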

