A Modification of the Leacock-Chodorow Measure of the Semantic Relatedness of Concepts

2020 ◽  
Vol 6 (351) ◽  
pp. 97-106
Author(s):  
Jerzy Korzeniewski

Measures of the semantic relatedness of concepts can be categorised into two types: knowledge-based methods and corpus-based methods. Knowledge-based techniques make use of man-made dictionaries, thesauruses and other artefacts as a source of knowledge. Corpus-based techniques assess the semantic similarity of two concepts using large corpora of text documents. Some researchers claim that knowledge-based measures outperform corpus-based ones, but it is more important to observe that the latter are heavily corpus-dependent. In this article, we propose a modification of the best WordNet-based method of assessing semantic relatedness, the Leacock-Chodorow measure. This measure has proven to be the best in several studies and has a very simple formula. We assess our proposal on two popular benchmark sets of pairs of concepts: the Rubenstein-Goodenough set of 65 pairs of concepts and the Finkelstein set of 353 pairs of terms. The results show that our proposal outperforms the traditional Leacock-Chodorow measure.
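The abstract does not spell out the modification itself, but the classical Leacock-Chodorow measure it builds on is sim(c1, c2) = -log(len(c1, c2) / (2D)), where len is the shortest path between two synsets in the WordNet taxonomy and D is the maximum taxonomy depth. A minimal sketch of the baseline measure using NLTK (not the authors' modified version):

```python
# Classical Leacock-Chodorow similarity: -log(len(c1, c2) / (2 * D)),
# where len is the shortest path between synsets and D is the maximum
# depth of the WordNet taxonomy for that part of speech.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')
boat = wn.synset('boat.n.01')

# NLTK ships the classical measure directly:
print(car.lch_similarity(boat))
```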

2005 ◽  
Vol 13 (9) ◽  
pp. 1105-1121 ◽  
Author(s):  
Wesley W. Chu ◽  
Zhenyu Liu ◽  
Wenlei Mao ◽  
Qinghua Zou

2021 ◽  
Vol 54 (2) ◽  
pp. 1-37
Author(s):  
Dhivya Chandrasekaran ◽  
Vijay Mago

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods, beginning with traditional NLP techniques such as kernel-based methods and continuing to the most recent work on transformer-based models, categorizing them by their underlying principles as knowledge-based, corpus-based, deep neural network-based, and hybrid methods. By discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems, enabling new researchers to experiment and develop innovative ideas to address the problem of semantic similarity.
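To make the survey's knowledge-based versus corpus-based distinction concrete, here is a hedged sketch contrasting a WordNet taxonomy measure with a cosine over distributional vectors (the vectors below are placeholders; a real corpus-based method would obtain them from a model such as word2vec, GloVe, or BERT):

```python
import numpy as np
from nltk.corpus import wordnet as wn

# Knowledge-based: shortest-path similarity over the WordNet taxonomy.
kb_sim = wn.synset('cat.n.01').path_similarity(wn.synset('dog.n.01'))

# Corpus-based: cosine similarity between distributional vectors.
# These vectors are illustrative placeholders only.
v_cat = np.array([0.2, 0.7, 0.1])
v_dog = np.array([0.3, 0.6, 0.2])
cb_sim = v_cat @ v_dog / (np.linalg.norm(v_cat) * np.linalg.norm(v_dog))

print(kb_sim, cb_sim)
```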


Author(s):  
Saravanakumar Kandasamy ◽  
Aswani Kumar Cherukuri

Quantifying the semantic similarity between concepts is an essential part of domains such as Natural Language Processing, Information Retrieval and Question Answering, where it helps systems better understand text and the relationships within it. Over the last few decades, many measures have been proposed that incorporate various corpus-based and knowledge-based resources. WordNet and Wikipedia are two such knowledge-based resources. WordNet's contribution to these domains is enormous due to its richness in defining a word and all of its relationships with other words. In this paper, we propose an approach to quantify the similarity between concepts that exploits the synsets and gloss definitions of different concepts in WordNet. Our method considers the gloss definitions, the contextual words that help define a word, the synsets of those contextual words, and the confidence of a word's occurrence in another word's definition when calculating similarity. An evaluation on several gold-standard benchmark datasets shows the efficiency of our system in comparison with existing taxonomical and definitional measures.
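The abstract does not give the exact scoring function; a Lesk-style gloss-overlap sketch conveys the general idea of comparing WordNet gloss definitions (the authors' confidence weighting and contextual-synset expansion are not reproduced here):

```python
# Crude Lesk-style gloss overlap: compare the dictionary definitions
# of the first-listed senses of two words (illustrative only).
from nltk.corpus import wordnet as wn

def gloss_overlap(word1, word2):
    s1, s2 = wn.synsets(word1)[0], wn.synsets(word2)[0]
    g1 = set(s1.definition().lower().split())
    g2 = set(s2.definition().lower().split())
    return len(g1 & g2) / max(len(g1 | g2), 1)  # Jaccard overlap of glosses

print(gloss_overlap('car', 'automobile'))
```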


Author(s):  
Hanane Ezzikouri ◽  
Mohammed Erritali ◽  
Mohamed Oukessou

Utterances in natural language are generally highly ambiguous, and a unique interpretation can usually be determined only by taking into account the context in which the utterance occurred. Automatically determining the correct sense of a polysemous word is a complicated problem, especially in multilingual corpora. This paper presents an application programming interface for several semantic relatedness/similarity metrics that measure the semantic similarity/distance between multilingual words and concepts, intended for later application to sentences and paragraphs in Cross-Language Plagiarism Detection (CLPD); it uses WordNet for the English-French and English-Arabic multilingual plagiarism cases.
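The paper's own API is not shown in the abstract; one way to sketch cross-lingual WordNet similarity is via NLTK's Open Multilingual WordNet, which maps French and Arabic lemmas onto shared synsets so any WordNet measure can compare them (the language codes and data packages below are assumptions about an NLTK-based setup, not the authors' interface):

```python
# Cross-lingual similarity via the Open Multilingual WordNet.
# Requires: nltk.download('wordnet'); nltk.download('omw-1.4')
from nltk.corpus import wordnet as wn

def crosslingual_sim(word1, lang1, word2, lang2):
    scores = [s1.path_similarity(s2)
              for s1 in wn.synsets(word1, lang=lang1)
              for s2 in wn.synsets(word2, lang=lang2)]
    return max((s for s in scores if s is not None), default=None)

print(crosslingual_sim('chien', 'fra', 'cat', 'eng'))  # French vs. English
```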


2018 ◽  
Vol 9 (2) ◽  
pp. 1-22 ◽  
Author(s):  
Rafiya Jan ◽  
Afaq Alam Khan

Social networks are considered the most abundant sources of affective information for sentiment and emotion classification. Emotion classification is the challenging task of classifying emotions into different types. Although emotions are universal, automatic emotion detection is considered a difficult task to perform. A great deal of research is being conducted on automatic emotion detection in textual data streams; however, very little attention is paid to capturing the semantic features of the text. In this article, the authors present a technique based on semantic relatedness for the automatic classification of emotion in text using distributional semantic models. The approach uses semantic similarity to measure the coherence between two emotionally related entities. Before classification, the data is pre-processed to remove irrelevant fields and inconsistencies and to improve performance. The proposed approach achieved an accuracy of 71.795%, which is competitive considering that no training or annotation of the data is performed.
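The authors' exact pipeline is not given in the abstract; a hedged sketch of the general idea, assigning a text the emotion whose seed words it is most distributionally related to, could look as follows (the pretrained-model name and seed lists are assumptions, not the article's configuration):

```python
# Sketch: label a text with the emotion whose seed words are most
# similar under a distributional semantic model.
import gensim.downloader as api

model = api.load('glove-wiki-gigaword-100')  # assumed model choice

EMOTION_SEEDS = {
    'joy':   ['happy', 'delighted', 'cheerful'],
    'anger': ['angry', 'furious', 'outraged'],
    'fear':  ['afraid', 'scared', 'terrified'],
}

def classify_emotion(tokens):
    tokens = [t for t in tokens if t in model]  # drop out-of-vocabulary words
    scores = {emo: model.n_similarity(tokens, seeds)
              for emo, seeds in EMOTION_SEEDS.items()}
    return max(scores, key=scores.get)

print(classify_emotion('i was so scared walking home'.split()))
```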


Author(s):  
Anna Lisa Gentile ◽  
Ziqi Zhang ◽  
Fabio Ciravegna

This chapter proposes a novel Semantic Relatedness (SR) measure that exploits diverse features extracted from a knowledge resource. Computing SR is a crucial technique for many complex Natural Language Processing (NLP) and Semantic Web tasks. Typically, semantic relatedness measures make use of only a limited number of features, without considering diverse feature sets or understanding the different contributions of features to the accuracy of a method. This chapter proposes a method based on a random graph walk model that naturally combines diverse features extracted from a knowledge resource in a balanced way for computing semantic relatedness. A set of experiments is carefully designed to investigate the effects of choosing different features and altering their weights on the accuracy of the system. Using the derived feature sets and feature weights, the proposed method is then evaluated against state-of-the-art semantic relatedness measures and shown to obtain higher accuracy on many benchmark datasets. Additionally, the authors demonstrate the usefulness of the proposed method in a practical NLP task, namely Named Entity Disambiguation.
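The chapter's exact walk model is not reproduced in the abstract; a personalized-PageRank sketch over a small feature graph illustrates the family of random-walk relatedness methods it belongs to (nodes, edge types, and weights below are illustrative assumptions):

```python
# Sketch of random-walk relatedness: build a graph whose weighted edges
# come from different feature types, run a personalized PageRank from
# one concept, and read off the probability mass landing on the other.
import networkx as nx

G = nx.Graph()
G.add_edge('bank', 'money', weight=0.9)    # e.g. a co-occurrence feature
G.add_edge('bank', 'river', weight=0.4)    # e.g. a gloss-word feature
G.add_edge('money', 'finance', weight=0.8) # e.g. a category feature

rank = nx.pagerank(G, personalization={'bank': 1.0}, weight='weight')
print(rank['finance'])  # relatedness of 'finance' to 'bank'
```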


2014 ◽  
Vol 875-877 ◽  
pp. 968-972
Author(s):  
Wei Yan ◽  
Cecilia Zanni-Merk ◽  
François Rousselot ◽  
Denis Cavallucci ◽  
Pierre Collet

A growing number of industries feel the need to formalize their innovation approaches. Modern innovation theories and methods use different knowledge sources for solving inventive design problems. These sources generally concern similar notions, but the level of detail of their descriptions can differ greatly. We are interested in finding semantic links among these sources and in developing an intelligent way of managing this knowledge, with the goal of assisting inventive design experts in their activities. This paper explores a short-text semantic similarity approach to searching for potential links among these sources. Such links could facilitate the retrieval of heuristic solutions to inventive problems for TRIZ users.
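The paper's specific measure is not detailed in the abstract; one common short-text scheme, aligning each word with its best WordNet match in the other text and averaging in both directions, can be sketched as follows (an assumed baseline, not the authors' exact method):

```python
# Short-text similarity sketch: align every word with its best-matching
# word in the other text (by WordNet path similarity) and average.
from nltk.corpus import wordnet as wn

def word_sim(w1, w2):
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def short_text_sim(text1, text2):
    t1, t2 = text1.lower().split(), text2.lower().split()
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(word_sim(a, b) for b in t2) for a in t1) / len(t1)
    best2 = sum(max(word_sim(b, a) for a in t1) for b in t2) / len(t2)
    return (best1 + best2) / 2  # symmetric average of both directions

print(short_text_sim('reduce object weight', 'decrease mass of the part'))
```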


2021 ◽  
Vol 11 (6) ◽  
pp. 2567
Author(s):  
Mohammed El-Razzaz ◽  
Mohamed Waleed Fakhr ◽  
Fahima A. Maghraby

Word Sense Disambiguation (WSD) aims to predict the correct sense of a word given its context. This problem is of extreme importance in Arabic, as written words can be highly ambiguous: 43% of diacritized words have multiple interpretations, and the percentage increases to 72% for non-diacritized words; nevertheless, most written Arabic text does not carry diacritical marks. Gloss-based WSD methods measure the semantic similarity or overlap between the context of a target word that needs to be disambiguated and the dictionary definition of that word (its gloss). Arabic gloss WSD suffers from a lack of context-gloss datasets. In this paper, we present an Arabic gloss-based WSD technique. We utilize the Bidirectional Encoder Representations from Transformers (BERT) model to build two models that can efficiently perform Arabic WSD. These models can be trained with few training samples since they rely on BERT models pretrained on a large Arabic corpus. Our experimental results show that our models outperform two of the most recent gloss-based WSD methods when tested on the same data used to evaluate our model. Additionally, our model achieves an F1-score of 89%, compared to the best reported F1-score of 85% for knowledge-based Arabic WSD. Another contribution of this paper is a context-gloss benchmark that may help overcome the lack of a standardized benchmark for Arabic gloss-based WSD.
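A hedged sketch of the general gloss-based pattern, scoring (context, gloss) pairs with a BERT cross-encoder via Hugging Face Transformers, is shown below; the Arabic checkpoint name is an assumption, and the classification head would first need fine-tuning on labeled context-gloss pairs, as gloss-based WSD methods require:

```python
# Sketch: score a (context, gloss) pair with a BERT cross-encoder.
# 'aubmindlab/bert-base-arabertv02' is an assumed Arabic checkpoint;
# the classification head must be fine-tuned on context-gloss pairs
# labeled correct/incorrect before these scores are meaningful.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = 'aubmindlab/bert-base-arabertv02'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

def gloss_score(context, gloss):
    inputs = tokenizer(context, gloss, return_tensors='pt', truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.softmax(dim=-1)[0, 1].item()  # P(gloss matches context)

# The sense whose gloss scores highest is predicted for the target word.
```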

