Semantic Similarity Functions in Word Sense Disambiguation

Author(s):  
Łukasz Kobyliński ◽  
Mateusz Kopeć


Author(s):  
Saeed Rahmani ◽  
Seyed Mostafa Fakhrahmad ◽  
Mohammad Hadi Sadreddini

Abstract Word sense disambiguation (WSD) is the task of selecting the correct sense for an ambiguous word in its context. Since WSD is one of the most challenging tasks in various text processing systems, improving its accuracy can be very beneficial. In this article, we propose a new unsupervised method based on a co-occurrence graph built from a monolingual corpus, with no dependency on the structure or properties of the language itself. In the proposed method, the context of an ambiguous word is represented as a sub-graph extracted from a large word co-occurrence graph built from a corpus; most words are connected in this graph. To identify the exact sense of an ambiguous word, its senses and their relations are added to the context graph, and various similarity functions are applied to the senses and the context graph. In the disambiguation process, we select the senses with the highest similarity to the context graph. In contrast to other WSD methods, the proposed method does not use any language-dependent resources (e.g., WordNet); it relies solely on a monolingual corpus and can therefore be applied to other languages. Moreover, increasing the size of the corpus can further improve WSD accuracy. Experimental results on English and Persian datasets show that the proposed method is competitive with existing supervised and unsupervised WSD approaches.
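
The abstract describes the core idea at a high level; the following is a minimal, hypothetical Python sketch of graph-neighbourhood sense scoring in that spirit. The Jaccard overlap used as the similarity function and the per-sense word sets are illustrative assumptions, not the authors' exact similarity functions or sense representation.

```python
from collections import defaultdict

def build_cooccurrence_graph(sentences, window=5):
    """Adjacency map: each word maps to the set of words that co-occur
    with it inside a sliding window over tokenized sentences."""
    graph = defaultdict(set)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for v in tokens[max(0, i - window):i + window + 1]:
                if v != w:
                    graph[w].add(v)
                    graph[v].add(w)
    return graph

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0

def disambiguate(context_words, sense_neighbourhoods, graph):
    """Score each candidate sense by the overlap between its word set and
    the context sub-graph (context words plus their direct neighbours).

    sense_neighbourhoods: dict mapping a sense label to a set of words
    assumed to be indicative of that sense -- a stand-in for the senses
    and relations that the paper adds to the context graph."""
    context = set(context_words)
    for w in context_words:
        context |= graph.get(w, set())
    scores = {sense: jaccard(words, context)
              for sense, words in sense_neighbourhoods.items()}
    return max(scores, key=scores.get), scores
```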


2021 ◽  
Vol 11 (6) ◽  
pp. 2567
Author(s):  
Mohammed El-Razzaz ◽  
Mohamed Waleed Fakhr ◽  
Fahima A. Maghraby

Word Sense Disambiguation (WSD) aims to predict the correct sense of a word given its context. This problem is of extreme importance in Arabic, as written words can be highly ambiguous: 43% of diacritized words have multiple interpretations, and the percentage rises to 72% for non-diacritized words, yet most written Arabic text does not carry diacritical marks. Gloss-based WSD methods measure the semantic similarity or overlap between the context of a target word that needs to be disambiguated and the dictionary definition of that word (its gloss). Arabic gloss WSD suffers from a lack of context-gloss datasets. In this paper, we present an Arabic gloss-based WSD technique. We utilize Bidirectional Encoder Representations from Transformers (BERT) to build two models that can efficiently perform Arabic WSD. These models can be trained with few training samples since they rely on BERT models pretrained on a large Arabic corpus. Our experimental results show that our models outperform two of the most recent gloss-based WSD methods when tested on the same test data used to evaluate our model. Additionally, our model achieves an F1-score of 89%, compared to the best reported F1-score of 85% for knowledge-based Arabic WSD. Another contribution of this paper is a context-gloss benchmark that may help overcome the lack of a standardized benchmark for Arabic gloss-based WSD.
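
As a rough illustration of gloss-based WSD with a pretrained BERT encoder, the sketch below frames disambiguation as binary context-gloss pair classification. The checkpoint name is a placeholder, and the fine-tuning of the classification head on context-gloss pairs is assumed; this is not the authors' published model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder Arabic BERT checkpoint; the paper's exact pretrained model may differ.
MODEL_NAME = "asafaya/bert-base-arabic"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Binary head: does this gloss match the target word's usage in the context?
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def score_glosses(context: str, glosses: list[str]):
    """Return each gloss paired with the model's probability that it matches
    the context, best first.

    Assumes the head has been fine-tuned on labeled context-gloss pairs
    (label 1 = correct sense); an off-the-shelf head gives random scores."""
    pairs = tokenizer([context] * len(glosses), glosses,
                      padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**pairs).logits
    probs = torch.softmax(logits, dim=-1)[:, 1]
    return sorted(zip(glosses, probs.tolist()), key=lambda x: -x[1])
```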


Author(s):  
Farza Nurifan ◽  
Riyanarto Sarno ◽  
Cahyaningtyas Sekar Wahyuni

Word Sense Disambiguation (WSD) is one of the most difficult problems in artificial intelligence, commonly described as AI-hard or AI-complete. Many tasks, such as sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference, can benefit from word sense disambiguation. In this paper, we address the WSD problem with two small corpora. We propose the use of Word2vec and Wikipedia to develop the corpora. After developing the corpora, we measure the similarity between a sentence and the corpora using cosine similarity to determine the meaning of the ambiguous word. Lastly, to improve accuracy, we use the Lesk algorithm and Wu-Palmer similarity to handle cases where no word from the sentence appears in the corpora (we call this semantic similarity). The results of our research show an 86.94% accuracy rate, and the semantic similarity step improves the accuracy by 12.96% in determining the meaning of ambiguous words.
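
A minimal sketch of the cosine-similarity idea described above, under the assumption of per-sense corpora (e.g., built from Wikipedia pages) and an averaged Word2vec sentence vector, with a Lesk fallback when the sentence shares no vocabulary with the embeddings. The sentence representation and the fallback wiring are illustrative choices, not the authors' exact pipeline.

```python
import numpy as np
from gensim.models import Word2Vec
from nltk.wsd import lesk  # requires the NLTK WordNet data to be downloaded

def sentence_vector(tokens, wv):
    """Average the Word2vec vectors of in-vocabulary tokens (None if empty)."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else None

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(sentence_tokens, sense_corpora, wv, target_word):
    """sense_corpora: dict mapping a sense label to a list of tokenized
    sentences for that sense; wv: trained gensim KeyedVectors."""
    query = sentence_vector(sentence_tokens, wv)
    if query is None:
        # No vocabulary overlap: fall back to the Lesk algorithm over WordNet.
        return lesk(sentence_tokens, target_word)
    best_sense, best_score = None, -1.0
    for sense, sentences in sense_corpora.items():
        vecs = [v for v in (sentence_vector(s, wv) for s in sentences) if v is not None]
        if not vecs:
            continue
        score = cosine(query, np.mean(vecs, axis=0))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical usage:
# wv = Word2Vec(corpus_sentences, vector_size=100, min_count=1).wv
# disambiguate(["i", "deposited", "cash", "at", "the", "bank"],
#              sense_corpora, wv, "bank")
```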

