Word sense disambiguation using implicit information

Humans proficiently interpret the true sense of an ambiguous word by establishing associations among the words in a sentence. The complete sense of a text also depends on implicit information that is not explicitly stated. The absence of this implicit information is a significant problem for a computer program that attempts to determine the correct sense of ambiguous words. In this paper, we propose a novel method to uncover the implicit information that links the words of a sentence. We reveal this implicit information using a graph, which is then used to disambiguate the ambiguous word. The experiments show that the proposed algorithm identifies the correct sense for both homonyms and polysemous words. The proposed algorithm performed better than the approaches presented in the SemEval-2013 task for word sense disambiguation and achieved an accuracy of 79.6 percent, which is 2.5 percent better than the best unsupervised approach in SemEval-2007.


Generating Sense Inventories for Ambiguous Arabic Words

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/3a/8 ◽

2021 ◽

Author(s):

Marwah Alian ◽

Arafat Awajan

Keyword(s):

Word Sense Disambiguation ◽

Word Sense ◽

Arabic Word ◽

Word Sense Induction ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Sentence Similarity ◽

Unsupervised Approach ◽

Similar Accuracy

The process of selecting the appropriate meaning of an ambiguous word according to its context is known as word sense disambiguation. In this research, we generate a number of Arabic sense inventories using an unsupervised approach and different pre-trained embeddings, such as AraVec, fastText, and Arabic-News embeddings. The resulting inventories are evaluated to investigate their efficiency in Arabic word sense disambiguation and sentence similarity. The sense inventories are generated with a graph-based word sense induction algorithm. Results show that the AraVec-Twitter inventory achieves the best accuracy of 0.47 for 50 neighbors, an accuracy close to that of the fastText inventory for 200 neighbors, and an accuracy similar to that of the Arabic-News inventory for 100 neighbors. Replacing ambiguous words with their sense vectors is also tested for sentence similarity using all sense inventories, and the results show that the AraVec-Twitter sense inventory provides a better correlation value.


Co-occurrence graph-based context adaptation: a new unsupervised approach to word sense disambiguation

Digital Scholarship in the Humanities ◽

10.1093/llc/fqz048 ◽

2020 ◽

Author(s):

Saeed Rahmani ◽

Seyed Mostafa Fakhrahmad ◽

Mohammad Hadi Sadreddini

Keyword(s):

Text Processing ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Experimental Results ◽

Word Sense ◽

Structure And Properties ◽

Similarity Functions ◽

Challenging Tasks ◽

Sense Disambiguation ◽

Unsupervised Approach

Word sense disambiguation (WSD) is the task of selecting the correct sense for an ambiguous word in its context. Since WSD is one of the most challenging tasks in various text processing systems, improving its accuracy can be very beneficial. In this article, we propose a new unsupervised method based on a co-occurrence graph built from a monolingual corpus, without any dependency on the structure and properties of the language itself. In the proposed method, the context of an ambiguous word is represented as a sub-graph extracted from a large word co-occurrence graph built from a corpus. Most of the words are connected in this graph. To clarify the exact sense of an ambiguous word, its senses and relations are added to the context graph, and various similarity functions are employed based on the senses and the context graph. In the disambiguation process, we select the senses with the highest similarity to the context graph. In contrast to other WSD methods, the proposed method does not use any language-dependent resources (e.g. WordNet) and uses only a monolingual corpus. Therefore, the proposed method can be employed for other languages. Moreover, by increasing the size of the corpus, it is possible to enhance the accuracy of WSD. Experimental results on English and Persian datasets show that the proposed method is competitive with existing supervised and unsupervised WSD approaches.
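The general idea of scoring senses against a co-occurrence graph can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy corpus, the seed words attached to each sense, the sense labels, and the similarity function (a plain sum of edge weights) are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Build {word: {neighbor: count}} from tokenized sentences.

    Two words are linked each time they appear in the same sentence.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def sense_score(graph, sense_words, context_words):
    """Assumed similarity: total edge weight between sense words and context words."""
    return sum(graph[s].get(c, 0) for s in sense_words for c in context_words)


def disambiguate(graph, senses, context_words):
    """Return the sense label whose seed words are most strongly tied to the context."""
    return max(senses, key=lambda name: sense_score(graph, senses[name], context_words))


# Toy monolingual corpus (assumption, for illustration only).
corpus = [
    ["river", "bank", "water", "shore"],
    ["money", "bank", "loan", "deposit"],
    ["water", "shore", "fish"],
    ["loan", "deposit", "interest"],
]
graph = build_cooccurrence_graph(corpus)

# Hypothetical seed words describing each sense of "bank".
senses = {
    "bank/finance": ["loan", "deposit"],
    "bank/river": ["water", "shore"],
}

print(disambiguate(graph, senses, ["fish", "water"]))
print(disambiguate(graph, senses, ["interest", "loan"]))
```

A larger corpus gives denser edge weights, which is consistent with the abstract's remark that accuracy can improve as the corpus grows.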


A Word Sense Disambiguation Method Based on Reconstruction of Context by Correlation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.135-136.160 ◽

2011 ◽

Vol 135-136 ◽

pp. 160-166 ◽

Cited By ~ 1

Author(s):

Xin Hua Fan ◽

Bing Jun Zhang ◽

Dong Zhou

Keyword(s):

Information Entropy ◽

Average Distance ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Experimental Results ◽

Occurrence Frequency ◽

Word Sense ◽

Occurrence Data ◽

Ambiguous Words ◽

Sense Disambiguation

This paper presents a word sense disambiguation method that reconstructs the context using the correlation between words. First, we compute the relevance between words from corpus statistics (co-occurrence frequency, average distance, and information entropy). Second, we treat the context words whose correlation with the ambiguous word is larger than that of the other words as important words, use them to reconstruct the context, and take the reconstructed context as the new context of the ambiguous word. Finally, we apply the sememe co-occurrence data method [10] for word sense disambiguation. The experimental results demonstrate the feasibility of this method.


Word Sense Disambiguation for Improving the Quality of Machine Translation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.981.153 ◽

2014 ◽

Vol 981 ◽

pp. 153-156

Author(s):

Chun Xiang Zhang ◽

Long Deng ◽

Xue Yao Gao ◽

Li Li Guo

Keyword(s):

Machine Translation ◽

Language Processing ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Translation System ◽

Word Sense ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Machine Translation System

Word sense disambiguation is key to many applications in natural language processing. In this paper, a word sense disambiguation classifier is introduced into a machine translation system in order to improve the quality of the output translation. First, the translation of the ambiguous word is deleted from the machine translation of the Chinese sentence. Second, the ambiguous word is disambiguated, where the classification labels are the translations of the ambiguous word. Third, these two translations are combined. Fifty Chinese sentences containing ambiguous words are collected for the test experiments. Experimental results show that translation quality improves after the proposed method is applied.


Semantic based Information Retrieval System by using WSD and DICE Coefficient

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset207259 ◽

2020 ◽

pp. 274-279

Author(s):

Prof Thwe ◽

Thi Thi Tun ◽

Ohnmar Aung

Keyword(s):

Information Retrieval ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Relevant Information ◽

Word Sense ◽

Improve Performance ◽

Lexical Resource ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Similarity Method

Word sense disambiguation (WSD) is an important technique in many NLP applications such as machine translation, content analysis, and information retrieval. In an information retrieval (IR) system, ambiguous words have a damaging effect on precision. In this situation, WSD is useful for automatically identifying the correct meaning of an ambiguous word. Therefore, this system proposes a word sense disambiguation algorithm to increase the precision of the IR system. The system disambiguates the meaning of each keyword in the query and, with the help of glosses, adds conceptually related words as additional semantics. It uses WordNet as the lexical resource that encodes the concepts of each term. The senses provided by the WSD algorithm are used as semantics for indexing the documents to improve the performance of the IR system. Using both keywords and senses, the system retrieves relevant information according to the Dice similarity method.
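The Dice coefficient mentioned above is the standard set-overlap measure 2|A ∩ B| / (|A| + |B|). The sketch below shows how it could rank documents against a sense-expanded query; the document identifiers, the index terms, and the `rank_documents` helper are hypothetical and are not taken from the described system.

```python
def dice_coefficient(a, b):
    """Dice similarity between two collections of terms, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def rank_documents(query_terms, documents):
    """Return document ids ordered by decreasing Dice similarity to the query."""
    scored = [(dice_coefficient(query_terms, terms), doc_id)
              for doc_id, terms in documents.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical query: the keyword plus terms added after disambiguation.
query = ["bank", "financial_institution", "loan"]

# Hypothetical index: each document is represented by its keywords and senses.
documents = {
    "d1": ["bank", "loan", "interest", "financial_institution"],
    "d2": ["bank", "river", "water"],
}

print(rank_documents(query, documents))
```

Because the expanded query carries sense-specific terms, the document about river banks shares only the surface keyword and scores lower than the finance document.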


Developing Corpora using Wikipedia and Word2vec for Word Sense Disambiguation

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v12.i3.pp1239-1246 ◽

2018 ◽

Vol 12 (3) ◽

pp. 1239

Author(s):

Farza Nurifan ◽

Riyanarto Sarno ◽

Cahyaningtyas Sekar Wahyuni

Keyword(s):

Semantic Similarity ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Anaphora Resolution ◽

Word Sense ◽

Accuracy Rate ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Improve Accuracy ◽

Research Show

Word Sense Disambiguation (WSD) is one of the most difficult problems in artificial intelligence and is often described as AI-hard or AI-complete. Many problems can be addressed with word sense disambiguation approaches, including sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference. In this paper, we address the WSD problem with two small corpora. We propose the use of Word2vec and Wikipedia to develop the corpora. After developing the corpora, we measure the similarity between the sentence and the corpora using cosine similarity to determine the meaning of the ambiguous word. Lastly, to improve accuracy, we use Lesk algorithms and Wu-Palmer similarity to handle cases in which no word from the sentence appears in the corpora (we call this semantic similarity). The results of our research show an 86.94% accuracy rate, and semantic similarity improves the accuracy rate by 12.96% in determining the meaning of ambiguous words.
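The cosine-similarity step can be sketched as follows. As a simplifying assumption, the sketch compares raw bag-of-words count vectors rather than Word2vec-derived corpora, and the two tiny sense corpora and the `pick_sense` helper are hypothetical. The Lesk and Wu-Palmer fallback described in the abstract is not shown.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists using term-count vectors."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def pick_sense(sentence_tokens, sense_corpora):
    """Return the sense whose corpus is most similar to the sentence."""
    return max(sense_corpora,
               key=lambda s: cosine_similarity(sentence_tokens, sense_corpora[s]))


# Hypothetical miniature corpora, one per sense of "apple".
sense_corpora = {
    "apple/fruit": "apple fruit tree eat juice sweet".split(),
    "apple/company": "apple iphone company mac software".split(),
}

sentence = "i eat a sweet apple".split()
print(pick_sense(sentence, sense_corpora))
```

When a sentence shares no word with either corpus, both similarities are zero and the choice is arbitrary, which is the situation the abstract's semantic-similarity fallback is meant to handle.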


A Knowledge Based Word Sense Disambiguation in Telugu Language

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1911.1010120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 440-445

Keyword(s):

Computational Linguistics ◽

Word Sense Disambiguation ◽

The Other ◽

Word Sense ◽

Knowledge Based ◽

Ambiguous Words ◽

Sense Disambiguation ◽

The Senses ◽

Definition Of ◽

Polysemous Words

Telugu (తెలుగు) is one of the Dravidian languages, which are morphologically rich. As in other languages, it contains ambiguous words and phrases that have different meanings in different contexts. Such words are referred to as polysemous words, i.e. words having more than one sense. A knowledge-based approach is proposed for disambiguating Telugu polysemous words using the computational linguistics tool IndoWordNet. The task of word sense disambiguation (WSD) requires finding the similarity between the target word and the nearby words. In this approach, the similarity is calculated either by counting the words shared (intersection) between the glosses (definitions) of the target and nearby words, or by finding the exact occurrence of the nearby word's sense in the hierarchy (hypernyms/hyponyms) of the target word's senses. The intersection parameter is modified to use not only the glosses but also the related words. Additionally, there is a third parameter, 'distance', which measures the distance between the target and nearby words. The proposed method thus uses more parameters for calculating similarity. It scores the senses based on the overall impact of the parameters, i.e. intersection, hierarchy, and distance, and then chooses the sense with the best score. The correct meaning of a Telugu polysemous word can be identified with this technique.
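The intersection parameter alone can be illustrated with a short gloss-overlap sketch. It covers only that one parameter; the hierarchy and distance parameters and the way the paper combines them are not reproduced. The English glosses stand in for IndoWordNet entries and, like the function names, are assumptions.

```python
def gloss_overlap(gloss_a, gloss_b):
    """Number of distinct words shared by two glosses (case-insensitive)."""
    return len(set(gloss_a.lower().split()) & set(gloss_b.lower().split()))


def best_sense(target_senses, neighbor_glosses):
    """Pick the target sense whose gloss overlaps most with the neighbors' glosses."""
    def score(sense):
        return sum(gloss_overlap(target_senses[sense], g) for g in neighbor_glosses)
    return max(target_senses, key=score)


# Hypothetical glosses for two senses of the target word "bat".
target_senses = {
    "bat/animal": "nocturnal flying mammal with wings",
    "bat/sports": "wooden club used to hit the ball in cricket",
}

# Hypothetical glosses of the words that surround the target in the sentence.
neighbor_glosses = [
    "game played with a ball and wickets",
    "to strike or hit something",
]

print(best_sense(target_senses, neighbor_glosses))
```

Function words such as "to" and "with" contribute to the counts here; a practical version would probably filter stop words before intersecting.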


Global Word Sense Disambiguation of Polysemous Words in Telugu Language

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1915.1010120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 420-425

Keyword(s):

Language Processing ◽

Word Sense Disambiguation ◽

Text Summarization ◽

Word Sense ◽

Significant Issue ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Sequencing Technique ◽

Polysemous Words ◽

Word Senses

Word Sense Disambiguation (WSD) is a significant issue in Natural Language Processing (NLP). WSD refers to the ability to recognize the correct sense of a word in a given context. It can improve numerous NLP applications such as machine translation, text summarization, information retrieval, and sentiment analysis. This paper proposes an approach named ShotgunWSD, an unsupervised and knowledge-based algorithm for global word sense disambiguation. The algorithm is motivated by the shotgun sequencing technique. ShotgunWSD is applied to disambiguate the word senses of a Telugu document in three functional phases. ShotgunWSD achieves better performance than other WSD approaches in disambiguating the senses of ambiguous words in Telugu documents. The dataset used is IndoWordNet.


A HIGHLY ACCURATE BOOTSTRAPPING ALGORITHM FOR WORD SENSE DISAMBIGUATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213001000398 ◽

2001 ◽

Vol 10 (01n02) ◽

pp. 5-21 ◽

Cited By ~ 17

Author(s):

RADA F. MIHALCEA ◽

DAN I. MOLDOVAN

Keyword(s):

High Precision ◽

Word Sense Disambiguation ◽

Original Text ◽

Word Sense ◽

New Words ◽

Input Text ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Very High

In this paper, we present a bootstrapping algorithm for Word Sense Disambiguation which succeeds in disambiguating a subset of the words in the input text with very high precision. It uses WordNet and a semantically tagged corpus to identify the correct sense of the words in a given text. The bootstrapping process initializes a set of ambiguous words with all the nouns and verbs in the text. It then applies various disambiguation procedures and builds a set of disambiguated words: new words are sense tagged based on their relation to the already disambiguated words, and then added to the set. This process allows us to identify, in the original text, a set of words which can be disambiguated with high precision; 55% of the verbs and nouns are disambiguated with an accuracy of 92%.
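The bootstrapping loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not the published procedures: seeding the disambiguated set with single-sense words, the sense labels, and the hand-written `related` table (standing in for relations derived from WordNet) are all hypothetical.

```python
def bootstrap(words, candidate_senses, related):
    """Iteratively tag words whose sense is uniquely related to an already tagged sense.

    words: list of words from the text
    candidate_senses: {word: [sense, ...]}
    related: set of frozenset({sense_a, sense_b}) pairs (assumed relation table)
    Returns (tagged, pending), where pending holds words left ambiguous.
    """
    # Assumed seed step: words with a single candidate sense are tagged immediately.
    tagged = {w: candidate_senses[w][0] for w in words if len(candidate_senses[w]) == 1}
    pending = [w for w in words if w not in tagged]

    changed = True
    while changed:
        changed = False
        for w in list(pending):
            matches = [s for s in candidate_senses[w]
                       if any(frozenset((s, t)) in related for t in tagged.values())]
            # Tag only when exactly one sense is supported, favoring precision.
            if len(matches) == 1:
                tagged[w] = matches[0]
                pending.remove(w)
                changed = True
    return tagged, pending


words = ["loan", "bank", "interest", "pitch"]
candidate_senses = {
    "loan": ["loan#1"],
    "bank": ["bank#finance", "bank#river"],
    "interest": ["interest#money", "interest#curiosity"],
    "pitch": ["pitch#throw", "pitch#tone"],
}
related = {
    frozenset(("loan#1", "bank#finance")),
    frozenset(("bank#finance", "interest#money")),
}

tagged, pending = bootstrap(words, candidate_senses, related)
print(tagged)
print(pending)
```

Words with no supporting relation stay in `pending`, which mirrors the abstract's point that only a subset of the words is disambiguated, in exchange for high precision.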


An Unsupervised Approach to Hindi Word Sense Disambiguation

Proceedings of the First International Conference on Intelligent Human Computer Interaction ◽

10.1007/978-81-8489-203-1_32 ◽

2009 ◽

pp. 327-335 ◽

Cited By ~ 11

Author(s):

Neetu Mishra ◽

Shashi Yadav ◽

Tanveer J. Siddiqui

Keyword(s):

Word Sense Disambiguation ◽

Word Sense ◽

Sense Disambiguation ◽

Unsupervised Approach
