A Semi-Supervised Graph-based Algorithm for Word Sense Disambiguation

2017 ◽  
Vol 8 (2) ◽  
pp. 13 ◽  
Author(s):  
Amita Jain ◽  
Devendra Kumar Tayal ◽  
Sonakshi Vij

Word sense disambiguation is a problem in computational linguistics that aims to identify the most appropriate sense of a word in a given context. To date, several unsupervised graph-based methods have been devised for word sense disambiguation, but most of them build the WordNet® graph from multiple ambiguous words in a text corpus, which amounts to “the blind leading the blind”. In this paper, a semi-supervised algorithm is proposed and implemented that uses a clue word to create the desired WordNet® graph. Existing word sense disambiguation algorithms treat all graph connectivity measures as equally significant, but this is not the case. This paper presents a comparative study of these graph connectivity measures, discusses their connectivity aspects, and assigns priorities to them in order to produce an effective word sense disambiguation algorithm. The WordNet® graph is generated using the external Python libraries NetworkX and Matplotlib. The proposed algorithm is tested on the SemCor database and shows considerable improvement over the unsupervised graph-based method suggested by Navigli.
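As a rough illustration of how a connectivity measure can rank candidate senses in a sense graph, the sketch below scores two candidate senses by degree centrality. It is a minimal sketch under stated assumptions, not the authors' algorithm: the sense labels, the edges, and the clue-word node are hypothetical toy data, degree centrality is only one example of a connectivity measure (the abstract does not list the measures compared), and plain Python dictionaries stand in for the NetworkX graph mentioned above.

```python
from collections import defaultdict

# Hypothetical toy sense graph. Nodes are sense labels and edges stand in
# for WordNet-style relations; none of these come from the paper.
EDGES = [
    ("bank#finance", "money#1"),
    ("bank#finance", "deposit#1"),
    ("money#1", "deposit#1"),
    ("bank#river", "water#1"),
    ("deposit#1", "clue:loan#1"),      # sense of the assumed clue word "loan"
    ("bank#finance", "clue:loan#1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency, node):
    """Degree of the node divided by the number of other nodes."""
    n = len(adjacency)
    if n <= 1:
        return 0.0
    return len(adjacency.get(node, ())) / (n - 1)


def pick_sense(adjacency, candidates):
    """Return the candidate sense with the highest degree centrality."""
    return max(candidates, key=lambda s: degree_centrality(adjacency, s))


adjacency = build_adjacency(EDGES)
best = pick_sense(adjacency, ["bank#finance", "bank#river"])
print(best)
```

In this toy graph the clue-word node adds an edge to the financial sense only, which is the kind of evidence a clue word is meant to contribute.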

Telugu (తెలుగు) is one of the Dravidian languages, which are morphologically rich. Like other languages, it contains ambiguous words that have different meanings in different contexts. Such words are referred to as polysemous words, i.e. words having multiple senses. A knowledge-based approach is proposed for disambiguating Telugu polysemous words using the computational linguistics tool IndoWordNet. The task of word sense disambiguation (WSD) requires measuring the similarity between the target word and its nearby words. In this approach, the similarity is calculated either by counting the words shared (intersection) between the glosses (definitions) of the target and nearby words, or by finding an exact occurrence of the nearby word's sense in the hierarchy (hypernyms/hyponyms) of the target word's senses. These parameters are extended by computing the intersection over not only the glosses but also the related words. In addition, a third parameter, 'distance', measures the distance between the target and nearby words. The proposed method thus uses more parameters for calculating similarity. It scores the senses based on the overall impact of the parameters (intersection, hierarchy and distance) and then chooses the sense with the best score. The correct meaning of a Telugu polysemous word can be identified with this technique.
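The three parameters described above (intersection, hierarchy and distance) can be combined into a single score roughly as follows. This is a minimal sketch under stated assumptions: the English glosses, the sense identifiers, the stopword list, the equal weighting of the three terms, and the use of 1/distance as a proximity term are all hypothetical choices made for illustration, and no IndoWordNet API is used.

```python
# Hypothetical stopword list so that function words do not inflate overlap.
STOPWORDS = {"a", "of", "the", "that", "by"}


def gloss_overlap(gloss_a, gloss_b):
    """Count the content words shared by two glosses (the 'intersection')."""
    words_a = set(gloss_a.lower().split()) - STOPWORDS
    words_b = set(gloss_b.lower().split()) - STOPWORDS
    return len(words_a & words_b)


def score_sense(sense, neighbor_word, neighbor_gloss, distance):
    """Sum the three terms: gloss overlap, hierarchy hit, and proximity."""
    overlap = gloss_overlap(sense["gloss"], neighbor_gloss)
    hierarchy_hit = 1 if neighbor_word in sense["hierarchy"] else 0
    proximity = 1.0 / distance  # assumed form: closer words count more
    return overlap + hierarchy_hit + proximity


# Toy senses for the English word "bank" (illustrative only).
senses = [
    {"id": "bank.financial",
     "gloss": "a financial institution that accepts deposits of money",
     "hierarchy": {"institution", "organization"}},
    {"id": "bank.river",
     "gloss": "sloping land beside a body of water",
     "hierarchy": {"slope", "land"}},
]

neighbor_gloss = "a medium of exchange accepted by a financial system"
scores = {s["id"]: score_sense(s, "money", neighbor_gloss, distance=2)
          for s in senses}
best_id = max(scores, key=scores.get)
print(best_id, scores)
```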


2001 ◽  
Vol 10 (01n02) ◽  
pp. 5-21 ◽  
Author(s):  
RADA F. MIHALCEA ◽  
DAN I. MOLDOVAN

In this paper, we present a bootstrapping algorithm for Word Sense Disambiguation which succeeds in disambiguating a subset of the words in the input text with very high precision. It uses WordNet and a semantically tagged corpus to identify the correct sense of the words in a given text. The bootstrapping process initializes a set of ambiguous words with all the nouns and verbs in the text. It then applies various disambiguation procedures and builds a set of disambiguated words: new words are sense tagged based on their relation to the already disambiguated words, and then added to the set. This process allows us to identify, in the original text, a set of words which can be disambiguated with high precision; 55% of the verbs and nouns are disambiguated with an accuracy of 92%.
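The general shape of such a bootstrapping loop can be sketched as below. This is a minimal sketch under stated assumptions: the abstract does not describe the individual disambiguation procedures, so a single hypothetical rule is used (tag a word only when exactly one of its candidate senses is related to an already tagged sense), and the words, senses, relations and seed are toy data.

```python
def bootstrap(words, candidate_senses, related, seeds):
    """Grow the set of tagged words from the seeds until nothing changes.

    A word is tagged only when exactly one of its candidate senses is
    related to an already tagged sense (a hypothetical precision-first rule).
    """
    tagged = dict(seeds)
    pending = [w for w in words if w not in tagged]
    changed = True
    while changed:
        changed = False
        for word in list(pending):
            known = set(tagged.values())
            matches = [s for s in candidate_senses[word]
                       if related.get(s, set()) & known]
            if len(matches) == 1:
                tagged[word] = matches[0]
                pending.remove(word)
                changed = True
    return tagged, pending


# Toy input (illustrative only).
words = ["doctor", "nurse", "patient", "bank"]
candidate_senses = {
    "doctor": ["doctor#medical", "doctor#phd"],
    "nurse": ["nurse#medical", "nurse#nanny"],
    "patient": ["patient#medical"],
    "bank": ["bank#finance", "bank#river"],
}
related = {
    "doctor#medical": {"patient#medical"},
    "nurse#medical": {"doctor#medical"},
}
seeds = {"patient": "patient#medical"}  # a word with one sense seeds the set

tagged, pending = bootstrap(words, candidate_senses, related, seeds)
print(tagged, pending)
```

Words that never connect to the tagged set, such as "bank" in this toy input, stay untagged, which mirrors the idea of disambiguating only a subset of the words in exchange for higher precision.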


2014 ◽  
Vol 44 (1) ◽  
pp. 91-126 ◽  
Author(s):  
Bilel Elayeb ◽  
Ibrahim Bounhas ◽  
Oussama Ben Khiroun ◽  
Fabrice Evrard ◽  
Narjès Bellamine Ben Saoud

Author(s):  
Marwah Alian ◽  
Arafat Awajan

The process of selecting the appropriate meaning of an ambiguous word according to its context is known as word sense disambiguation. In this research, we generate a number of Arabic sense inventories based on an unsupervised approach and different pre-trained embeddings, such as Aravec, Fast text, and Arabic-News embeddings. The resulting inventories are evaluated to investigate their efficiency in Arabic word sense disambiguation and sentence similarity. The sense inventories are generated using an unsupervised approach that is based on a graph-based word sense induction algorithm. Results show that the Aravec-Twitter inventory achieves the best accuracy of 0.47 for 50 neighbors and an accuracy close to that of the Fast text inventory for 200 neighbors, while it provides accuracy similar to the Arabic-News inventory for 100 neighbors. The experiment of replacing ambiguous words with their sense vectors is tested for sentence similarity using all sense inventories, and the results show that using the Aravec-Twitter sense inventory provides a better correlation value.


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Hisham Al-Mubaid ◽  
Sandeep Gungu

In the biomedical domain, word sense ambiguity is a widespread problem, yet the bioinformatics research effort devoted to it is not commensurate, leaving room for further development. This paper presents and evaluates a learning-based approach for sense disambiguation within the biomedical domain. The main limitation of supervised methods is the need for a corpus of manually disambiguated instances of the ambiguous words. However, advances in automatic text annotation and tagging techniques, together with the plethora of knowledge sources such as ontologies and text literature in the biomedical domain, will help lessen this limitation. The proposed method utilizes the interaction model (mutual information) between the context words and the senses of the target word to induce reliable learning models for sense disambiguation. The method has been evaluated with the benchmark dataset NLM-WSD under various settings and in biomedical entity species disambiguation. The evaluation results showed that the approach is very competitive and outperforms recently reported results of other published techniques.
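One way to picture a mutual-information interaction between context words and senses is the sketch below, which estimates pointwise mutual information (PMI) from labeled instances and sums it over the context. It is a minimal sketch under stated assumptions: PMI is used as a simple stand-in for the paper's interaction model, whose exact formulation the abstract does not give, and the training instances and sense labels are hypothetical toy data, not NLM-WSD.

```python
import math
from collections import Counter


def train_pmi(instances):
    """Estimate PMI(word, sense) from (sense, context_words) instances."""
    sense_counts = Counter()
    word_counts = Counter()
    joint_counts = Counter()
    for sense, context in instances:
        sense_counts[sense] += 1
        for word in set(context):  # count each word once per instance
            word_counts[word] += 1
            joint_counts[(word, sense)] += 1
    total = len(instances)
    pmi = {}
    for (word, sense), joint in joint_counts.items():
        p_joint = joint / total
        p_word = word_counts[word] / total
        p_sense = sense_counts[sense] / total
        pmi[(word, sense)] = math.log2(p_joint / (p_word * p_sense))
    return pmi


def disambiguate(pmi, senses, context):
    """Pick the sense whose summed PMI with the context words is highest."""
    def score(sense):
        return sum(pmi.get((w, sense), 0.0) for w in set(context))
    return max(senses, key=score)


# Toy training instances for the ambiguous word "cold" (illustrative only).
instances = [
    ("cold#illness", ["virus", "nasal", "patient"]),
    ("cold#illness", ["virus", "symptoms", "patient"]),
    ("cold#temperature", ["exposure", "water", "patient"]),
    ("cold#temperature", ["exposure", "hypothermia"]),
]

pmi = train_pmi(instances)
predicted = disambiguate(pmi, ["cold#illness", "cold#temperature"],
                         ["virus", "patient"])
print(predicted)
```

Unseen word and sense pairs contribute zero here, which is a simplification; a real system would need smoothing and far more training data.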


2011 ◽  
Vol 135-136 ◽  
pp. 160-166 ◽  
Author(s):  
Xin Hua Fan ◽  
Bing Jun Zhang ◽  
Dong Zhou

This paper presents a word sense disambiguation method that reconstructs the context using the correlation between words. First, we compute the relevance between words through statistical quantities (co-occurrence frequency, average distance and information entropy) from the corpus. Second, we treat the context words that have a larger correlation value with the ambiguous word than the other words as the important words, use these words to reconstruct the context, and then use the reconstructed context as the new context of the ambiguous word. Finally, we use the sememe co-occurrence data method [10] for word sense disambiguation. The experimental results demonstrate the feasibility of this method.
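The context reconstruction step can be pictured with the sketch below, which gathers co-occurrence frequency and average distance from a small corpus and keeps the context words most correlated with the ambiguous word. It is a minimal sketch under stated assumptions: the abstract does not give the correlation formula, so frequency divided by average distance is a hypothetical stand-in, the information entropy term is omitted, and the corpus, window size and top_k value are toy choices.

```python
from collections import defaultdict


def cooccurrence_stats(corpus, target, window=3):
    """Return {word: (co-occurrence count, average distance to target)}."""
    counts = defaultdict(int)
    distance_sums = defaultdict(int)
    for sentence in corpus:
        positions = [i for i, w in enumerate(sentence) if w == target]
        for pos in positions:
            for j, word in enumerate(sentence):
                d = abs(j - pos)
                if word != target and 0 < d <= window:
                    counts[word] += 1
                    distance_sums[word] += d
    return {w: (counts[w], distance_sums[w] / counts[w]) for w in counts}


def reconstruct_context(context, stats, top_k=2):
    """Keep the top_k context words by the assumed correlation score."""
    def correlation(word):
        count, avg_distance = stats.get(word, (0, 1.0))
        return count / avg_distance  # hypothetical combination of the two
    ranked = sorted(context, key=correlation, reverse=True)
    return ranked[:top_k]


# Toy corpus of tokenized sentences (illustrative only).
corpus = [
    ["bank", "approved", "loan"],
    ["customer", "bank", "loan", "interest"],
    ["river", "bank", "flooded"],
]

stats = cooccurrence_stats(corpus, "bank")
reconstructed = reconstruct_context(
    ["customer", "weather", "loan", "interest"], stats)
print(reconstructed)
```

The reconstructed context would then be passed to whatever sense-selection method is in use, which in the paper is the sememe co-occurrence method.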

