A Semi-Supervised Graph-based Algorithm for Word Sense Disambiguation

Word sense disambiguation is an issue in computational linguistics that aims at extracting the most appropriate sense of a word in a given context. To date, several unsupervised graph-based methods have been devised for word sense disambiguation, but most of them build a WordNet® graph from multiple ambiguous words in a text corpus, which amounts to the “blind leading the blind”. In this paper, a semi-supervised algorithm is proposed and implemented that uses a clue word to create the desired WordNet® graph. The existing word sense disambiguation algorithms consider all graph connectivity measures to be equally significant, but this is not the case. In this paper, a comparative study of these graph connectivity measures is performed to discuss their connectivity aspects, and priorities are assigned to them in order to generate an effective word sense disambiguation algorithm. The WordNet® graph is generated using the external Python libraries NetworkX and Matplotlib. The proposed algorithm is tested on the SemCor database and shows considerable improvement over the unsupervised graph-based method suggested by Navigli.
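
The idea of ranking a target word's senses by prioritized connectivity measures can be illustrated with a small sketch. It assumes a hand-made toy sense graph (the node labels such as `bank#1` are hypothetical stand-ins for WordNet synsets linking the target word `bank` to the clue word `money`) and uses plain Python instead of NetworkX so it runs without dependencies. Using degree centrality as the top-priority measure and closeness centrality only as a tie-breaker is an illustrative assumption, not the paper's published ranking.

```python
from collections import deque

# Hypothetical toy sense graph: edges stand in for WordNet relations found
# between senses of the target word ("bank") and the clue word ("money").
EDGES = [
    ("bank#1", "deposit#1"), ("deposit#1", "money#1"),
    ("bank#1", "financial_institution#1"), ("financial_institution#1", "money#1"),
    ("bank#2", "slope#1"), ("slope#1", "river#1"),
]

def build_graph(edges):
    """Build an undirected graph as a dict of adjacency sets."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def degree_centrality(graph, node):
    """Degree normalized by the maximum possible degree (n - 1)."""
    return len(graph[node]) / (len(graph) - 1)

def closeness_centrality(graph, node):
    """(reachable nodes) / (sum of BFS distances) within the node's component."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def rank_senses(graph, senses):
    # Measures compared in an assumed priority order: degree first,
    # closeness only breaks ties between equal degrees.
    return sorted(
        senses,
        key=lambda s: (degree_centrality(graph, s), closeness_centrality(graph, s)),
        reverse=True,
    )

graph = build_graph(EDGES)
print(rank_senses(graph, ["bank#1", "bank#2"]))  # ['bank#1', 'bank#2']
```

With a real WordNet graph, NetworkX's `degree_centrality` and `closeness_centrality` could stand in for the hand-written measures (NetworkX normalizes closeness slightly differently on disconnected graphs by default).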

A Knowledge Based Word Sense Disambiguation in Telugu Language

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1911.1010120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 440-445

Keyword(s):

Computational Linguistics ◽

Word Sense Disambiguation ◽

The Other ◽

Word Sense ◽

Knowledge Based ◽

Ambiguous Words ◽

Sense Disambiguation ◽

The Senses ◽

Definition Of ◽

Polysemous Words

Telugu (తెలుగు) is one of the Dravidian languages, which are morphologically rich. As in other languages, it contains ambiguous words/phrases that have different meanings in different contexts. Such words are referred to as polysemous words, i.e. words having multiple senses. A knowledge-based approach is proposed for disambiguating Telugu polysemous words using the computational linguistics tool IndoWordNet. The task of WSD (word sense disambiguation) requires finding the similarity between the target word and the nearby words. In this approach, the similarity is calculated either by counting the common words (intersection) between the glosses (definitions) of the target and nearby words, or by finding the exact occurrence of a nearby word's sense in the hierarchy (hypernyms/hyponyms) of the target word's senses. These parameters are extended by computing the intersection over not only the glosses but also the related words. Additionally, a third parameter, 'distance', measures the distance between the target and nearby words. The proposed method therefore uses more parameters for calculating similarity. It scores the senses based on the overall impact of the parameters, i.e. intersection, hierarchy and distance, and then chooses the sense with the highest score. With this technique, the correct meaning of a Telugu polysemous word can be identified.
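
A minimal sketch of how intersection, hierarchy and distance might be combined into one sense score follows. Everything specific in it is an assumption: the English placeholder entries (`leaf`, `tree`, `book`) stand in for Telugu IndoWordNet synsets, the weights `W_INTERSECTION` and `W_HIERARCHY` are arbitrary, and dividing by token distance is just one plausible way to make nearer words count more.

```python
from dataclasses import dataclass, field

@dataclass
class Sense:
    gloss: set                                   # content words of the definition
    related: set = field(default_factory=set)    # related words (synonyms etc.)
    hierarchy: set = field(default_factory=set)  # hypernym/hyponym sense ids

# Hypothetical English placeholders standing in for IndoWordNet synsets.
INVENTORY = {
    "leaf": {
        "leaf#plant": Sense({"flat", "green", "part", "plant"}, {"tree"}, {"tree#1"}),
        "leaf#page": Sense({"sheet", "paper", "bound"}, {"book"}, {"book#1"}),
    },
    "tree": {"tree#1": Sense({"tall", "woody", "plant", "trunk"}, {"leaf"})},
    "book": {"book#1": Sense({"bound", "sheets", "paper", "cover"}, {"leaf"})},
}

W_INTERSECTION, W_HIERARCHY = 1.0, 2.0  # assumed weights, not from the paper

def score_sense(sense, context, target_pos):
    score = 0.0
    for pos, word in enumerate(context):
        if pos == target_pos or word not in INVENTORY:
            continue
        distance = abs(pos - target_pos)
        for sense_id, other in INVENTORY[word].items():
            # Intersection over glosses and related words of both sides.
            overlap = len((sense.gloss | sense.related)
                          & (other.gloss | other.related | {word}))
            # Hierarchy: is the neighbour's sense a hypernym/hyponym of this sense?
            in_hierarchy = 1.0 if sense_id in sense.hierarchy else 0.0
            # Distance: nearer neighbours contribute more.
            score += (W_INTERSECTION * overlap + W_HIERARCHY * in_hierarchy) / distance
    return score

def disambiguate(context, target_pos):
    """Return the highest-scoring sense id and all sense scores."""
    senses = INVENTORY[context[target_pos]]
    scores = {sid: score_sense(s, context, target_pos) for sid, s in senses.items()}
    return max(scores, key=scores.get), scores

print(disambiguate("the tree dropped a leaf".split(), 4)[0])    # leaf#plant
print(disambiguate("turn the leaf of the book".split(), 2)[0])  # leaf#page
```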

A HIGHLY ACCURATE BOOTSTRAPPING ALGORITHM FOR WORD SENSE DISAMBIGUATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213001000398 ◽

2001 ◽

Vol 10 (01n02) ◽

pp. 5-21 ◽

Cited By ~ 17

Author(s):

RADA F. MIHALCEA ◽

DAN I. MOLDOVAN

Keyword(s):

High Precision ◽

Word Sense Disambiguation ◽

Original Text ◽

Word Sense ◽

New Words ◽

Input Text ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Very High

In this paper, we present a bootstrapping algorithm for Word Sense Disambiguation which succeeds in disambiguating a subset of the words in the input text with very high precision. It uses WordNet and a semantically tagged corpus to identify the correct sense of the words in a given text. The bootstrapping process initializes a set of ambiguous words with all the nouns and verbs in the text. It then applies various disambiguation procedures and builds a set of disambiguated words: new words are sense-tagged based on their relation to the already disambiguated words, and are then added to the set. This process allows us to identify, in the original text, a set of words which can be disambiguated with high precision; 55% of the verbs and nouns are disambiguated with an accuracy of 92%.
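
The bootstrapping loop described above can be sketched as an iterative tagging procedure. The candidate senses and the `RELATED` pairs are hypothetical stand-ins for WordNet senses and the relations the paper's procedures would exploit; seeding with monosemous words and tagging a word only when exactly one of its senses is supported are simplifying assumptions meant to mimic the precision-first behaviour.

```python
# Hypothetical candidate senses per word.
CANDIDATES = {
    "bank": ["bank#river", "bank#finance"],
    "loan": ["loan#1"],  # monosemous: tagged in the seed step
    "interest": ["interest#curiosity", "interest#money"],
    "deposit": ["deposit#money", "deposit#sediment"],
    "bass": ["bass#fish", "bass#music"],
}
# Hypothetical symmetric sense-to-sense relations (stand-ins for WordNet links).
RELATED = {
    frozenset({"loan#1", "bank#finance"}),
    frozenset({"bank#finance", "interest#money"}),
    frozenset({"interest#money", "deposit#money"}),
}

def related(a, b):
    return frozenset({a, b}) in RELATED

def bootstrap(words):
    # Seed: words with a single candidate sense are disambiguated outright.
    tagged = {w: CANDIDATES[w][0] for w in words if len(CANDIDATES[w]) == 1}
    pending = [w for w in words if w not in tagged]
    changed = True
    while changed:  # repeat until no new word can be tagged
        changed = False
        for w in list(pending):
            # Senses of w that are linked to some already tagged sense.
            hits = [s for s in CANDIDATES[w]
                    if any(related(s, t) for t in tagged.values())]
            if len(hits) == 1:  # tag only when the evidence is unambiguous
                tagged[w] = hits[0]
                pending.remove(w)
                changed = True
    return tagged, pending

tagged, pending = bootstrap(["bank", "loan", "interest", "deposit", "bass"])
print(tagged)   # {'loan': 'loan#1', 'bank': 'bank#finance', 'interest': 'interest#money', 'deposit': 'deposit#money'}
print(pending)  # ['bass']
```

Words that never receive unambiguous evidence (here `bass`) simply stay untagged, which is how such a procedure ends up disambiguating only a subset of the text.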

Unsupervised Word Sense Disambiguation Using Collocation and Measures of Graph Connectivity

International Journal of Software Engineering and Its Applications ◽

10.14257/ijseia.2015.9.5.18 ◽

2015 ◽

Vol 9 (5) ◽

pp. 183-192 ◽

Cited By ~ 1

Author(s):

Jung Gil Cho

Keyword(s):

Word Sense Disambiguation ◽

Graph Connectivity ◽

Word Sense ◽

Sense Disambiguation

A Graph-based Word Sense Disambiguation Using Measures of Graph Connectivity

The Journal of Korean Institute of Information Technology ◽

10.14801/kiitr.2014.12.6.143 ◽

2014 ◽

Vol 12 (6) ◽

Cited By ~ 1

Author(s):

Jung-Gil Cho

Keyword(s):

Word Sense Disambiguation ◽

Graph Connectivity ◽

Word Sense ◽

Sense Disambiguation

A comparative study between possibilistic and probabilistic approaches for monolingual word sense disambiguation

Knowledge and Information Systems ◽

10.1007/s10115-014-0753-z ◽

2014 ◽

Vol 44 (1) ◽

pp. 91-126 ◽

Cited By ~ 12

Author(s):

Bilel Elayeb ◽

Ibrahim Bounhas ◽

Oussama Ben Khiroun ◽

Fabrice Evrard ◽

Narjès Bellamine Ben Saoud

Keyword(s):

Comparative Study ◽

Word Sense Disambiguation ◽

Word Sense ◽

Sense Disambiguation

Generating Sense Inventories for Ambiguous Arabic Words

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/3a/8 ◽

2021 ◽

Author(s):

Marwah Alian ◽

Arafat Awajan

Keyword(s):

Word Sense Disambiguation ◽

Word Sense ◽

Arabic Word ◽

Word Sense Induction ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Sentence Similarity ◽

Unsupervised Approach ◽

Similar Accuracy

The process of selecting the appropriate meaning of an ambiguous word according to its context is known as word sense disambiguation. In this research, we generate a number of Arabic sense inventories based on an unsupervised approach and different pre-trained embeddings, such as Aravec, fastText, and Arabic-News embeddings. The inventories resulting from the pre-trained embeddings are evaluated to investigate their efficiency in Arabic word sense disambiguation and sentence similarity. The sense inventories are generated using an unsupervised approach based on a graph-based word sense induction algorithm. Results show that the Aravec-Twitter inventory achieves the best accuracy, 0.47, for 50 neighbors, an accuracy close to that of the fastText inventory for 200 neighbors, and an accuracy similar to that of the Arabic-News inventory for 100 neighbors. The experiment of replacing ambiguous words with their sense vectors is tested for sentence similarity using all sense inventories, and the results show that using the Aravec-Twitter sense inventory provides a better correlation value.
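
One common way to induce senses from embeddings with a graph is to link an ambiguous word's nearest neighbours that are similar to each other and treat each connected component as one sense. The sketch below follows that scheme; it is not necessarily the exact induction algorithm used in the paper. The toy 3-dimensional vectors, the similarity threshold of 0.9, and averaging a cluster's vectors to obtain its sense vector are all assumptions; real inventories would draw the neighbours from Aravec, fastText or Arabic-News embeddings.

```python
import math

# Hypothetical nearest neighbours of an ambiguous word, with toy vectors.
NEIGHBOURS = {
    "river": (0.9, 0.1, 0.0),
    "shore": (0.85, 0.2, 0.05),
    "water": (0.8, 0.15, 0.1),
    "money": (0.1, 0.9, 0.1),
    "loan": (0.05, 0.95, 0.0),
    "credit": (0.15, 0.85, 0.2),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def induce_senses(neighbours, threshold=0.9):
    """Connected components of the neighbour-similarity graph = induced senses."""
    words = list(neighbours)
    adj = {w: set() for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(neighbours[a], neighbours[b]) >= threshold:
                adj[a].add(b)
                adj[b].add(a)
    clusters, seen = [], set()
    for start in words:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(component))
    return clusters

def sense_vector(cluster, neighbours):
    """Assumed sense representation: the mean of the cluster's vectors."""
    dim = len(neighbours[cluster[0]])
    return tuple(sum(neighbours[w][d] for w in cluster) / len(cluster) for d in range(dim))

print(induce_senses(NEIGHBOURS))  # [['river', 'shore', 'water'], ['credit', 'loan', 'money']]
```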

A new approach for unsupervised word sense disambiguation in Hindi language using graph connectivity measures

International Journal of Artificial Intelligence and Soft Computing ◽

10.1504/ijaisc.2014.065800 ◽

2014 ◽

Vol 4 (4) ◽

pp. 318 ◽

Cited By ~ 1

Author(s):

Amita Jain ◽

D.K. Lobiyal

Keyword(s):

Word Sense Disambiguation ◽

Graph Connectivity ◽

Word Sense ◽

New Approach ◽

Sense Disambiguation ◽

Hindi Language

A Learning-Based Approach for Biomedical Word Sense Disambiguation

The Scientific World JOURNAL ◽

10.1100/2012/949247 ◽

2012 ◽

Vol 2012 ◽

pp. 1-8 ◽

Cited By ~ 5

Author(s):

Hisham Al-Mubaid ◽

Sandeep Gungu

Keyword(s):

Research Effort ◽

Word Sense Disambiguation ◽

Interaction Model ◽

Biomedical Domain ◽

Word Sense ◽

Text Annotation ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Supervised Methods ◽

Automatic Text

In the biomedical domain, word sense ambiguity is a widespread problem, yet the bioinformatics research effort devoted to it has not been commensurate with its extent, leaving room for further development. This paper presents and evaluates a learning-based approach for sense disambiguation within the biomedical domain. The main limitation of supervised methods is the need for a corpus of manually disambiguated instances of the ambiguous words. However, advances in automatic text annotation and tagging techniques, aided by the plethora of knowledge sources such as ontologies and the text literature in the biomedical domain, will help lessen this limitation. The proposed method uses the interaction model (mutual information) between the context words and the senses of the target word to induce reliable learning models for sense disambiguation. The method has been evaluated on the benchmark dataset NLM-WSD with various settings and on biomedical entity species disambiguation. The evaluation results showed that the approach is very competitive and outperforms recently reported results of other published techniques.
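
The interaction model can be approximated, as a minimal sketch, by pointwise mutual information (PMI) between context words and senses learned from sense-tagged instances. The training instances for the ambiguous term `cold` are invented, PMI is used as one concrete reading of "mutual information", and letting unseen word-sense pairs contribute zero (rather than negative infinity) is a smoothing assumption.

```python
import math
from collections import Counter

# Hypothetical sense-tagged training instances for the ambiguous term "cold".
TRAIN = [
    ({"fever", "cough", "virus"}, "common_cold"),
    ({"cough", "sneeze", "virus"}, "common_cold"),
    ({"temperature", "exposure", "hypothermia"}, "cold_temperature"),
    ({"exposure", "winter", "temperature"}, "cold_temperature"),
]

def train_pmi(instances):
    """PMI(w, s) = log(P(w, s) / (P(w) * P(s))), estimated per instance."""
    n = len(instances)
    sense_count = Counter(s for _, s in instances)
    word_count = Counter(w for ctx, _ in instances for w in ctx)
    joint = Counter((w, s) for ctx, s in instances for w in ctx)
    pmi = {
        (w, s): math.log((c / n) / ((word_count[w] / n) * (sense_count[s] / n)))
        for (w, s), c in joint.items()
    }
    return pmi, sense_count

def predict(context, pmi, senses):
    # Sum the PMI of each context word with each sense; unseen pairs add 0.
    scores = {s: sum(pmi.get((w, s), 0.0) for w in context) for s in senses}
    return max(scores, key=scores.get), scores

pmi, senses = train_pmi(TRAIN)
print(predict({"cough", "virus", "headache"}, pmi, senses)[0])  # common_cold
print(predict({"winter", "exposure"}, pmi, senses)[0])          # cold_temperature
```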

A Comparative Study of Open-Domain and Specific-Domain Word Sense Disambiguation Based on Quranic Information Retrieval

MATEC Web of Conferences ◽

10.1051/matecconf/201713500071 ◽

2017 ◽

Vol 135 ◽

pp. 00071 ◽

Cited By ~ 1

Author(s):

Rehab Hasan Abood ◽

Sabrina Tiun

Keyword(s):

Information Retrieval ◽

Comparative Study ◽

Word Sense Disambiguation ◽

Word Sense ◽

Open Domain ◽

Specific Domain ◽

Sense Disambiguation

A Word Sense Disambiguation Method Based on Reconstruction of Context by Correlation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.135-136.160 ◽

2011 ◽

Vol 135-136 ◽

pp. 160-166 ◽

Cited By ~ 1

Author(s):

Xin Hua Fan ◽

Bing Jun Zhang ◽

Dong Zhou

Keyword(s):

Information Entropy ◽

Average Distance ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Experimental Results ◽

Occurrence Frequency ◽

Word Sense ◽

Occurrence Data ◽

Ambiguous Words ◽

Sense Disambiguation

This paper presents a word sense disambiguation method that reconstructs the context using the correlation between words. First, we compute the relevance between words through statistical quantities (co-occurrence frequency, average distance and information entropy) from the corpus. Second, we treat the context words that have a larger correlation with the ambiguous word than the other context words as important words, use these words to reconstruct the context, and then use the reconstructed context as the new context of the ambiguous word. Finally, we use the sememe co-occurrence data method [10] for word sense disambiguation. The experimental results demonstrate the feasibility of this method.
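
The first two steps, correlation scoring and context reconstruction, can be sketched as follows. The toy corpus, the stop-word list and the scoring formula `frequency / average distance` are assumptions made for illustration; the information-entropy term from the paper is omitted, and the final sememe-based disambiguation step is not shown.

```python
from collections import defaultdict

# Hypothetical toy corpus (pre-tokenized sentences).
CORPUS = [
    "the bank raised the interest rate on each loan".split(),
    "a loan from the bank carries interest".split(),
    "we walked along the river bank at dusk".split(),
]
STOPWORDS = {"the", "a", "on", "at", "of", "to", "we", "each", "from"}  # assumed list

def correlation_table(corpus, target):
    """Score each word by how often and how closely it co-occurs with `target`."""
    freq = defaultdict(int)
    dist_sum = defaultdict(int)
    for sent in corpus:
        for p in (i for i, w in enumerate(sent) if w == target):
            for i, w in enumerate(sent):
                if w != target and w not in STOPWORDS:
                    freq[w] += 1
                    dist_sum[w] += abs(i - p)
    # Assumed combination: frequent and close co-occurrence => stronger correlation.
    return {w: freq[w] / (dist_sum[w] / freq[w]) for w in freq}

def reconstruct_context(context, target, table, k=3):
    """Keep the k context words most correlated with the ambiguous word."""
    candidates = [w for w in context if w != target and w in table]
    return sorted(candidates, key=lambda w: table[w], reverse=True)[:k]

table = correlation_table(CORPUS, "bank")
sentence = "she paid interest on the loan to the bank".split()
print(reconstruct_context(sentence, "bank", table, k=2))  # ['interest', 'loan']
```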