A HIGHLY ACCURATE BOOTSTRAPPING ALGORITHM FOR WORD SENSE DISAMBIGUATION

In this paper, we present a bootstrapping algorithm for Word Sense Disambiguation which succeeds in disambiguating a subset of the words in the input text with very high precision. It uses WordNet and a semantically tagged corpus to identify the correct sense of the words in a given text. The bootstrapping process initializes a set of ambiguous words with all the nouns and verbs in the text. It then applies various disambiguation procedures and builds a set of disambiguated words: new words are sense tagged based on their relation to the already disambiguated words, and then added to the set. This process allows us to identify, in the original text, a set of words which can be disambiguated with high precision; 55% of the verbs and nouns are disambiguated with an accuracy of 92%.
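The bootstrapping loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `bootstrap_wsd`, the two toy procedures, and the sense labels such as `oxygen#1` are hypothetical stand-ins for the WordNet-based procedures the paper refers to.

```python
from typing import Callable, Dict, List, Optional

# A procedure receives a word and the already-disambiguated words,
# and returns a sense label or None if it cannot decide.
Procedure = Callable[[str, Dict[str, str]], Optional[str]]

def bootstrap_wsd(words: List[str], procedures: List[Procedure]) -> Dict[str, str]:
    """Apply the procedures repeatedly until no further word can be tagged."""
    ambiguous = list(dict.fromkeys(words))  # all candidate words, deduplicated
    disambiguated: Dict[str, str] = {}
    progress = True
    while progress and ambiguous:
        progress = False
        for word in list(ambiguous):
            for proc in procedures:
                sense = proc(word, disambiguated)
                if sense is not None:
                    # Move the word from the ambiguous set to the tagged set.
                    disambiguated[word] = sense
                    ambiguous.remove(word)
                    progress = True
                    break
    return disambiguated

# Hypothetical procedure 1: tag words assumed to have a single sense.
def monosemous(word: str, known: Dict[str, str]) -> Optional[str]:
    return {"oxygen": "oxygen#1"}.get(word)

# Hypothetical procedure 2: tag a word via its relation to a tagged word.
def related_to_known(word: str, known: Dict[str, str]) -> Optional[str]:
    if word == "gas" and "oxygen" in known:
        return "gas#2"
    return None

result = bootstrap_wsd(["gas", "oxygen", "plant"], [monosemous, related_to_known])
print(result)
```

In this toy run "plant" stays untagged, which mirrors the point that only a subset of the words is disambiguated.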


The role of domain information in Word Sense Disambiguation

Natural Language Engineering ◽

10.1017/s1351324902003029 ◽

2002 ◽

Vol 8 (4) ◽

pp. 359-373 ◽

Cited By ~ 47

Author(s):

BERNARDO MAGNINI ◽

CARLO STRAPPARAVA ◽

GIOVANNI PEZZULO ◽

ALFIO GLIOZZO

Keyword(s):

Word Sense Disambiguation ◽

Semantic Relations ◽

Word Sense ◽

Sense Disambiguation ◽

High Level ◽

Word Senses ◽

Very High ◽

Domain Information

This paper explores the role of domain information in word sense disambiguation. The underlying hypothesis is that domain labels, such as MEDICINE, ARCHITECTURE and SPORT, provide a useful way to establish semantic relations among word senses, which can be profitably used during the disambiguation process. Results obtained at the SENSEVAL-2 initiative confirm that for a significant subset of words domain information can be used to disambiguate with a very high level of precision.
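One way to picture the domain hypothesis is a simple vote: each context word contributes the domain labels of its senses, and the target sense whose domain collects the most votes wins. The sketch below is only an assumption-laden illustration; the `SENSE_DOMAINS` table, the sense names, and the abstention rule are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical sense -> domain table; a real one would come from a
# lexical resource annotated with domain labels.
SENSE_DOMAINS: Dict[str, Dict[str, str]] = {
    "operation": {"operation#surgery": "MEDICINE", "operation#military": "MILITARY"},
    "doctor": {"doctor#physician": "MEDICINE"},
    "patient": {"patient#sick_person": "MEDICINE"},
}

def disambiguate_by_domain(target: str, context: List[str]) -> Optional[str]:
    """Pick the target sense whose domain is best supported by the context."""
    votes: Counter = Counter()
    for word in context:
        # Each context word votes once for every domain among its senses.
        for domain in set(SENSE_DOMAINS.get(word, {}).values()):
            votes[domain] += 1
    senses = SENSE_DOMAINS.get(target, {})
    best = max(senses, key=lambda s: votes[senses[s]], default=None)
    if best is None or votes[senses[best]] == 0:
        return None  # abstain when the context offers no domain evidence
    return best

print(disambiguate_by_domain("operation", ["doctor", "patient"]))
```

Abstaining when there is no evidence trades coverage for precision, in the spirit of the result reported for a subset of words.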


Generating Sense Inventories for Ambiguous Arabic Words

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/3a/8 ◽

2021 ◽

Author(s):

Marwah Alian ◽

Arafat Awajan

Keyword(s):

Word Sense Disambiguation ◽

Word Sense ◽

Arabic Word ◽

Word Sense Induction ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Sentence Similarity ◽

Unsupervised Approach ◽

Similar Accuracy

The process of selecting the appropriate meaning of an ambiguous word according to its context is known as word sense disambiguation. In this research, we generate a number of Arabic sense inventories based on an unsupervised approach and different pre-trained embeddings, such as Aravec, Fast text, and Arabic-News embeddings. The resulting inventories are evaluated to investigate their efficiency in Arabic word sense disambiguation and sentence similarity. The sense inventories are generated using an unsupervised approach based on a graph-based word sense induction algorithm. Results show that the Aravec-Twitter inventory achieves the best accuracy of 0.47 for 50 neighbors and an accuracy close to that of the Fast text inventory for 200 neighbors, while it provides similar accuracy to the Arabic-News inventory for 100 neighbors. The experiment of replacing ambiguous words with their sense vectors is tested for sentence similarity using all sense inventories, and the results show that using the Aravec-Twitter sense inventory provides a better correlation value.


A Learning-Based Approach for Biomedical Word Sense Disambiguation

The Scientific World JOURNAL ◽

10.1100/2012/949247 ◽

2012 ◽

Vol 2012 ◽

pp. 1-8 ◽

Cited By ~ 5

Author(s):

Hisham Al-Mubaid ◽

Sandeep Gungu

Keyword(s):

Research Effort ◽

Word Sense Disambiguation ◽

Interaction Model ◽

Biomedical Domain ◽

Word Sense ◽

Text Annotation ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Supervised Methods ◽

Automatic Text

In the biomedical domain, word sense ambiguity is a widespread problem, yet the bioinformatics research effort devoted to it is not commensurate, leaving room for further development. This paper presents and evaluates a learning-based approach for sense disambiguation within the biomedical domain. The main limitation of supervised methods is the need for a corpus of manually disambiguated instances of the ambiguous words. However, advances in automatic text annotation and tagging techniques, together with the plethora of knowledge sources such as ontologies and text literature in the biomedical domain, will help lessen this limitation. The proposed method utilizes the interaction model (mutual information) between the context words and the senses of the target word to induce reliable learning models for sense disambiguation. The method has been evaluated with the benchmark dataset NLM-WSD with various settings and in biomedical entity species disambiguation. The evaluation results showed that the approach is very competitive and outperforms recently reported results of other published techniques.
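The interaction between context words and senses mentioned in this abstract can be illustrated with pointwise mutual information computed from sense-tagged instances. The sketch below is an assumption: the abstract does not give the exact formula, so the standard PMI definition is used, and the function `pmi_table` and the toy instances for the word "cold" are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def pmi_table(instances: List[Tuple[str, List[str]]]) -> Dict[Tuple[str, str], float]:
    """PMI(word, sense) = log2(P(word, sense) / (P(word) * P(sense))).

    Probabilities are estimated as fractions of tagged instances.
    """
    n = len(instances)
    sense_count: Counter = Counter()
    word_count: Counter = Counter()
    joint: Counter = Counter()
    for sense, context in instances:
        sense_count[sense] += 1
        for word in set(context):  # count each word once per instance
            word_count[word] += 1
            joint[(word, sense)] += 1
    return {
        (w, s): math.log2((c / n) / ((word_count[w] / n) * (sense_count[s] / n)))
        for (w, s), c in joint.items()
    }

# Hypothetical tagged instances for the ambiguous word "cold".
instances = [
    ("illness", ["virus", "symptom"]),
    ("illness", ["virus", "nasal"]),
    ("temperature", ["water", "storage"]),
    ("temperature", ["storage", "virus"]),
]
table = pmi_table(instances)
print(table[("virus", "illness")], table[("virus", "temperature")])
```

A positive score marks a context word as evidence for a sense and a negative score as evidence against it; a classifier could sum these scores over the context of a new instance.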


A Word Sense Disambiguation Method Based on Reconstruction of Context by Correlation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.135-136.160 ◽

2011 ◽

Vol 135-136 ◽

pp. 160-166 ◽

Cited By ~ 1

Author(s):

Xin Hua Fan ◽

Bing Jun Zhang ◽

Dong Zhou

Keyword(s):

Information Entropy ◽

Average Distance ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Experimental Results ◽

Occurrence Frequency ◽

Word Sense ◽

Occurrence Data ◽

Ambiguous Words ◽

Sense Disambiguation

This paper presents a word sense disambiguation method that reconstructs the context using the correlation between words. First, we compute the relevance between words from corpus statistics (co-occurrence frequency, average distance, and information entropy). Second, we treat the context words whose correlation with the ambiguous word is larger than that of the other context words as the important words, use them to reconstruct the context, and take the reconstructed context as the new context of the ambiguous word. Finally, we apply the sememe co-occurrence data method [10] for word sense disambiguation. The experimental results demonstrate the feasibility of this method.


Word Sense Disambiguation for Improving the Quality of Machine Translation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.981.153 ◽

2014 ◽

Vol 981 ◽

pp. 153-156

Author(s):

Chun Xiang Zhang ◽

Long Deng ◽

Xue Yao Gao ◽

Li Li Guo

Keyword(s):

Machine Translation ◽

Language Processing ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Translation System ◽

Word Sense ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Machine Translation System

Word sense disambiguation is key to many application problems in natural language processing. In this paper, a dedicated word sense disambiguation classifier is introduced into a machine translation system in order to improve the quality of the output translation. First, the translation of the ambiguous word is deleted from the machine translation of the Chinese sentence. Second, the ambiguous word is disambiguated, with the candidate translations of the ambiguous word serving as classification labels. Third, these two translations are combined. Fifty Chinese sentences containing ambiguous words are collected for test experiments. Experimental results show that the translation quality is improved after the proposed method is applied.


Semantic based Information Retrieval System by using WSD and DICE Coefficient

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset207259 ◽

2020 ◽

pp. 274-279

Author(s):

Prof Thwe ◽

Thi Thi Tun ◽

Ohnmar Aung

Keyword(s):

Information Retrieval ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Relevant Information ◽

Word Sense ◽

Improve Performance ◽

Lexical Resource ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Similarity Method

Word sense disambiguation (WSD) is an important technique in many NLP applications such as machine translation, content analysis and information retrieval. In an information retrieval (IR) system, ambiguous words have a damaging effect on precision. In this situation, the WSD process is useful for automatically identifying the correct meaning of an ambiguous word. Therefore, this system proposes a word sense disambiguation algorithm to increase the precision of the IR system. By disambiguating the meaning of each keyword in the query, the system adds semantics to the query in the form of conceptually related words drawn from glosses. The system uses WordNet as the lexical resource that encodes the concepts of each term. The senses provided by the WSD algorithm are used as semantics for indexing the documents to improve the performance of the IR system. Using both keyword and sense, the system retrieves the relevant information according to the Dice similarity method.
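The Dice similarity named in this abstract has a standard set-based form, 2|A ∩ B| / (|A| + |B|). The sketch below shows it ranking documents against a sense-expanded query; the term sets, document names, and the `rank` helper are hypothetical, and how the paper builds its index is not specified here.

```python
from typing import Dict, Iterable, List, Tuple

def dice(a: Iterable[str], b: Iterable[str]) -> float:
    """Dice coefficient between two term sets: 2|A & B| / (|A| + |B|)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))

def rank(query_terms: Iterable[str], docs: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Sort documents by descending Dice similarity to the query terms."""
    scored = [(name, dice(query_terms, terms)) for name, terms in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical query expanded with words related to the financial sense of "bank".
query = ["bank", "finance", "money"]
docs = {
    "doc_finance": ["bank", "money", "loan", "credit"],
    "doc_river": ["bank", "river", "water", "shore"],
}
ranking = rank(query, docs)
print(ranking)
```

Adding sense-related terms to the query is what lets the financial document outrank the river document even though both contain "bank".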


A Semi-Supervised Graph-based Algorithm for Word Sense Disambiguation

Global Journal of Enterprise Information System ◽

10.18311/gjeis/2016/7655 ◽

2017 ◽

Vol 8 (2) ◽

pp. 13 ◽

Cited By ~ 2

Author(s):

Amita Jain ◽

Devendra Kumar Tayal ◽

Sonakshi Vij

Keyword(s):

Comparative Study ◽

Computational Linguistics ◽

Word Sense Disambiguation ◽

Considerable Improvement ◽

Graph Connectivity ◽

Word Sense ◽

Text Corpus ◽

Ambiguous Words ◽

Sense Disambiguation

Word sense disambiguation is a problem in computational linguistics that aims at extracting the most appropriate sense of a word in a given context. To date, several unsupervised graph-based methods have been devised for word sense disambiguation, but the majority of them use multiple ambiguous words in a text corpus to create a WordNet® graph, which enforces the concept of “blind leading the blind”. In this paper, a semi-supervised algorithm is proposed and implemented that takes a clue-word into consideration when creating the desired WordNet® graph. Existing word sense disambiguation algorithms consider all graph connectivity measures to be equally significant, but this is not the case. In this paper, a comparative study of these graph connectivity measures is performed to discuss their connectivity aspects, and priorities are assigned to them in order to generate an effective word sense disambiguation algorithm. The WordNet® graph is generated using the external Python libraries NetworkX and Matplotlib. The proposed algorithm is tested on the SemCor database and shows considerable improvement over the unsupervised graph-based method suggested by Navigli.
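As a rough illustration of scoring senses with a graph connectivity measure, the sketch below computes degree centrality on a tiny hand-made sense graph. It uses plain Python rather than NetworkX to stay self-contained, covers only one measure, and does not reproduce the priorities the paper assigns; the edges, sense labels, and clue-word node are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Degree of each node divided by (number of nodes - 1)."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

def best_sense(candidates: List[str], edges: Iterable[Tuple[str, str]]) -> str:
    """Choose the candidate sense with the highest centrality (0.0 if absent)."""
    scores = degree_centrality(edges)
    return max(candidates, key=lambda sense: scores.get(sense, 0.0))

# Hypothetical sense graph built around the clue-word sense "money#1".
edges = [
    ("bank#finance", "money#1"),
    ("bank#finance", "loan#1"),
    ("money#1", "loan#1"),
    ("bank#river", "water#1"),
]
scores = degree_centrality(edges)
chosen = best_sense(["bank#finance", "bank#river"], edges)
print(chosen, scores[chosen])
```

A fuller version would combine several measures with weights, which is where the priorities discussed in the abstract would enter.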


Syntactic features for high precision word sense disambiguation

10.3115/1072228.1072340 ◽

2002 ◽

Cited By ~ 8

Author(s):

David Martínez ◽

Eneko Agirre ◽

Lluís Màrquez

Keyword(s):

High Precision ◽

Word Sense Disambiguation ◽

Word Sense ◽

Syntactic Features ◽

Sense Disambiguation


Developing Corpora using Wikipedia and Word2vec for Word Sense Disambiguation

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v12.i3.pp1239-1246 ◽

2018 ◽

Vol 12 (3) ◽

pp. 1239

Author(s):

Farza Nurifan ◽

Riyanarto Sarno ◽

Cahyaningtyas Sekar Wahyuni

Keyword(s):

Semantic Similarity ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Anaphora Resolution ◽

Word Sense ◽

Accuracy Rate ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Improve Accuracy ◽

Research Show

Word Sense Disambiguation (WSD) is one of the most difficult problems in the artificial intelligence field, often described as AI-hard or AI-complete. Many problems can be addressed using word sense disambiguation approaches, such as sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference. In this paper, we investigate solving the WSD problem with two small corpora. We propose the use of Word2vec and Wikipedia to develop the corpora. After developing the corpora, we measure the similarity between the sentence and the corpora using cosine similarity to determine the meaning of the ambiguous word. Lastly, to improve accuracy, we use Lesk algorithms and Wu Palmer similarity to handle cases where no word from the sentence appears in the corpora (we call this semantic similarity). The results of our research show an 86.94% accuracy rate, and semantic similarity improves the accuracy rate by 12.96% in determining the meaning of ambiguous words.
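The cosine-similarity step can be sketched with simple bag-of-words vectors. Note the substitution: the paper builds its corpora with Word2vec and Wikipedia, whereas this example uses raw term counts purely for illustration, and the per-sense corpora, sense labels, and the `choose_sense` helper are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

def cosine(a_tokens: List[str], b_tokens: List[str]) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def choose_sense(sentence: List[str], corpora: Dict[str, List[str]]) -> Optional[str]:
    """Return the sense whose corpus is most similar, or None on zero overlap."""
    scores = {sense: cosine(sentence, tokens) for sense, tokens in corpora.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0.0:
        return None  # no shared word: a fallback method would be needed here
    return best

# Hypothetical per-sense corpora for the ambiguous word "bank".
corpora = {
    "bank#finance": "money loan deposit account".split(),
    "bank#river": "river water shore".split(),
}
sentence = "i opened a deposit account".split()
print(choose_sense(sentence, corpora))
```

The `None` branch corresponds to the case the abstract describes, where no word from the sentence occurs in the corpora and a semantic similarity fallback takes over.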


Word Sense Disambiguation Using Prior Probability Estimation Based on the Korean WordNet

Electronics ◽

10.3390/electronics10232938 ◽

2021 ◽

Vol 10 (23) ◽

pp. 2938

Author(s):

Minho Kim ◽

Hyuk-Chul Kwon

Keyword(s):

Large Scale ◽

Prior Probability ◽

Word Sense Disambiguation ◽

Probability Estimation ◽

Word Sense ◽

Occurrence Probability ◽

Knowledge Based ◽

Ambiguous Words ◽

Lexical Disambiguation ◽

Sense Disambiguation

Supervised disambiguation using a large amount of corpus data delivers better performance than other word sense disambiguation methods. However, it is not easy to construct large-scale, sense-tagged corpora, since this is costly and time-consuming. On the other hand, implementing unsupervised disambiguation is relatively easy, although most such efforts have not been satisfactory. A primary reason for the performance degradation of unsupervised disambiguation is that the semantic occurrence probability of ambiguous words is not available. Hence, a data deficiency problem occurs while determining the dependency between words. This paper proposes an unsupervised disambiguation method using a prior probability estimation based on the Korean WordNet. This performs better than supervised disambiguation. In the Korean WordNet, all the words have similar semantic characteristics to their related words. Thus, it is assumed that the dependency between words is the same as the dependency between their related words. This resolves the data deficiency problem by determining the dependency between words by calculating the χ² statistic between related words. Moreover, in order to have the same effect as using the semantic occurrence probability as prior probability, which is used in supervised disambiguation, semantically related words of ambiguous vocabulary are obtained and utilized as prior probability data. An experiment was conducted with Korean, English, and Chinese to evaluate the performance of our proposed lexical disambiguation method. We found that our proposed method had better performance than supervised disambiguation methods even though our method is based on unsupervised disambiguation (using a knowledge-based approach).
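The χ² statistic used to measure the dependency between two words can be computed from a 2×2 co-occurrence table. The sketch below uses the standard closed form for a 2×2 table; the cell layout and the example counts are assumptions for illustration and do not come from the paper.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 contingency table.

    Assumed cell layout (counts of text units):
        a: both words occur       b: only the first word occurs
        c: only the second occurs d: neither word occurs
    Formula: n * (a*d - b*c)^2 / ((a+b) * (c+d) * (a+c) * (b+d)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero row or column total: no dependency can be measured
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical counts: independent words versus perfectly associated words.
independent = chi_square_2x2(10, 10, 10, 10)
associated = chi_square_2x2(20, 0, 0, 20)
print(independent, associated)
```

A larger value indicates a stronger dependency between the two words, which is the quantity the method estimates through related words when direct counts are too sparse.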
