Introduction to the special issue on evaluating word sense disambiguation systems

2002 ◽  
Vol 8 (4) ◽  
pp. 279-291 ◽  
Author(s):  
Philip Edmonds ◽  
Adam Kilgarriff

Has system performance on Word Sense Disambiguation (WSD) reached a limit? Automatic systems don't perform nearly as well as humans on the task, and from the results of the SENSEVAL exercises, recent improvements in system performance appear negligible or even negative. Still, systems do perform much better than the baselines, so something is being done right. System evaluation is crucial to explain these results and to show the way forward. Indeed, the success of any project in WSD is tied to the evaluation methodology used, and especially to the formalization of the task that the systems perform. The evaluation of WSD has turned out to be as difficult as designing the systems in the first place.

2007 ◽  
Vol 33 (4) ◽  
pp. 553-590 ◽  
Author(s):  
Diana McCarthy ◽  
Rob Koeling ◽  
Julie Weeds ◽  
John Carroll

There has been a great deal of recent research into word sense disambiguation, particularly since the inception of the Senseval evaluation exercises. Because a word often has more than one meaning, resolving word sense ambiguity could benefit applications that need some level of semantic interpretation of language input. A major problem is that the accuracy of word sense disambiguation systems is strongly dependent on the quantity of manually sense-tagged data available, and even the best systems, when tagging every word token in a document, perform little better than a simple heuristic that guesses the first, or predominant, sense of a word in all contexts. The success of this heuristic is due to the skewed nature of word sense distributions. Data for the heuristic can come from either dictionaries or a sample of sense-tagged data. However, there is a limited supply of the latter, and the sense distributions and predominant sense of a word can depend on the domain or source of a document. (The first sense of “star,” for example, would be different in the popular press and scientific journals.) In this article, we expand on a previously proposed method for determining the predominant sense of a word automatically from raw text. We look at a number of different data sources and parameterizations of the method, using evaluation results and error analyses to identify where the method performs well and also where it does not. In particular, we find that the method does not work as well for verbs and adverbs as for nouns and adjectives, but produces more accurate predominant sense information than the widely used SemCor corpus for nouns with low coverage in that corpus. We further show that the method is able to adapt successfully to domains when using domain-specific corpora as input and where the input can either be hand-labeled for domain or automatically classified.
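The first-sense heuristic mentioned in this abstract can be illustrated with a minimal sketch. It covers only the baseline that reads the predominant sense off a sense-tagged sample, not the article's method for acquiring predominant senses from raw text. The function names, sense labels, and toy data are assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def predominant_senses(tagged_sample):
    """Map each word to its most frequent sense in a sense-tagged sample."""
    counts = defaultdict(Counter)
    for word, sense in tagged_sample:
        counts[word][sense] += 1
    # most_common(1) yields the single highest-count (sense, count) pair.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def first_sense_tag(tokens, predominant):
    """Tag every token with its predominant sense, ignoring context.

    Words absent from the sample are tagged None.
    """
    return [(token, predominant.get(token)) for token in tokens]

# Toy sense-tagged sample (hypothetical labels, not taken from any corpus).
sample = [
    ("star", "celestial_body"),
    ("star", "celestial_body"),
    ("star", "celebrity"),
    ("bank", "financial_institution"),
]

predominant = predominant_senses(sample)
tagged = first_sense_tag(["star", "bank", "quark"], predominant)
```

Because the heuristic ignores context, a sample drawn from a different domain (for example, the popular press rather than scientific journals) could yield a different predominant sense for the same word.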


2006 ◽  
Vol 12 (3) ◽  
pp. 209-228 ◽  
Author(s):  
Judita Preiss

We compare the word sense disambiguation systems submitted for the English-all-words task in SENSEVAL-2. We give several performance measures for the systems, and analyze correlations between system performance and word features. A decision tree learning algorithm is employed to discover the situations in which systems perform particularly well, and the resulting decision tree is examined. We investigate using a decision tree based on the SENSEVAL systems to (i) filter out senses unlikely to be correct, and to (ii) combine WSD systems. Some combinations created in this way outperform the best SENSEVAL system.


2021 ◽  
Vol 12 (1) ◽  
pp. 18-26
Author(s):  
Divya Agrawal ◽  
Ani Thomas

Natural language processing is a subfield of linguistics concerned with the interactions between computers and human language, specifically with how to program computers to process and analyze large amounts of text data (natural language data). In this setting, word sense disambiguation (WSD) is the task of determining the correct annotation of the pun word in a given context. This paper describes an effort to use the cosine similarity method to detect a single homographic pun in a given context, find its location, and determine the correct annotation with respect to helping words in the context. This paper includes two approaches: BIT_SYS1 and BIT_SYS2. The first retains words with a synset count of one, since such a word cannot be a pun but can serve as a helping word to the pun; in the latter, words with a synset count of one are eliminated and the concept of the helping word is abandoned. BIT_SYS2 performs better than BIT_SYS1: its F1 scores (0.8571, 1.0000, 1.0000) are higher than those of BIT_SYS1 (0.8439, 0.8648, 0.8648) on the pun detection, pun location, and pun annotation tasks, respectively.
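The abstract does not say how cosine similarity is applied inside BIT_SYS1 or BIT_SYS2, so the following is only a generic sketch of the measure itself: it scores bag-of-words overlap between a context and candidate sense glosses and picks the highest-scoring sense. The function names, sense labels, and glosses are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter

def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words term-count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_sense(context_tokens, glosses):
    """Return the sense label whose gloss is most similar to the context."""
    return max(glosses, key=lambda s: cosine_similarity(context_tokens, glosses[s]))

# Hypothetical context and glosses, invented for illustration.
context = "he deposited money at the bank".split()
glosses = {
    "bank_financial": "an institution that accepts money deposits".split(),
    "bank_river": "sloping land beside a river".split(),
}

chosen = best_sense(context, glosses)
```

In this toy case only the financial gloss shares a token ("money") with the context, so it receives the higher score. A real system would need stemming, stop-word handling, and a sense inventory such as WordNet synsets.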


2019 ◽  
Vol 26 (4) ◽  
pp. 413-432 ◽  
Author(s):  
Goonjan Jain ◽  
D.K. Lobiyal

Humans proficiently interpret the true sense of an ambiguous word by establishing associations among words in a sentence. The complete sense of text is also based on implicit information, which is not explicitly mentioned. The absence of this implicit information is a significant problem for a computer program that attempts to determine the correct sense of ambiguous words. In this paper, we propose a novel method to uncover the implicit information that links the words of a sentence. We reveal this implicit information using a graph, which is then used to disambiguate the ambiguous word. The experiments show that the proposed algorithm interprets the correct sense for both homonyms and polysemous words. Our proposed algorithm has performed better than the approaches presented in the SemEval-2013 task for word sense disambiguation and has shown an accuracy of 79.6 percent, which is 2.5 percent better than the best unsupervised approach in SemEval-2007.

