On the Quality of Lexical Resources for Word Sense Disambiguation

Author(s):  
Lluís Màrquez ◽  
Mariona Taulé ◽  
Lluís Padró ◽  
Luis Villarejo ◽  
Maria Antònia Martí
2012 ◽  
Vol 2 (4) ◽  
Author(s):  
Adrian-Gabriel Chifu ◽  
Radu-Tudor Ionescu

Abstract: Success in Information Retrieval (IR) depends on many variables. Several interdisciplinary approaches try to improve the quality of the results obtained by an IR system. In this paper we propose a new way of using word sense disambiguation (WSD) in IR. The method we develop is based on Naïve Bayes classification and can be used both as a filtering and as a re-ranking technique. We show on the TREC ad-hoc collection that WSD is useful for queries that are difficult because of sense ambiguity. We focus on improving precision after 5, 10 and 30 retrieved documents (P@5, P@10 and P@30, respectively) for these lowest-precision queries.
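For readers unfamiliar with the P@5, P@10 and P@30 measures named in the abstract, the following is a minimal sketch of precision at rank k. The document IDs and relevance judgments are invented for illustration and are not taken from the TREC collection or from the paper.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Return the fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not by len(top_k)) penalises rankings shorter than k.
    return hits / k


# Hypothetical ranking and relevance judgments for a single query.
ranking = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant = {"d3", "d1", "d2", "d6"}

p_at_5 = precision_at_k(ranking, relevant, 5)
p_at_10 = precision_at_k(ranking, relevant, 10)
print(p_at_5, p_at_10)
```

A re-ranking step such as the one the abstract describes would aim to move relevant documents into the first k positions, which raises this value without changing the set of retrieved documents.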


2015 ◽  
pp. 269-292 ◽  
Author(s):  
Paweł Kędzia ◽  
Maciej Piasecki ◽  
Marlena Orlińska

Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources

Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult and not yet fully solved, and different lexical resources provide varied support for it. Polish CLARIN lexical semantic resources are based on plWordNet (a very large wordnet for Polish) as a central structure that serves as the basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in a traditional lexicon can be compared with text contexts using Lesk’s algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, we first adapted and applied to Polish a WSD method based on PageRank. In this method, text words are mapped onto their senses in the plWordNet graph and the PageRank algorithm is run to find the senses with the highest scores. The method yields results that are lower than, but comparable to, those reported for English. The error analysis showed that the main problems are fine-grained sense distinctions in plWordNet and a limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with a mapping onto SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation and disambiguation based on combined networks of plWordNet and SUMO. In the former scenario, words are first assigned SUMO concepts and then plWordNet senses are disambiguated. In the latter, plWordNet and SUMO are combined into one large network that is then used for the disambiguation of senses. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of development are discussed.
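The gloss-matching idea attributed to Lesk's algorithm in the abstract can be sketched as follows. This is a simplified overlap-counting variant, not the authors' implementation; the English glosses, sense labels and stopword list are invented for illustration and do not come from plWordNet.

```python
def simplified_lesk(context_tokens, sense_glosses, stopwords=frozenset()):
    """Pick the sense whose gloss shares the most words with the context.

    context_tokens: list of words surrounding the ambiguous word.
    sense_glosses:  dict mapping a sense label to its textual description.
    Returns (best_sense, overlap_size); ties keep the first sense seen.
    """
    context = {t.lower() for t in context_tokens} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_tokens = {t.lower() for t in gloss.split()} - stopwords
        overlap = len(context & gloss_tokens)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


# Hypothetical sense inventory for the English word "bank".
glosses = {
    "bank_finance": "an institution that accepts deposits and lends money",
    "bank_river": "sloping land beside a body of water such as a river",
}
stop = frozenset({"a", "an", "the", "of", "and", "on", "that", "such", "as"})
sentence = "she sat on the bank of the river and watched the water".split()

result = simplified_lesk(sentence, glosses, stop)
print(result)
```

Here the river sense wins because its gloss shares "river" and "water" with the context, while the finance gloss shares nothing once stopwords are removed.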


2014 ◽  
Vol 981 ◽  
pp. 153-156
Author(s):  
Chun Xiang Zhang ◽  
Long Deng ◽  
Xue Yao Gao ◽  
Li Li Guo

Word sense disambiguation is key to many application problems in natural language processing. In this paper, a dedicated word sense disambiguation classifier is introduced into a machine translation system in order to improve the quality of the output translation. First, the translation of the ambiguous word is deleted from the machine translation of the Chinese sentence. Second, the ambiguous word is disambiguated, with its candidate translations serving as the classification labels. Third, the two translations are combined. Fifty Chinese sentences containing ambiguous words were collected for the test experiments. Experimental results show that translation quality improves after the proposed method is applied.


2016 ◽  
Vol 4 ◽  
pp. 197-213 ◽  
Author(s):  
Silvana Hartmann ◽  
Judith Eckle-Kohler ◽  
Iryna Gurevych

We present a new approach for generating role-labeled training data using Linked Lexical Resources, i.e., integrated lexical resources that combine several resources (e.g., WordNet, FrameNet, Wiktionary) by linking them on the sense or on the role level. Unlike resource-based supervision in relation extraction, we focus on complex linguistic annotations, more specifically FrameNet senses and roles. The automatically labeled training data (www.ukp.tu-darmstadt.de/knowledge-based-srl/) are evaluated on four corpora from different domains for the tasks of word sense disambiguation and semantic role classification. Results show that classifiers trained on our generated data equal those resulting from a standard supervised setting.


Author(s):  
Prashant Y. Itankar ◽  
Nikhat Raza

Natural language processing (NLP) is increasingly needed to enhance human-machine interaction. Processing textual data to obtain useful and meaningful information from it is an important concern. NLP parses texts and provides information to the machine for further processing. At present, the computational process of identifying the meaning (sense) of a word in a particular context is hampered by ambiguity: the meaning of a word in context may be unclear and may point to multiple senses. Ambiguity in understanding the correct meaning of texts is hampering growth and development in NLP applications such as machine translation and human-machine interfaces. The process of finding the correct meaning of ambiguous text in a given context is called word sense disambiguation (WSD). WSD is perceived as one of the most challenging problems in the NLP community and is still unsolved. Different ambiguities exist in natural languages, and researchers are working to resolve the problem in different languages. These ambiguities must be resolved in order to understand the meaning of the text. The objective is to investigate how WSD can be used to alleviate ambiguities, automatically determine the correct meaning of ambiguous text, and help to boost NLP processing and applications. Resolving ambiguity for translation involves working with various NLP techniques to investigate the structure of the languages, the availability of lexical resources, and so on. This paper focuses on an in-depth analysis of such ambiguity, issues in language translation, and how WSD resolves the ambiguity, and contributes towards building a framework.


2019 ◽  
Vol 55 (2) ◽  
pp. 339-365
Author(s):  
Arkadiusz Janz ◽  
Maciej Piasecki

Abstract: Automatic word sense disambiguation (WSD) has proven to be an important technique in many natural language processing tasks. For many years the problem of sense disambiguation has been approached with a wide range of methods; however, it remains a challenging problem, especially in the unsupervised setting. Knowledge-based methods that leverage lexical knowledge resources such as wordnets are among the well-known and successful approaches to WSD. As knowledge-based approaches mostly do not use any labelled training data, their performance relies strongly on the structure and quality of the knowledge sources used. However, a pure knowledge base such as a wordnet cannot reflect all the semantic knowledge necessary to correctly disambiguate word senses in text. In this paper we explore various expansions to plWordNet as knowledge bases for WSD. Semantic links extracted from a large valency lexicon (Walenty), glosses and usage examples, Wikipedia articles and the SUMO ontology are combined with plWordNet and tested in a PageRank-based WSD algorithm. In addition, we also analyse the influence of lexical semantics vector models extracted with distributional semantics methods. Several new Polish test data sets for WSD are also introduced. All the resources, methods and tools are available under open licences.
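The abstract says only that the algorithm is PageRank-based. The sketch below assumes a personalised PageRank in which the random walk restarts at the candidate senses of the context words. That restart scheme, the toy sense graph and all node names are assumptions made for illustration, not details taken from the paper or from plWordNet.

```python
def personalized_pagerank(graph, personalization, damping=0.85, iterations=100):
    """Power iteration for PageRank with a custom restart distribution.

    graph: dict mapping each node to a list of neighbouring nodes.
    personalization: dict mapping nodes to restart probabilities (sums to 1).
    Nodes without neighbours simply lose their mass in this sketch.
    """
    nodes = list(graph)
    rank = {n: personalization.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * personalization.get(n, 0.0) for n in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                continue
            share = damping * rank[node] / len(neighbours)
            for nb in neighbours:
                new_rank[nb] += share
        rank = new_rank
    return rank


def disambiguate(context_senses, graph):
    """Choose, for every context word, its highest-ranked candidate sense."""
    personalization = {}
    for senses in context_senses.values():
        # Each word gets equal restart mass, split evenly over its senses.
        for sense in senses:
            weight = 1.0 / (len(context_senses) * len(senses))
            personalization[sense] = personalization.get(sense, 0.0) + weight
    rank = personalized_pagerank(graph, personalization)
    return {w: max(senses, key=lambda s: rank[s]) for w, senses in context_senses.items()}


# Hypothetical undirected sense graph (every edge is listed in both directions).
graph = {
    "bank#finance": ["money#1", "deposit#1"],
    "bank#river": ["river#1"],
    "money#1": ["bank#finance", "deposit#1"],
    "deposit#1": ["bank#finance", "money#1"],
    "river#1": ["bank#river"],
}
context_senses = {"bank": ["bank#finance", "bank#river"], "money": ["money#1"]}

chosen = disambiguate(context_senses, graph)
print(chosen)
```

In this toy graph the financial sense of "bank" receives extra score through its link to the sense of the co-occurring word "money". This also illustrates why adding links from further resources can change the outcome: every new edge opens another path along which context evidence can reach a candidate sense.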


2020 ◽  
Vol 20 (4) ◽  
pp. 90-107
Author(s):  
Bolshina Angelina ◽  
Natalia Loukachevitch

Abstract: The limited amount of sense-annotated data is a major challenge for the word sense disambiguation task. As a solution to this problem, we propose an algorithm for the automatic generation and labelling of training collections based on the concept of monosemous relatives. In this article we explore the limits of this algorithm: we employ it to harvest training collections for all ambiguous nouns, verbs and adjectives present in the RuWordNet thesaurus and then evaluate the quality of the obtained collections. We demonstrate that our approach can create high-quality labelled collections with almost full coverage of the RuWordNet polysemous words. Furthermore, we show that our method can be applied to the Word-in-Context task.
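The monosemous relatives idea can be illustrated with a small sketch: a related word that has exactly one sense is unambiguous, so sentences containing it can be labelled with the sense of the target word it is related to. The data structures, English words and corpus below are hypothetical, and the selection of relatives is far simpler than whatever the authors apply to RuWordNet.

```python
def monosemous_relatives(target_senses, sense_counts):
    """Map each usable relative to the target sense it stands for.

    target_senses: dict mapping a target sense to its related words.
    sense_counts:  dict mapping a word to its number of senses.
    Relatives with more than one sense, or linked to several target
    senses, are discarded because they would give ambiguous labels.
    """
    mapping, conflicting = {}, set()
    for sense, relatives in target_senses.items():
        for word in relatives:
            if sense_counts.get(word, 0) != 1:
                continue
            if word in mapping and mapping[word] != sense:
                conflicting.add(word)
            mapping[word] = sense
    return {w: s for w, s in mapping.items() if w not in conflicting}


def harvest_examples(target_senses, sense_counts, corpus):
    """Label corpus sentences that contain a monosemous relative."""
    relatives = monosemous_relatives(target_senses, sense_counts)
    examples = []
    for sentence in corpus:
        for token in sentence.lower().split():
            if token in relatives:
                examples.append((sentence, relatives[token]))
                break  # one label per sentence in this sketch
    return examples


# Hypothetical thesaurus fragment for the ambiguous word "bank".
target_senses = {
    "bank_finance": ["lender", "institution"],
    "bank_river": ["riverbank", "shore"],
}
sense_counts = {"lender": 1, "institution": 3, "riverbank": 1, "shore": 1}
corpus = [
    "the lender approved the loan",
    "we walked along the riverbank at dusk",
    "the institution was founded in 1900",
    "the bank was closed",
]

examples = harvest_examples(target_senses, sense_counts, corpus)
print(examples)
```

The sentence with "institution" is skipped because that word is itself polysemous in the toy inventory, and the sentence with "bank" is skipped because the target word is the one being disambiguated.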

