Word Sense Disambiguation Studio: A Flexible System for WSD Feature Extraction

Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 97
Author(s):  
Gennady Agre ◽  
Daniel Petrov ◽  
Simona Keskinova

The paper presents a flexible system for extracting features and creating training and test examples for solving the all-words word sense disambiguation (WSD) task. The system allows integrating word and sense embeddings as part of an example description. The system possesses two unique features distinguishing it from similar WSD systems: the ability to construct a special compressed representation for word embeddings, and the ability to construct training and test sets of examples with different data granularity. The first feature allows generation of data sets with quite small dimensionality, which can be used for training highly accurate classifiers of different types. The second feature allows generating sets of examples that can be used for training classifiers specialized in disambiguating a concrete word, words belonging to the same part-of-speech (POS) category, or all open-class words. Intensive experimentation has shown that classifiers trained on examples created by the system outperform the standard baselines for measuring the behaviour of all-words WSD classifiers.
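
Sketched below is the general shape of such a context-window feature extractor. This is a minimal illustration, not the authors' WSD Studio implementation; the pooling scheme (mean-pooling each side of the window) is an assumption standing in for their compressed representation, and all names are hypothetical.

```python
import numpy as np

def context_features(tokens, idx, embeddings, window=3, dim=100):
    """Build a compressed feature vector for the target token at `idx`
    by mean-pooling the embeddings of its left and right context."""
    left = [embeddings.get(t, np.zeros(dim)) for t in tokens[max(0, idx - window):idx]]
    right = [embeddings.get(t, np.zeros(dim)) for t in tokens[idx + 1:idx + 1 + window]]
    pool = lambda vs: np.mean(vs, axis=0) if vs else np.zeros(dim)
    # One pooled vector per side keeps the example dimensionality at
    # 2 * dim, independent of the window size.
    return np.concatenate([pool(left), pool(right)])
```

Data granularity is then a matter of how the resulting examples are grouped: one data set per lemma, one per POS category, or a single all-words set.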


2017 ◽  
Vol 43 (3) ◽  
pp. 593-617 ◽  
Author(s):  
Sascha Rothe ◽  
Hinrich Schütze

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings that incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The obtained embeddings live in the same vector space as the input word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet, GermaNet, and Freebase as semantic resources. AutoExtend achieves state-of-the-art performance on Word-in-Context Similarity and Word Sense Disambiguation tasks.
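
The toy sketch below conveys the core intuition of propagating word embeddings to synsets through the word-synset membership graph. The real AutoExtend learns this mapping through an encode/decode objective with a sparse tensor formalization; this simplification merely distributes each word vector uniformly over its synsets.

```python
import numpy as np

def synset_embeddings(word_vecs, word_to_synsets):
    """word_vecs: {word: np.array}; word_to_synsets: {word: [synset ids]}.
    Returns synset vectors living in the same space as the word vectors."""
    sums, counts = {}, {}
    for word, synsets in word_to_synsets.items():
        if not synsets or word not in word_vecs:
            continue
        share = word_vecs[word] / len(synsets)  # uniform split (assumption)
        for s in synsets:
            sums[s] = sums.get(s, 0) + share
            counts[s] = counts.get(s, 0) + 1
    # Average the shares contributed by each member word.
    return {s: sums[s] / counts[s] for s in sums}
```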



2018 ◽  
Vol 25 (4) ◽  
pp. 463-480
Author(s):  
Kanako Komiya ◽  
Minoru Sasaki ◽  
Hiroyuki Shinnou ◽  
Manabu Okumura


2021 ◽  
pp. 1-55
Author(s):  
Daniel Loureiro ◽  
Kiamehr Rezaee ◽  
Mohammad Taher Pilehvar ◽  
Jose Camacho-Collados

Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language-model-based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and can better exploit limited available training data. In fact, the simple feature-extraction strategy of averaging contextualized embeddings proves robust even using only three training sentences per word sense, with minimal improvements obtained by increasing the size of this training data.
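
As a rough illustration of the feature-extraction strategy the article favors, the sketch below averages contextualized embeddings per sense and classifies by nearest centroid. Here `embed(tokens, idx)` is a hypothetical stand-in for any contextual encoder (e.g., a BERT hidden layer), not a real API.

```python
import numpy as np

def build_sense_centroids(train_examples, embed):
    """train_examples: list of (sentence_tokens, target_index, sense_id).
    Returns one centroid vector per sense."""
    by_sense = {}
    for tokens, idx, sense in train_examples:
        by_sense.setdefault(sense, []).append(embed(tokens, idx))
    return {s: np.mean(vs, axis=0) for s, vs in by_sense.items()}

def disambiguate(tokens, idx, centroids, embed):
    """Assign the sense whose centroid is nearest by cosine similarity."""
    v = embed(tokens, idx)
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max(centroids, key=lambda s: cos(v, centroids[s]))
```

Per the article's finding, `build_sense_centroids` can be fed as few as three sentences per sense and still yield a robust classifier.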



2017 ◽  
Vol 14 (4) ◽  
Author(s):  
Rui Antunes ◽  
Sérgio Matos

Word sense disambiguation (WSD) is an important step in biomedical text mining, which is responsible for assigning an unequivocal concept to an ambiguous term, improving the accuracy of biomedical information extraction systems. In this work we followed supervised and knowledge-based disambiguation approaches, with the best results obtained by supervised means. In the supervised method we used bag-of-words as local features and word embeddings as global features. In the knowledge-based method we combined word embeddings, concept textual definitions extracted from the UMLS database, and concept association values calculated from the MeSH co-occurrence counts from MEDLINE articles. Also, in the knowledge-based method, we tested different word embedding averaging functions to calculate the surrounding context vectors, with the goal of giving more weight to words closest to the ambiguous term. The MSH WSD dataset, the most common dataset used for evaluating biomedical concept disambiguation, was used to evaluate our methods. We obtained a top accuracy of 95.6% by supervised means, while the best knowledge-based accuracy was 87.4%. Our results show that word embedding models improved the disambiguation accuracy, proving to be a powerful resource in the WSD task.
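
One plausible instance of such a distance-sensitive averaging function is sketched below. The inverse-distance weighting is an assumption made for illustration; the abstract does not specify which functions the authors tested.

```python
import numpy as np

def weighted_context_vector(tokens, idx, embeddings, dim=100):
    """Average context word embeddings, weighting each by the inverse of
    its token distance to the ambiguous term at `idx` (assumption), so
    that nearer words dominate the context vector."""
    vec, total = np.zeros(dim), 0.0
    for j, tok in enumerate(tokens):
        if j == idx or tok not in embeddings:
            continue
        w = 1.0 / abs(j - idx)
        vec += w * embeddings[tok]
        total += w
    return vec / total if total else vec
```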



2020 ◽  
Vol 34 (10) ◽  
pp. 13823-13824
Author(s):  
Xinyi Jiang ◽  
Zhengzhe Yang ◽  
Jinho D. Choi

We present a novel online algorithm that learns the essence of each dimension in word embeddings. We first mask dimensions determined unessential by our algorithm, apply the masked word embeddings to a word sense disambiguation task (WSD), and compare its performance against the one achieved by the original embeddings. Our results show that the masked word embeddings do not hurt the performance and can improve it by 3%.
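
A minimal sketch of the masking step follows. How the mask itself is learned online is the paper's contribution and is not reproduced here; `mask` is simply taken as given.

```python
import numpy as np

def apply_mask(embeddings, mask):
    """embeddings: {word: np.array(dim)}; mask: boolean np.array(dim),
    True = keep the dimension, False = zero it out. The masked embeddings
    are then evaluated on the downstream WSD task against the originals."""
    m = mask.astype(float)
    return {w: v * m for w, v in embeddings.items()}
```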



Author(s):  
Kanako Komiya ◽  
Shota Suzuki ◽  
Minoru Sasaki ◽  
Hiroyuki Shinnou ◽  
Manabu Okumura

