SensEmBERT: Context-Enhanced Sense Embeddings for Multilingual Word Sense Disambiguation

2020 · Vol 34 (05) · pp. 8758-8765
Author(s): Bianca Scarlini, Tommaso Pasini, Roberto Navigli

Contextual representations of words derived by neural language models have proven to effectively encode the subtle distinctions that might occur between different meanings of the same word. However, these representations are not tied to a semantic network, hence they leave the word meanings implicit and thereby neglect the information that can be derived from the knowledge base itself. In this paper, we propose SensEmBERT, a knowledge-based approach that brings together the expressive power of language modelling and the vast amount of knowledge contained in a semantic network to produce high-quality latent semantic representations of word meanings in multiple languages. Our vectors lie in a space comparable with that of contextualized word embeddings, thus allowing a word occurrence to be easily linked to its meaning by applying a simple nearest neighbour approach. We show that, whilst not relying on manual semantic annotations, SensEmBERT is able to either achieve or surpass state-of-the-art results attained by most of the supervised neural approaches on the English Word Sense Disambiguation task. When scaling to other languages, our representations prove to be as effective as their English counterparts and outperform the existing state of the art on all the multilingual Word Sense Disambiguation datasets. The embeddings are released in five different languages at http://sensembert.org.
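
As a concrete picture of the nearest-neighbour linking step described above, here is a minimal sketch in Python. It assumes sense embeddings have already been loaded into a `sense_vectors` dict (e.g. from the vectors released at http://sensembert.org) and that `context_vec` is the contextualized embedding of the target occurrence; all function and variable names are illustrative, not the authors' API.

```python
import numpy as np

def nearest_sense(context_vec, candidate_senses, sense_vectors):
    """Link a word occurrence to a meaning: return the candidate sense
    whose embedding is closest (by cosine similarity) to the occurrence's
    contextualized embedding. This only works because the sense vectors
    lie in a space comparable with that of the contextual embeddings."""
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(candidate_senses,
               key=lambda s: cosine(context_vec, sense_vectors[s]))
```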

2020 · Vol 34 (05) · pp. 8123-8130
Author(s): Caterina Lacerra, Michele Bevilacqua, Tommaso Pasini, Roberto Navigli

Word Sense Disambiguation (WSD) is the task of associating a word in context with one of its meanings. While many works in the past have focused on raising the state of the art, none has come close to achieving an F-score in the 80% ballpark when using WordNet as its sense inventory. We contend that one of the main reasons for this failure is the excessively fine granularity of this inventory, which results in senses that are hard to differentiate, even for an experienced human annotator. In this paper we tackle this long-standing problem by introducing the Coarse Sense Inventory (CSI), obtained by linking WordNet concepts to a new set of 45 labels. The results show that the coarse granularity of CSI leads a WSD model to achieve 85.9% F1, while maintaining high expressive power. Our set of labels also exhibits ease of use in tagging and a descriptiveness that other coarse inventories lack, as demonstrated in two annotation tasks that we performed. Moreover, a few-shot evaluation proves that the class-based nature of CSI allows the model to generalise over unseen or under-represented words.
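
To make the coarse projection concrete, the sketch below scores fine-grained predictions at CSI granularity. The sense keys and labels in `csi_map` are hypothetical placeholders for the released WordNet-to-CSI mapping, not entries copied from the resource.

```python
# Hypothetical excerpt of the WordNet-sense-key -> CSI-label mapping; the
# real resource links every WordNet concept to one of the 45 CSI labels.
csi_map = {
    "bank%1:14:00::": "BUSINESS_AND_FINANCE",   # placeholder label
    "bank%1:17:01::": "GEOGRAPHY_AND_PLACES",   # placeholder label
}

def coarse_accuracy(predictions, gold):
    """Project fine-grained predictions and gold sense keys onto CSI
    labels, so a model is only penalised for coarse-grained confusions."""
    pairs = [(csi_map[p], csi_map[g]) for p, g in zip(predictions, gold)]
    return sum(p == g for p, g in pairs) / len(pairs)
```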


2015
Author(s): Rodrigo Goulart, Juliano De Carvalho, Vera De Lima

Word Sense Disambiguation (WSD) is an important task for biomedical text mining. Supervised WSD methods achieve the best results, but they are complex and their cost at test time is too high. This work presents an experiment on WSD using graph-based approaches (unsupervised methods). Three algorithms were tested and compared to the state of the art. Results indicate that similar performance can be reached with different levels of complexity, which may point to a new approach to this problem.
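
A toy version of the graph-based, unsupervised idea can be put together with NLTK's WordNet interface and NetworkX (both need installing, plus `nltk.download("wordnet")`). The sketch below, which ranks candidate senses by PageRank centrality over a sense graph, is one representative of this family rather than any of the three algorithms tested in the paper.

```python
import networkx as nx
from nltk.corpus import wordnet as wn

def graph_wsd(words):
    """Toy graph-based disambiguation: candidate senses of all words
    become nodes, senses of *different* words are linked with a weight
    given by WordNet path similarity, and PageRank picks the most
    central sense of each word."""
    senses = {w: wn.synsets(w) for w in words}
    G = nx.Graph()
    for w, ss in senses.items():
        G.add_nodes_from((w, s.name()) for s in ss)
    wl = list(senses)
    for a in range(len(wl)):
        for b in range(a + 1, len(wl)):
            for s1 in senses[wl[a]]:
                for s2 in senses[wl[b]]:
                    sim = s1.path_similarity(s2)  # None across POS gaps
                    if sim:
                        G.add_edge((wl[a], s1.name()), (wl[b], s2.name()),
                                   weight=sim)
    rank = nx.pagerank(G)  # centrality over the weighted sense graph
    return {w: max(ss, key=lambda s: rank.get((w, s.name()), 0.0)).name()
            for w, ss in senses.items() if ss}

print(graph_wsd(["cell", "organism", "membrane"]))
```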


2015 · Vol 54 · pp. 83-122
Author(s): Ruben Izquierdo, Armando Suarez, German Rigau

As empirically demonstrated by the Word Sense Disambiguation (WSD) tasks of the last SensEval/SemEval exercises, assigning the appropriate meaning to words in context has resisted all attempts to be successfully addressed. Many authors argue that one possible reason could be the use of inappropriate sets of word meanings. In particular, WordNet has been used as a de facto standard repository of word meanings in most of these tasks. Thus, instead of using the word senses defined in WordNet, some approaches have derived semantic classes representing groups of word senses. However, the meanings represented by WordNet have been used for WSD only at a very fine-grained sense level or at a very coarse-grained semantic class level (also called SuperSenses). We suspect that an appropriate level of abstraction might lie somewhere between these two levels. The contributions of this paper are manifold. First, we propose a simple method to automatically derive semantic classes at intermediate levels of abstraction covering all nominal and verbal WordNet meanings. Second, we empirically demonstrate that our automatically derived semantic classes outperform classical approaches based on word senses and more coarse-grained sense groupings. Third, we demonstrate that our supervised WSD system benefits from using these new semantic classes as additional semantic features while reducing the amount of training examples. Finally, we demonstrate the robustness of our supervised semantic class-based WSD system when tested on an out-of-domain corpus.
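
One simple way to picture an intermediate level of abstraction is a fixed-depth cut of the WordNet hypernym hierarchy, sketched below with NLTK (`nltk.download("wordnet")` required). The depth cut is only an illustrative stand-in for the paper's automatic class-derivation method.

```python
from nltk.corpus import wordnet as wn

def semantic_class(synset, depth=4):
    """Map a synset to its ancestor at a fixed depth below the root of
    the hypernym hierarchy: coarser than WordNet senses, finer than the
    45 SuperSenses. The cut depth controls the level of abstraction."""
    path = max(synset.hypernym_paths(), key=len)  # root ... synset
    return path[min(depth, len(path) - 1)]

print(semantic_class(wn.synset("dog.n.01")).name())
# an animal-related ancestor, e.g. 'living_thing.n.01' at depth 4
```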


Author(s): Pushpak Bhattacharyya, Mitesh Khapra

This chapter discusses the basic concepts of Word Sense Disambiguation (WSD) and the approaches to solving this problem. Both general-purpose WSD and domain-specific WSD are presented. The first part of the discussion focuses on existing approaches for WSD, including knowledge-based, supervised, semi-supervised, unsupervised, hybrid, and bilingual approaches. The current state-of-the-art accuracy for general-purpose WSD seems to be pegged at around 65%, which has motivated investigations into domain-specific WSD, the current trend in the field. In the latter part of the chapter, we present a greedy, neural network-inspired algorithm for domain-specific WSD and compare its performance with other state-of-the-art algorithms for WSD. Our experiments suggest that for domain-specific WSD, simply selecting the most frequent sense of a word does as well as any state-of-the-art algorithm.
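
The most-frequent-sense baseline mentioned in the last sentence is straightforward to reproduce with NLTK, because WordNet lists a word's synsets in order of their tagged frequency in the SemCor corpus:

```python
from nltk.corpus import wordnet as wn  # needs nltk.download("wordnet")

def most_frequent_sense(word, pos=None):
    """WordNet orders a word's synsets by how often each sense is tagged
    in SemCor, so the first synset is the MFS baseline prediction."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

print(most_frequent_sense("bank", pos=wn.NOUN))  # Synset('bank.n.01')
```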


2017 · Vol 43 (3) · pp. 593-617
Author(s): Sascha Rothe, Hinrich Schütze

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings that incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The obtained embeddings live in the same vector space as the input word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet, GermaNet, and Freebase as semantic resources. AutoExtend achieves state-of-the-art performance on Word-in-Context Similarity and Word Sense Disambiguation tasks.
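
As a rough illustration of the "same vector space" property, the sketch below derives a synset embedding by averaging the input embeddings of the synset's lemmas. This is a crude stand-in for AutoExtend's learned encode/decode formulation, not the actual method:

```python
import numpy as np
from nltk.corpus import wordnet as wn  # needs nltk.download("wordnet")

def synset_embedding(synset, word_vectors):
    """Place a synset in the *same* space as the input word embeddings
    by averaging the vectors of its lemmas. A crude approximation of
    AutoExtend, but it preserves the key property: synset and word
    vectors remain directly comparable, with no extra training corpus."""
    vecs = [word_vectors[lem.name()] for lem in synset.lemmas()
            if lem.name() in word_vectors]
    return np.mean(vecs, axis=0) if vecs else None

# Usage: word_vectors can be any word -> np.ndarray mapping,
# e.g. pre-trained word2vec vectors loaded into a dict.
```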


2017 · Vol 43 (1) · pp. 31-70
Author(s): Rocco Tripodi, Marcello Pelillo

This article presents a new model for word sense disambiguation formulated in terms of evolutionary game theory, where each word to be disambiguated is represented as a node on a graph whose edges represent word relations, and senses are represented as classes. The words simultaneously update their class-membership preferences according to the senses that neighboring words are likely to choose. We use distributional information to weigh the influence that each word has on the decisions of the others, and semantic similarity information to measure the strength of compatibility among the choices. With this information we can formulate the word sense disambiguation problem as a constraint-satisfaction problem and solve it using tools derived from game theory, while maintaining textual coherence. The model is based on two ideas: similar words should be assigned to similar classes, and the meaning of a word does not depend on all the words in a text but just on some of them. The article provides an in-depth motivation for modeling the word sense disambiguation problem in terms of game theory, illustrated by an example, followed by an extensive analysis of the combination of similarity measures to use in the framework and a comparison with state-of-the-art systems. The results show that our model outperforms state-of-the-art algorithms and can be applied to different tasks and in different scenarios.
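
The update dynamics can be sketched in a few lines of NumPy. The code below is a generic discrete replicator-dynamics loop under illustrative assumptions about the inputs (it is not the authors' implementation): `W` holds word-to-word distributional similarities, `Z[i][j]` holds sense-compatibility scores between the candidate senses of words i and j, and `X[i]` is word i's probability distribution over its senses.

```python
import numpy as np

def replicator_wsd(W, Z, X, iters=100):
    """Discrete replicator dynamics for the game-theoretic intuition:
    each word repeatedly shifts probability mass toward senses that pay
    off against what its neighbours currently prefer.
    W[i][j]: similarity of words i and j (scalar);
    Z[i][j]: senses-of-i x senses-of-j compatibility matrix;
    X[i]:    word i's mixed strategy (np.ndarray summing to 1)."""
    n = len(X)
    for _ in range(iters):
        for i in range(n):
            payoff = sum(W[i][j] * (Z[i][j] @ X[j])
                         for j in range(n) if j != i)
            X[i] = X[i] * payoff
            X[i] = X[i] / (X[i].sum() + 1e-12)  # renormalise strategy
    return [int(np.argmax(x)) for x in X]  # index of each word's sense
```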


2014 · Vol 40 (4) · pp. 837-881
Author(s): Mohammad Taher Pilehvar, Roberto Navigli

The evaluation of several tasks in lexical semantics is often limited by the lack of large amounts of manual annotations, not only for training purposes, but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled datasets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not allow for in-depth analysis of the factors that determine a system's performance. In this paper we address this issue by means of a realistic simulation of large-scale evaluation for the WSD task. We do this by providing two main contributions: first, we put forward two novel approaches to the wide-coverage generation of semantically aware pseudowords (i.e., artificial words capable of modeling real polysemous words); second, we leverage the most suitable type of pseudoword to create large pseudosense-annotated corpora, which enable a large-scale experimental framework for the comparison of state-of-the-art supervised and knowledge-based algorithms. Using this framework, we study the impact of supervision and knowledge on the two major disambiguation paradigms and perform an in-depth analysis of the factors which affect their performance.
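
The classic pseudoword construction that the paper builds on is easy to sketch: occurrences of two or more constituent words are conflated into a single artificial word, and the replaced word serves as the gold "pseudosense". The paper's contribution lies in choosing constituents that model real polysemy; in the sketch below they are simply given.

```python
import re

def make_pseudoword_corpus(sentences, constituents, pseudoword="banana_door"):
    """Replace every occurrence of a constituent word with the conflated
    pseudoword, recording the replaced word as the gold pseudosense."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, constituents)) + r")\b")
    corpus = []
    for sent in sentences:
        labels = pattern.findall(sent)  # gold pseudosenses, in order
        corpus.append((pattern.sub(pseudoword, sent), labels))
    return corpus

print(make_pseudoword_corpus(
    ["She opened the door.", "He peeled a banana."],
    ["banana", "door"],
))
# [('She opened the banana_door.', ['door']),
#  ('He peeled a banana_door.', ['banana'])]
```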


2021 · pp. 1-55
Author(s): Daniel Loureiro, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados

Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language-model-based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and can better exploit limited available training data. In fact, the simple feature-extraction strategy of averaging contextualized embeddings proves robust even when using only three training sentences per word sense, with minimal improvements obtained by increasing the size of this training data.
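
A minimal version of that feature-extraction strategy can be written with the Hugging Face transformers library, as sketched below: average the word-piece vectors of the target occurrence from BERT's last hidden layer (one simple layer choice among several studied in the literature), build one centroid per sense from a handful of tagged sentences, and disambiguate by nearest centroid.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def target_embedding(sentence, target):
    """Contextualized embedding of `target`: mean of its word-piece
    vectors from the last hidden layer."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    ids = enc["input_ids"][0].tolist()
    t_ids = tok(target, add_special_tokens=False)["input_ids"]
    for i in range(len(ids) - len(t_ids) + 1):  # locate target's pieces
        if ids[i:i + len(t_ids)] == t_ids:
            return hidden[i:i + len(t_ids)].mean(dim=0)
    raise ValueError(f"{target!r} not found in {sentence!r}")

def sense_centroids(examples):
    """examples: (sentence, target_word, sense_key) triples; per the
    paper, even three sentences per sense already work well."""
    by_sense = {}
    for sent, tgt, sense in examples:
        by_sense.setdefault(sense, []).append(target_embedding(sent, tgt))
    return {s: torch.stack(v).mean(dim=0) for s, v in by_sense.items()}

def disambiguate(sentence, target, centroids):
    """1-nearest-neighbour over sense centroids, by cosine similarity."""
    v = target_embedding(sentence, target)
    return max(centroids, key=lambda s: float(
        torch.cosine_similarity(v, centroids[s], dim=0)))
```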

