AutoExtend: Combining Word Embeddings with Semantic Resources

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings that incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The obtained embeddings live in the same vector space as the input word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet, GermaNet, and Freebase as semantic resources. AutoExtend achieves state-of-the-art performance on Word-in-Context Similarity and Word Sense Disambiguation tasks.
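
To make concrete what it means for synset embeddings to "live in the same vector space as the input word embeddings", here is a minimal sketch of a naive baseline that averages the vectors of a synset's member words. This is not AutoExtend's learned encoding and decoding; the function name, the synset identifiers, and the toy vectors are hypothetical.

```python
def synset_embeddings(word_vectors, synsets):
    """Naive baseline: a synset vector is the mean of its member word vectors.

    word_vectors: {word: [float, ...]}  (any pretrained embeddings)
    synsets:      {synset_id: [word, ...]}  (hypothetical resource excerpt)
    """
    result = {}
    for synset_id, words in synsets.items():
        # Keep only member words that have an input embedding.
        vectors = [word_vectors[w] for w in words if w in word_vectors]
        if not vectors:
            continue  # no covered member word, so no vector for this synset
        dim = len(vectors[0])
        result[synset_id] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dim)
        ]
    return result


# Toy data (assumed, for illustration only).
word_vectors = {"car": [1.0, 0.0], "automobile": [0.0, 1.0], "suit": [2.0, 2.0]}
synsets = {
    "car.n.01": ["car", "automobile"],
    "suit.n.01": ["suit", "suit_of_clothes"],
    "empty.n.01": ["unknown"],
}
emb = synset_embeddings(word_vectors, synsets)
```

Because the synset vectors are built from the word vectors, both can be compared directly with the same similarity measure, which is the property the abstract highlights.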

MuLaN: Multilingual Label propagatioN for Word Sense Disambiguation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/531 ◽

2020 ◽

Author(s):

Edoardo Barba ◽

Luigi Procopio ◽

Niccolò Campolungo ◽

Tommaso Pasini ◽

Roberto Navigli

Keyword(s):

Knowledge Acquisition ◽

Knowledge Base ◽

Empirical Evidence ◽

State Of The Art ◽

Word Sense Disambiguation ◽

Label Propagation ◽

Word Sense ◽

Word Embeddings ◽

High Resource ◽

Sense Disambiguation

The knowledge acquisition bottleneck strongly affects the creation of multilingual sense-annotated data, hence limiting the power of supervised systems when applied to multilingual Word Sense Disambiguation. In this paper, we propose a semi-supervised approach based upon a novel label propagation scheme, which, by jointly leveraging contextualized word embeddings and the multilingual information enclosed in a knowledge base, projects sense labels from a high-resource language, i.e., English, to lower-resourced ones. Backed by several experiments, we provide empirical evidence that our automatically created datasets are of a higher quality than those generated by other competitors and enable a supervised model to achieve state-of-the-art performance in all multilingual Word Sense Disambiguation tasks. We make our datasets available for research purposes at https://github.com/SapienzaNLP/mulan.
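
As a rough illustration of the label projection idea, the sketch below gives each unlabeled target-language instance the sense of its most similar labeled English instance, restricted to the senses a knowledge base allows for the target word. It is a nearest-neighbour simplification under stated assumptions, not the MuLaN algorithm itself; the vectors, sense identifiers, `candidate_senses` mapping, and similarity threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_labels(labeled, unlabeled, candidate_senses, threshold=0.5):
    """labeled: [(vector, sense)]; unlabeled: [(word, vector)].

    candidate_senses maps a target word to the senses the knowledge base
    permits for it (assumed to be given).
    """
    projected = []
    for word, vec in unlabeled:
        allowed = candidate_senses.get(word, set())
        best_sense, best_sim = None, threshold
        for src_vec, sense in labeled:
            if sense not in allowed:
                continue  # the knowledge base rules this sense out
            sim = cosine(vec, src_vec)
            if sim > best_sim:
                best_sense, best_sim = sense, sim
        # best_sense stays None when nothing is similar enough.
        projected.append((word, best_sense))
    return projected


# Toy contextual vectors (assumed, two dimensions for readability).
labeled = [([1.0, 0.0], "bank_finance"), ([0.0, 1.0], "bank_river")]
unlabeled = [("banca", [0.9, 0.1]), ("riva", [0.1, 0.9]), ("banca", [-1.0, -1.0])]
candidate_senses = {"banca": {"bank_finance"}, "riva": {"bank_river"}}
result = propagate_labels(labeled, unlabeled, candidate_senses)
```

Instances that fall below the threshold are left unlabeled rather than forced into a sense, which trades coverage for label quality.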

Spreading semantic information by Word Sense Disambiguation

Knowledge-Based Systems ◽

10.1016/j.knosys.2017.06.013 ◽

2017 ◽

Vol 132 ◽

pp. 47-61 ◽

Cited By ~ 5

Author(s):

Yoan Gutiérrez ◽

Sonia Vázquez ◽

Andrés Montoyo

Keyword(s):

Semantic Information ◽

Word Sense Disambiguation ◽

Word Sense ◽

Sense Disambiguation

Graph Algorithms for Word Sense Disambiguation in Biomedicine

10.5753/sbcas.2015.10365 ◽

2015 ◽

Author(s):

Rodrigo Goulart ◽

Juliano De Carvalho ◽

Vera De Lima

Keyword(s):

Text Mining ◽

Graph Algorithms ◽

State Of The Art ◽

Word Sense Disambiguation ◽

The State ◽

Word Sense ◽

New Approach ◽

Similar Performance ◽

Sense Disambiguation ◽

Different Levels

Word Sense Disambiguation (WSD) is an important task in biomedical text mining. Supervised WSD methods achieve the best results, but they are complex and costly to test. This work presents an experiment on WSD using graph-based, unsupervised approaches. Three algorithms were tested and compared to the state of the art. The results indicate that similar performance can be reached at different levels of complexity, which may point to a new approach to this problem.

Word Sense Disambiguation

Emerging Applications of Natural Language Processing ◽

10.4018/978-1-4666-2169-5.ch002 ◽

2013 ◽

pp. 22-51

Author(s):

Pushpak Bhattacharyya ◽

Mitesh Khapra

Keyword(s):

State Of The Art ◽

Word Sense Disambiguation ◽

Current Trend ◽

General Purpose ◽

Word Sense ◽

Domain Specific ◽

Knowledge Based ◽

Current State ◽

Sense Disambiguation ◽

State Of Affairs

This chapter discusses the basic concepts of Word Sense Disambiguation (WSD) and the approaches to solving this problem. Both general purpose WSD and domain specific WSD are presented. The first part of the discussion focuses on existing approaches for WSD, including knowledge-based, supervised, semi-supervised, unsupervised, hybrid, and bilingual approaches. The accuracy value for general purpose WSD as the current state of affairs seems to be pegged at around 65%. This has motivated investigations into domain specific WSD, which is the current trend in the field. In the latter part of the chapter, we present a greedy neural network inspired algorithm for domain specific WSD and compare its performance with other state-of-the-art algorithms for WSD. Our experiments suggest that for domain-specific WSD, simply selecting the most frequent sense of a word does as well as any state-of-the-art algorithm.
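
The most-frequent-sense baseline mentioned at the end of the abstract can be sketched in a few lines. This is a minimal illustration, assuming a sense-tagged corpus given as (word, sense) pairs; the function names and the toy data are hypothetical and do not come from the chapter.

```python
from collections import Counter, defaultdict


def train_mfs(annotated):
    """Map each word to the sense it carries most often in the training data.

    annotated: iterable of (word, sense) pairs from a sense-tagged corpus.
    """
    counts = defaultdict(Counter)
    for word, sense in annotated:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def accuracy(mfs, test):
    """Fraction of test instances whose gold sense equals the MFS prediction."""
    correct = sum(1 for word, gold in test if mfs.get(word) == gold)
    return correct / len(test) if test else 0.0


# Toy data (assumed, for illustration only).
train = [("bank", "finance"), ("bank", "finance"), ("bank", "river"),
         ("plant", "factory")]
test = [("bank", "finance"), ("bank", "river"), ("plant", "factory"),
        ("crane", "bird")]
mfs = train_mfs(train)
acc = accuracy(mfs, test)
```

Words unseen in training receive no prediction here and count as errors; a real system would need a fallback such as the first sense listed in the inventory.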

Automating Reuse of Online Semantic Resources by Concept Extraction Using Word Sense Disambiguation

Journal of Algorithms & Computational Technology ◽

10.1260/1748-3018.6.3.435 ◽

2012 ◽

Vol 6 (3) ◽

pp. 435-445

Author(s):

Nadia Imdadi ◽

Syed A.M. Rizvi

Keyword(s):

Word Sense Disambiguation ◽

Word Sense ◽

Concept Extraction ◽

Sense Disambiguation ◽

Semantic Resources

A Game-Theoretic Approach to Word Sense Disambiguation

Computational Linguistics ◽

10.1162/coli_a_00274 ◽

2017 ◽

Vol 43 (1) ◽

pp. 31-70 ◽

Cited By ~ 19

Author(s):

Rocco Tripodi ◽

Marcello Pelillo

Keyword(s):

Game Theory ◽

State Of The Art ◽

Constraint Satisfaction Problem ◽

Word Sense Disambiguation ◽

Evolutionary Game ◽

Similarity Measures ◽

Theoretic Approach ◽

Word Sense ◽

Sense Disambiguation ◽

Distributional Information

This article presents a new model for word sense disambiguation formulated in terms of evolutionary game theory, where each word to be disambiguated is represented as a node on a graph whose edges represent word relations and senses are represented as classes. The words simultaneously update their class membership preferences according to the senses that neighboring words are likely to choose. We use distributional information to weigh the influence that each word has on the decisions of the others and semantic similarity information to measure the strength of compatibility among the choices. With this information we can formulate the word sense disambiguation problem as a constraint satisfaction problem and solve it using tools derived from game theory, maintaining the textual coherence. The model is based on two ideas: Similar words should be assigned to similar classes and the meaning of a word does not depend on all the words in a text but just on some of them. The article provides an in-depth motivation of the idea of modeling the word sense disambiguation problem in terms of game theory, which is illustrated by an example. The conclusion presents an extensive analysis on the combination of similarity measures to use in the framework and a comparison with state-of-the-art systems. The results show that our model outperforms state-of-the-art algorithms and can be applied to different tasks and in different scenarios.
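
The update described above, in which words revise their sense preferences according to what their neighbours are likely to choose, can be sketched with discrete-time replicator dynamics, a standard rule from evolutionary game theory. Treat this as a sketch under assumptions rather than the article's exact formulation: the word-to-word weights, the sense similarity scores, and all identifiers are hypothetical toy values.

```python
def replicator_wsd(senses, weights, sim, iterations=50):
    """senses:  {word: [sense, ...]}
    weights: {(word_i, word_j): float}  influence of word_j on word_i
    sim:     {(sense_a, sense_b): float}  compatibility of two senses
    Returns the sense with the highest final preference for each word.
    """
    # Start from a uniform mixed strategy over each word's senses.
    x = {w: [1.0 / len(s)] * len(s) for w, s in senses.items()}
    for _ in range(iterations):
        new_x = {}
        for wi, si in senses.items():
            payoffs = []
            for sense_i in si:
                u = 0.0
                for wj, sj in senses.items():
                    if wj == wi:
                        continue
                    w_ij = weights.get((wi, wj), 0.0)
                    # Expected compatibility with the neighbour's current mix.
                    u += w_ij * sum(
                        sim.get((sense_i, s), 0.0) * p for s, p in zip(sj, x[wj])
                    )
                payoffs.append(u)
            avg = sum(p * u for p, u in zip(x[wi], payoffs))
            # Senses with above-average payoff gain probability mass.
            if avg > 0:
                new_x[wi] = [p * u / avg for p, u in zip(x[wi], payoffs)]
            else:
                new_x[wi] = x[wi]
        x = new_x
    return {w: senses[w][max(range(len(p)), key=p.__getitem__)]
            for w, p in x.items()}


# Toy problem (assumed values).
senses = {"bank": ["bank_finance", "bank_river"],
          "deposit": ["deposit_money"],
          "loan": ["loan_money"]}
weights = {(a, b): 1.0 for a in senses for b in senses if a != b}
base = {("bank_finance", "deposit_money"): 0.9,
        ("bank_river", "deposit_money"): 0.2,
        ("bank_finance", "loan_money"): 0.8,
        ("bank_river", "loan_money"): 0.1,
        ("deposit_money", "loan_money"): 0.7}
sim = dict(base)
sim.update({(b, a): v for (a, b), v in base.items()})  # make it symmetric
result = replicator_wsd(senses, weights, sim)
```

In this toy run the financial sense of "bank" wins because it is more compatible with the only available senses of "deposit" and "loan".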

Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2017.08.001 ◽

2017 ◽

Vol 73 ◽

pp. 137-147 ◽

Cited By ~ 19

Author(s):

Antonio Jimeno Yepes

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Short Term Memory ◽

Word Sense Disambiguation ◽

Word Sense ◽

Word Embeddings ◽

Short Term ◽

Term Memory ◽

Sense Disambiguation ◽

Long Short Term Memory

A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation

Computational Linguistics ◽

10.1162/coli_a_00202 ◽

2014 ◽

Vol 40 (4) ◽

pp. 837-881 ◽

Cited By ~ 20

Author(s):

Mohammad Taher Pilehvar ◽

Roberto Navigli

Keyword(s):

Large Scale ◽

State Of The Art ◽

Word Sense Disambiguation ◽

Evaluation Framework ◽

Small Scale ◽

Word Sense ◽

Knowledge Based ◽

Depth Analysis ◽

Sense Disambiguation ◽

The Impact

The evaluation of several tasks in lexical semantics is often limited by the lack of large amounts of manual annotations, not only for training purposes, but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled datasets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not allow for in-depth analysis of the factors that determine a system's performance. In this paper we address this issue by means of a realistic simulation of large-scale evaluation for the WSD task. We do this by providing two main contributions: First, we put forward two novel approaches to the wide-coverage generation of semantically aware pseudowords (i.e., artificial words capable of modeling real polysemous words); second, we leverage the most suitable type of pseudoword to create large pseudosense-annotated corpora, which enable a large-scale experimental framework for the comparison of state-of-the-art supervised and knowledge-based algorithms. Using this framework, we study the impact of supervision and knowledge on the two major disambiguation paradigms and perform an in-depth analysis of the factors which affect their performance.
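
For readers unfamiliar with pseudowords, the sketch below shows the basic construction: occurrences of several real words are conflated into one artificial token, and the replaced word is kept as the gold "pseudosense". This is the classic conflation scheme only; how the paper selects semantically aware members is not shown, and the function name, the `*` joiner, and the toy sentences are hypothetical.

```python
def make_pseudoword_corpus(sentences, members, pseudoword=None):
    """Replace every occurrence of a member word with one artificial token.

    sentences: list of token lists
    members:   the real words to conflate
    Returns the rewritten sentences and gold labels as
    (sentence_index, token_index, original_word) triples.
    """
    pseudoword = pseudoword or "*".join(members)
    member_set = set(members)
    rewritten, gold = [], []
    for i, tokens in enumerate(sentences):
        new_tokens = []
        for j, tok in enumerate(tokens):
            if tok in member_set:
                new_tokens.append(pseudoword)
                gold.append((i, j, tok))  # the original word is the pseudosense
            else:
                new_tokens.append(tok)
        rewritten.append(new_tokens)
    return rewritten, gold


# Toy corpus (assumed, for illustration only).
sentences = [["the", "banana", "is", "ripe"], ["open", "the", "door"]]
rewritten, gold = make_pseudoword_corpus(sentences, ("banana", "door"))
```

A disambiguation system is then asked to recover the original word behind each artificial token, so annotated test data of any size can be produced without manual labeling.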

Domain Adaptation using Word Embeddings for Word Sense Disambiguation

Journal of Natural Language Processing ◽

10.5715/jnlp.25.463 ◽

2018 ◽

Vol 25 (4) ◽

pp. 463-480

Author(s):

Kanako Komiya ◽

Minoru Sasaki ◽

Hiroyuki Shinnou ◽

Manabu Okumura

Keyword(s):

Domain Adaptation ◽

Word Sense Disambiguation ◽

Word Sense ◽

Word Embeddings ◽

Sense Disambiguation

Exemplification Modeling: Can You Give Me an Example, Please?

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/520 ◽

2021 ◽

Author(s):

Edoardo Barba ◽

Luigi Procopio ◽

Caterina Lacerra ◽

Tommaso Pasini ◽

Roberto Navigli

Keyword(s):

Gold Standard ◽

State Of The Art ◽

Word Sense Disambiguation ◽

Full Range ◽

Training Data ◽

Training Procedure ◽

Word Sense ◽

The Novel ◽

Current State ◽

Sense Disambiguation

Recently, generative approaches have been used effectively to provide definitions of words in their context. However, the opposite, i.e., generating a usage example given one or more words along with their definitions, has not yet been investigated. In this work, we introduce the novel task of Exemplification Modeling (ExMod), along with a sequence-to-sequence architecture and a training procedure for it. Starting from a set of (word, definition) pairs, our approach is capable of automatically generating high-quality sentences which express the requested semantics. As a result, we can drive the creation of sense-tagged data which cover the full range of meanings in any inventory of interest, and their interactions within sentences. Human annotators agree that the sentences generated are as fluent and semantically coherent with the input definitions as the sentences in manually annotated corpora. Indeed, when employed as training data for Word Sense Disambiguation, our examples enable the current state of the art to be outperformed, and higher results to be achieved than when using gold-standard datasets only. We release the pretrained model, the dataset and the software at https://github.com/SapienzaNLP/exmod.
