Selecting Training Data for Unsupervised Domain Adaptation in Word Sense Disambiguation

Active Learning to Remove Source Instances for Domain Adaptation for Word Sense Disambiguation

Communications in Computer and Information Science - Computational Linguistics ◽

10.1007/978-981-10-0515-2_7 ◽

2016 ◽

pp. 97-107 ◽

Cited By ~ 1

Author(s):

Hiroyuki Shinnou ◽

Yoshiyuki Onodera ◽

Minoru Sasaki ◽

Kanako Komiya

Keyword(s):

Active Learning ◽

Domain Adaptation ◽

Word Sense Disambiguation ◽

Word Sense ◽

Sense Disambiguation

Download Full-text

deepBioWSD: effective deep neural word sense disambiguation of biomedical text data

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy189 ◽

2019 ◽

Vol 26 (5) ◽

pp. 438-446 ◽

Cited By ~ 3

Author(s):

Ahmad Pesaranghader ◽

Stan Matwin ◽

Marina Sokolova ◽

Ali Pesaranghader

Keyword(s):

Language Processing ◽

Short Term Memory ◽

Word Sense Disambiguation ◽

Training Data ◽

Biomedical Text ◽

Word Sense ◽

Vocabulary Size ◽

Unified Medical Language System ◽

Knowledge Based ◽

Sense Disambiguation

Abstract Objective In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm prevents downstream difficulties in the natural language processing applications pipeline. Supervised WSD algorithms largely outperform un- or semisupervised and knowledge-based methods; however, they train 1 separate classifier for each ambiguous term, necessitating a large number of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable. Materials and Methods Built on recent advances in deep learning, our deepBioWSD model leverages 1 single bidirectional long short-term memory network that makes sense prediction for any ambiguous term. In the model, first, the Unified Medical Language System sense embeddings will be computed using their text definitions; and then, after initializing the network with these embeddings, it will be trained on all (available) training data collectively. This method also considers a novel technique for automatic collection of training data from PubMed to (pre)train the network in an unsupervised manner. Results We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracies employed as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD by achieving the state-of-the-art performance of 96.82% for macro accuracy. Conclusions Apart from the disambiguation improvement and unsupervised training, deepBioWSD depends on considerably less number of expert-labeled data as it learns the target and the context terms jointly. These merit deepBioWSD to be conveniently deployable in real-time biomedical applications.

Download Full-text

Evaluating sense disambiguation across diverse parameter spaces

Natural Language Engineering ◽

10.1017/s135132490200298x ◽

2002 ◽

Vol 8 (4) ◽

pp. 293-310 ◽

Cited By ~ 45

Author(s):

DAVID YAROWSKY ◽

RADU FLORIAN

Keyword(s):

Word Sense Disambiguation ◽

Model Performance ◽

Training Data ◽

Target Language ◽

Word Sense ◽

Parameter Spaces ◽

Diverse Range ◽

Part Of Speech ◽

Sense Disambiguation ◽

Training Examples

This paper presents a comprehensive empirical exploration and evaluation of a diverse range of data characteristics which influence word sense disambiguation performance. It focuses on a set of six core supervised algorithms, including three variants of Bayesian classifiers, a cosine model, non-hierarchical decision lists, and an extension of the transformation-based learning model. Performance is investigated in detail with respect to the following parameters: (a) target language (English, Spanish, Swedish and Basque); (b) part of speech; (c) sense granularity; (d) inclusion and exclusion of major feature classes; (e) variable context width (further broken down by part-of-speech of keyword); (f) number of training examples; (g) baseline probability of the most likely sense; (h) sense distributional entropy; (i) number of senses per keyword; (j) divergence between training and test data; (k) degree of (artificially introduced) noise in the training data; (l) the effectiveness of an algorithm's confidence rankings; and (m) a full keyword breakdown of the performance of each algorithm. The paper concludes with a brief analysis of similarities, differences, strengths and weaknesses of the algorithms and a hierarchical clustering of these algorithms based on agreement of sense classification behavior. Collectively, the paper constitutes the most comprehensive survey of evaluation measures and tests yet applied to sense disambiguation algorithms. And it does so over a diverse range of supervised algorithms, languages and parameter spaces in single unified experimental framework.

Download Full-text

Domain Adaptation using Word Embeddings for Word Sense Disambiguation

Journal of Natural Language Processing ◽

10.5715/jnlp.25.463 ◽

2018 ◽

Vol 25 (4) ◽

pp. 463-480

Author(s):

Kanako Komiya ◽

Minoru Sasaki ◽

Hiroyuki Shinnou ◽

Manabu Okumura

Keyword(s):

Domain Adaptation ◽

Word Sense Disambiguation ◽

Word Sense ◽

Word Embeddings ◽

Sense Disambiguation

Download Full-text

Analysis and Evaluation of Language Models for Word Sense Disambiguation

Computational Linguistics ◽

10.1162/coli_a_00405 ◽

2021 ◽

pp. 1-55

Author(s):

Daniel Loureiro ◽

Kiamehr Rezaee ◽

Mohammad Taher Pilehvar ◽

Jose Camacho-Collados

Keyword(s):

Feature Extraction ◽

Word Sense Disambiguation ◽

Language Model ◽

Training Data ◽

Fine Tuning ◽

Language Models ◽

Coarse Grained ◽

Word Sense ◽

Sense Disambiguation ◽

High Level

Abstract Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability in capturing context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language model based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and it can better exploit limited available training data. In fact, the simple feature extraction strategy of averaging contextualized embeddings proves robust even using only three training sentences per word sense, with minimal improvements obtained by increasing the size of this training data.

Download Full-text

Exemplification Modeling: Can You Give Me an Example, Please?

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/520 ◽

2021 ◽

Author(s):

Edoardo Barba ◽

Luigi Procopio ◽

Caterina Lacerra ◽

Tommaso Pasini ◽

Roberto Navigli

Keyword(s):

Gold Standard ◽

State Of The Art ◽

Word Sense Disambiguation ◽

Full Range ◽

Training Data ◽

Training Procedure ◽

Word Sense ◽

The Novel ◽

Current State ◽

Sense Disambiguation

Recently, generative approaches have been used effectively to provide definitions of words in their context. However, the opposite, i.e., generating a usage example given one or more words along with their definitions, has not yet been investigated. In this work, we introduce the novel task of Exemplification Modeling (ExMod), along with a sequence-to-sequence architecture and a training procedure for it. Starting from a set of (word, definition) pairs, our approach is capable of automatically generating high-quality sentences which express the requested semantics. As a result, we can drive the creation of sense-tagged data which cover the full range of meanings in any inventory of interest, and their interactions within sentences. Human annotators agree that the sentences generated are as fluent and semantically-coherent with the input definitions as the sentences in manually-annotated corpora. Indeed, when employed as training data for Word Sense Disambiguation, our examples enable the current state of the art to be outperformed, and higher results to be achieved than when using gold-standard datasets only. We release the pretrained model, the dataset and the software at https://github.com/SapienzaNLP/exmod.

Download Full-text

Chinese word sense disambiguation by combining pseudo training data

International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 ◽

10.1109/nlpke.2003.1275884 ◽

2004 ◽

Cited By ~ 2

Author(s):

Xiaojie Wang ◽

Y. Matsumoto

Keyword(s):

Word Sense Disambiguation ◽

Training Data ◽

Word Sense ◽

Chinese Word ◽

Sense Disambiguation

Download Full-text

Estimating class priors in domain adaptation for word sense disambiguation

10.3115/1220175.1220187 ◽

2006 ◽

Cited By ~ 31

Author(s):

Yee Seng Chan ◽

Hwee Tou Ng

Keyword(s):

Domain Adaptation ◽

Word Sense Disambiguation ◽

Word Sense ◽

Sense Disambiguation

Download Full-text

SENSE: an analogy-based Word Sense Disambiguation system

Natural Language Engineering ◽

10.1017/s135132499900217x ◽

1999 ◽

Vol 5 (2) ◽

pp. 207-218 ◽

Cited By ~ 3

Author(s):

STEFANO FEDERICI ◽

SIMONETTA MONTEMAGNI ◽

VITO PIRRELLI

Keyword(s):

Word Sense Disambiguation ◽

Training Data ◽

Word Sense ◽

Data Sparseness ◽

Sense Disambiguation ◽

Conservative Bias

The paper describes SENSE, a word sense disambiguation system which makes use of multidimensional analogy-based proportions to infer the most likely sense of a word given its context. Architecture and functioning of the system are illustrated in detail. Results of different experimental settings are given, showing that the system, in spite its conservative bias, successfully copes with the problem of training data sparseness.

Download Full-text

Parameter optimization for machine-learning of word sense disambiguation

Natural Language Engineering ◽

10.1017/s1351324902003005 ◽

2002 ◽

Vol 8 (4) ◽

pp. 311-325 ◽

Cited By ~ 24

Author(s):

V. HOSTE ◽

I. HENDRICKX ◽

W. DAELEMANS ◽

A. VAN DEN BOSCH

Keyword(s):

Machine Learning ◽

Parameter Optimization ◽

Information Sources ◽

Word Sense Disambiguation ◽

Training Data ◽

Learning Material ◽

Word Sense ◽

Performance Measurements ◽

Sense Disambiguation ◽

The Impact

Various Machine Learning (ML) approaches have been demonstrated to produce relatively successful Word Sense Disambiguation (WSD) systems. There are still unexplained differences among the performance measurements of different algorithms, hence it is warranted to deepen the investigation into which algorithm has the right ‘bias’ for this task. In this paper, we show that this is not easy to accomplish, due to intricate interactions between information sources, parameter settings, and properties of the training data. We investigate the impact of parameter optimization on generalization accuracy in a memory-based learning approach to English and Dutch WSD. A ‘word-expert’ architecture was adopted, yielding a set of classifiers, each specialized in one single wordform. The experts consist of multiple memory-based learning classifiers, each taking different information sources as input, combined in a voting scheme. We optimized the architectural and parametric settings for each individual word-expert by performing cross-validation experiments on the learning material. The results of these experiments show that the variation of both the algorithmic parameters and the information sources available to the classifiers leads to large fluctuations in accuracy. We demonstrate that optimization per word-expert leads to an overall significant improvement in the generalization accuracies of the produced WSD systems.

Download Full-text