Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification

This chapter discusses the basic concepts of Word Sense Disambiguation (WSD) and approaches to solving this problem, covering both general purpose and domain-specific WSD. The first part of the discussion focuses on existing WSD approaches, including knowledge-based, supervised, semi-supervised, unsupervised, hybrid, and bilingual approaches. The current state-of-the-art accuracy for general purpose WSD seems to be pegged at around 65%, which has motivated investigations into domain-specific WSD, the current trend in the field. In the latter part of the chapter, we present a greedy, neural-network-inspired algorithm for domain-specific WSD and compare its performance with other state-of-the-art WSD algorithms. Our experiments suggest that, for domain-specific WSD, simply selecting the most frequent sense of a word performs as well as any state-of-the-art algorithm.
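The most-frequent-sense (MFS) heuristic needs only sense counts from an annotated corpus. Below is a minimal illustration. It assumes the training data is a list of (word, sense) pairs. The function names and the sense labels in the usage example are hypothetical and do not come from the chapter.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def train_mfs(annotated: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a word -> most frequent sense table from (word, sense) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, sense in annotated:
        counts[word][sense] += 1
    # most_common(1) returns [(sense, count)]; ties go to the sense seen first.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def disambiguate(word: str, mfs_table: Dict[str, str],
                 default: Optional[str] = None) -> Optional[str]:
    """Ignore the context entirely and return the word's most frequent sense."""
    return mfs_table.get(word, default)


# Hypothetical sense-tagged training data.
training = [
    ("cold", "illness"),
    ("cold", "low_temperature"),
    ("cold", "illness"),
    ("culture", "cell_culture"),
]
table = train_mfs(training)
print(disambiguate("cold", table))   # illness
print(disambiguate("fever", table))  # None (word unseen in training)
```

This heuristic never looks at the surrounding context. That makes it a useful reference point for judging whether a context-aware algorithm adds anything.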


deepBioWSD: effective deep neural word sense disambiguation of biomedical text data

Journal of the American Medical Informatics Association, Vol 26 (5), pp. 438-446, 2019. DOI: 10.1093/jamia/ocy189. Cited By ~ 3

Author(s): Ahmad Pesaranghader, Stan Matwin, Marina Sokolova, Ali Pesaranghader

Keyword(s): Language Processing; Short Term Memory; Word Sense Disambiguation; Training Data; Biomedical Text; Word Sense; Vocabulary Size; Unified Medical Language System; Knowledge Based; Sense Disambiguation

Abstract

Objective: In biomedicine, a wealth of information is hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm is needed to prevent downstream difficulties in the natural language processing pipeline. Supervised WSD algorithms largely outperform unsupervised, semisupervised, and knowledge-based methods. However, they train one separate classifier for each ambiguous term, which requires a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with vocabulary size is desirable.

Materials and Methods: Built on recent advances in deep learning, our deepBioWSD model uses a single bidirectional long short-term memory network that makes sense predictions for any ambiguous term. First, Unified Medical Language System sense embeddings are computed from their text definitions. Then, after the network is initialized with these embeddings, it is trained on all available training data collectively. The method also includes a novel technique for automatically collecting training data from PubMed to (pre)train the network in an unsupervised manner.

Results: We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracy as evaluation metrics. deepBioWSD outperforms existing models for biomedical text WSD, achieving a state-of-the-art macro accuracy of 96.82%.

Conclusions: Besides improving disambiguation and supporting unsupervised training, deepBioWSD requires considerably less expert-labeled data because it learns the target and context terms jointly. These properties make deepBioWSD convenient to deploy in real-time biomedical applications.
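The abstract reports both macro and micro accuracy on MSH WSD. This sketch assumes the usual reading of these terms, since the abstract does not define them. Micro accuracy pools all test instances into one ratio. Macro accuracy first computes accuracy per ambiguous term and then averages those values, so rare terms weigh as much as frequent ones. The input format of (term, predicted sense, gold sense) triples is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_macro_accuracy(results: Iterable[Tuple[str, str, str]]) -> Tuple[float, float]:
    """Return (micro, macro) accuracy from (term, predicted, gold) triples."""
    per_term: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # term -> [correct, total]
    for term, predicted, gold in results:
        per_term[term][0] += int(predicted == gold)
        per_term[term][1] += 1
    if not per_term:
        raise ValueError("no results to score")
    # Micro: one pooled ratio over every instance.
    correct = sum(c for c, _ in per_term.values())
    total = sum(n for _, n in per_term.values())
    micro = correct / total
    # Macro: mean of the per-term accuracies.
    macro = sum(c / n for c, n in per_term.values()) / len(per_term)
    return micro, macro


# Hypothetical predictions: term A has 3 instances (all correct),
# term B has 1 instance (wrong).
results = [
    ("A", "s1", "s1"),
    ("A", "s2", "s2"),
    ("A", "s1", "s1"),
    ("B", "s1", "s2"),
]
micro, macro = micro_macro_accuracy(results)
print(micro, macro)  # 0.75 0.5
```

The gap between the two numbers in this toy example shows why both are worth reporting. A system can score well on micro accuracy by handling frequent terms while failing on rarer ones.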


An approach to knowledge-based Word Sense Disambiguation using semantic trees built on a WordNet lexicon network

2011 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD), 2011. DOI: 10.1109/sped.2011.5940744

Author(s): Andrei Minca, Stefan Diaconescu

Keyword(s): Word Sense Disambiguation; Word Sense; Knowledge Based; Sense Disambiguation


Knowledge Based Approaches To Nepali Word Sense Disambiguation

International Journal on Natural Language Computing, Vol 3 (3), pp. 51-63, 2014. DOI: 10.5121/ijnlc.2014.3305. Cited By ~ 7

Author(s): Arindam Roy, Sunita Sarkar, Bipul Syam Purkayastha

Keyword(s): Word Sense Disambiguation; Word Sense; Knowledge Based; Sense Disambiguation


Structural semantic interconnections: a knowledge-based approach to word sense disambiguation

IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 27 (7), pp. 1075-1086, 2005. DOI: 10.1109/tpami.2005.149. Cited By ~ 142

Author(s): R. Navigli, P. Velardi

Keyword(s): Word Sense Disambiguation; Word Sense; Knowledge Based; Sense Disambiguation


A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation

Computational Linguistics, Vol 40 (4), pp. 837-881, 2014. DOI: 10.1162/coli_a_00202. Cited By ~ 20

Author(s): Mohammad Taher Pilehvar, Roberto Navigli

Keyword(s): Large Scale; State Of The Art; Word Sense Disambiguation; Evaluation Framework; Small Scale; Word Sense; Knowledge Based; Depth Analysis; Sense Disambiguation; The Impact

The evaluation of several tasks in lexical semantics is often limited by the lack of large amounts of manual annotations, not only for training purposes but also for testing purposes. Word Sense Disambiguation (WSD) is a case in point, as hand-labeled datasets are particularly hard and time-consuming to create. Consequently, evaluations tend to be performed on a small scale, which does not allow for in-depth analysis of the factors that determine a system's performance. In this paper we address this issue by means of a realistic simulation of large-scale evaluation for the WSD task. We make two main contributions. First, we put forward two novel approaches to the wide-coverage generation of semantically aware pseudowords (i.e., artificial words capable of modeling real polysemous words). Second, we leverage the most suitable type of pseudoword to create large pseudosense-annotated corpora, which enable a large-scale experimental framework for comparing state-of-the-art supervised and knowledge-based algorithms. Using this framework, we study the impact of supervision and knowledge on the two major disambiguation paradigms and perform an in-depth analysis of the factors that affect their performance.
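The pseudoword idea can be illustrated with the classic construction. Occurrences of several real words are replaced by one artificial token, and the original word is kept as the pseudosense label. This sketch covers that basic technique only. It does not implement the semantically aware pseudoword generation the paper proposes. The example words and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

# (context tokens with the pseudoword substituted, target index, pseudosense)
Instance = Tuple[List[str], int, str]


def make_pseudoword_corpus(sentences: Sequence[Sequence[str]],
                           constituents: Sequence[str]) -> Tuple[str, List[Instance]]:
    """Conflate the constituent words into one pseudoword and label each
    occurrence with the word it replaced (its pseudosense)."""
    pseudoword = "_".join(constituents)
    members = set(constituents)
    corpus: List[Instance] = []
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token in members:
                context = list(tokens[:i]) + [pseudoword] + list(tokens[i + 1:])
                corpus.append((context, i, token))
    return pseudoword, corpus


sentences = [
    ["the", "banana", "was", "ripe"],
    ["please", "open", "the", "door"],
    ["no", "match", "here"],
]
pseudoword, corpus = make_pseudoword_corpus(sentences, ["banana", "door"])
for context, index, sense in corpus:
    print(" ".join(context), "->", sense)
# the banana_door was ripe -> banana
# please open the banana_door -> door
```

Any WSD system can then be trained and tested on such instances, because the gold label comes for free. However, pairing two unrelated words yields sense distinctions far coarser than those of real polysemous words. Semantically aware pseudowords are meant to address this limitation.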


EBL-Hope: Multilingual Word Sense Disambiguation Using a Hybrid Knowledge-Based Technique

2015. DOI: 10.18653/v1/s15-2057

Author(s): Eniafe Festus Ayetiran, Guido Boella

Keyword(s): Word Sense Disambiguation; Word Sense; Knowledge Based; Sense Disambiguation; Hybrid Knowledge
