Weakly supervised word sense disambiguation for Polish using rich lexical resources

2019, Vol. 55 (2), pp. 339-365
Author(s): Arkadiusz Janz, Maciej Piasecki

Abstract Automatic word sense disambiguation (WSD) has proven to be an important technique in many natural language processing tasks. For many years the problem of sense disambiguation has been approached with a wide range of methods; however, it remains a challenging problem, especially in the unsupervised setting. One of the well-known and successful approaches to WSD is the family of knowledge-based methods leveraging lexical knowledge resources such as wordnets. As knowledge-based approaches mostly do not use any labelled training data, their performance relies strongly on the structure and quality of the knowledge sources used. However, a pure knowledge base such as a wordnet cannot reflect all the semantic knowledge necessary to correctly disambiguate word senses in text. In this paper we explore various expansions of plWordNet as a knowledge base for WSD. Semantic links extracted from a large valency lexicon (Walenty), glosses and usage examples, Wikipedia articles, and the SUMO ontology are combined with plWordNet and tested in a PageRank-based WSD algorithm. In addition, we analyse the influence of lexical semantic vector models built with distributional semantics methods. Several new Polish test data sets for WSD are also introduced. All the resources, methods and tools are available under open licences.
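
As a rough illustration of the general idea behind PageRank-based knowledge-based WSD (a minimal sketch, not the authors' plWordNet pipeline; the toy graph, sense names, and helper function are invented for this example), one can run a personalized random walk over a sense graph whose edges come from wordnet relations enriched with links from further resources, restarting at the senses of the context words:

```python
# Minimal sketch of PageRank-based knowledge-based WSD (illustrative only;
# the sense graph and helper below are hypothetical, not plWordNet's API).
import networkx as nx

G = nx.Graph()
# Wordnet-style relation edges between senses.
G.add_edges_from([("bank.n.01", "financial_institution.n.01"),
                  ("bank.n.02", "slope.n.01")])
# Extra edges obtained from glosses, Wikipedia, or a valency lexicon (toy).
G.add_edges_from([("deposit.v.01", "financial_institution.n.01"),
                  ("river.n.01", "slope.n.01")])

def disambiguate(target_senses, context_senses, graph):
    """Pick the target sense ranked highest by Personalized PageRank,
    with the random walk restarting at the senses of the context words."""
    personalization = {s: 1.0 for s in context_senses if s in graph}
    ranks = nx.pagerank(graph, alpha=0.85, personalization=personalization)
    return max(target_senses, key=lambda s: ranks.get(s, 0.0))

# "bank" in a money context: walk mass concentrates near deposit.v.01.
print(disambiguate(["bank.n.01", "bank.n.02"], ["deposit.v.01"], G))
```

The design choice this sketch mirrors is that enriching the graph with extra edge sources simply adds more paths for the random walk to exploit, without changing the algorithm itself.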

2019, Vol. 26 (5), pp. 438-446
Author(s): Ahmad Pesaranghader, Stan Matwin, Marina Sokolova, Ali Pesaranghader

Abstract Objective In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm prevents downstream difficulties in the natural language processing applications pipeline. Supervised WSD algorithms largely outperform unsupervised, semi-supervised, and knowledge-based methods; however, they train one separate classifier for each ambiguous term, necessitating a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable. Materials and Methods Built on recent advances in deep learning, our deepBioWSD model leverages a single bidirectional long short-term memory network that makes sense predictions for any ambiguous term. In the model, the Unified Medical Language System sense embeddings are first computed from their text definitions; then, after the network is initialized with these embeddings, it is trained on all (available) training data collectively. The method also includes a novel technique for automatically collecting training data from PubMed to (pre)train the network in an unsupervised manner. Results We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracy employed as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD, achieving state-of-the-art performance of 96.82% macro accuracy. Conclusions Apart from the disambiguation improvement and unsupervised training, deepBioWSD requires considerably fewer expert-labeled data because it learns the target and context terms jointly. These merits make deepBioWSD convenient to deploy in real-time biomedical applications.
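
A minimal sketch of the core architectural idea, assuming a single BiLSTM that encodes the context and scores it against precomputed sense embeddings (hypothetical dimensions and layer choices; not the authors' implementation):

```python
# Illustrative single-BiLSTM WSD scorer in the spirit of deepBioWSD
# (hypothetical sizes; sense embeddings would come from UMLS definitions).
import torch
import torch.nn as nn

class BiLSTMSenseScorer(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.project = nn.Linear(2 * hidden, emb_dim)  # map context into sense space

    def forward(self, context_ids, sense_embeddings):
        # context_ids: (batch, seq_len); sense_embeddings: (num_senses, emb_dim).
        out, _ = self.lstm(self.embed(context_ids))
        context_vec = self.project(out.mean(dim=1))   # (batch, emb_dim)
        return context_vec @ sense_embeddings.T       # one score per sense

model = BiLSTMSenseScorer(vocab_size=10_000)
scores = model(torch.randint(0, 10_000, (2, 12)), torch.randn(5, 128))
print(scores.shape)  # torch.Size([2, 5]) -> scores for 5 candidate senses
```

Because one network is shared across all ambiguous terms and the sense inventory enters only through the embedding matrix, the same model can score any term's candidates, which is what removes the need for per-term classifiers.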


Information, 2021, Vol. 12 (11), pp. 452
Author(s): Ammar Arbaaeen, Asadullah Shah

Within the space of question answering (QA) systems, the most critical module for improving overall performance is question analysis. Extracting the lexical semantics of a natural language (NL) question presents challenges at the syntactic and semantic levels for most QA systems, owing to the mismatch between the words posed by a user and the terms stored in the knowledge bases. Many studies have achieved encouraging results in lexical semantic resolution on the topic of word sense disambiguation (WSD), and several other works consider these challenges in the context of QA applications. However, few scholars have examined the role of WSD in returning potential answers corresponding to particular questions, and natural language processing (NLP) still faces several challenges in determining the precise meaning of various ambiguities. The motivation of this work is therefore to propose a novel knowledge-based sense disambiguation (KSD) method for resolving the lexical ambiguity of questions posed to QA systems. The major contribution is the proposed method itself, which incorporates multiple knowledge sources, including the question's metadata (date/GPS), context knowledge, and a domain ontology, into a shallow NLP pipeline. The proposed KSD method is developed into a tool for a mobile QA application that aims to determine the intended meaning of questions expressed by pilgrims. The experimental results reveal that our method obtains accuracy comparable to or better than the baselines in the pilgrimage domain.
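
A hypothetical sketch of how evidence from several knowledge sources might be combined to score candidate senses (the weights, sense signatures, and pilgrimage example are invented for illustration; the paper's actual KSD pipeline is more elaborate):

```python
# Toy multi-source sense scoring: overlap of a sense's signature (gloss and
# related terms) with the context, question metadata, and ontology terms.
def ksd_score(sense_signature, context, metadata_terms, ontology_terms,
              weights=(1.0, 0.5, 2.0)):
    """Weighted evidence for one candidate sense from three knowledge sources."""
    w_ctx, w_meta, w_onto = weights
    sig = set(sense_signature)
    return (w_ctx * len(sig & set(context))
            + w_meta * len(sig & set(metadata_terms))
            + w_onto * len(sig & set(ontology_terms)))

candidates = {
    "tawaf.ritual": ["circumambulation", "kaaba", "ritual", "pilgrims", "mecca"],
    "circuit.electrical": ["current", "wire", "electrical"],
}
question = "how many rounds of tawaf do pilgrims perform".split()
best = max(candidates, key=lambda s: ksd_score(
    candidates[s], question, metadata_terms=["mecca"], ontology_terms=["ritual"]))
print(best)  # tawaf.ritual
```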


2016, Vol. 4, pp. 197-213
Author(s): Silvana Hartmann, Judith Eckle-Kohler, Iryna Gurevych

We present a new approach for generating role-labeled training data using Linked Lexical Resources, i.e., integrated lexical resources that combine several resources (e.g., WordNet, FrameNet, Wiktionary) by linking them at the sense or role level. Unlike resource-based supervision in relation extraction, we focus on complex linguistic annotations, more specifically FrameNet senses and roles. The automatically labeled training data ( www.ukp.tu-darmstadt.de/knowledge-based-srl/ ) are evaluated on four corpora from different domains for the tasks of word sense disambiguation and semantic role classification. Results show that classifiers trained on our generated data match those trained in a standard supervised setting.
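
A minimal sketch of the label-projection idea behind such resource-based supervision, assuming a hypothetical sense-level alignment between a linked resource and FrameNet (identifiers and data are invented):

```python
# Sketch of label projection across linked lexical resources: a sense-level
# alignment lets us transfer FrameNet labels to the example sentences of the
# linked resource, yielding automatically labeled training data.

# Hypothetical alignment: other-resource sense -> FrameNet sense.
alignment = {"wiktionary:run#3": "framenet:Operating_a_system.run"}

# Example sentences attached to senses in the other resource.
examples = {"wiktionary:run#3": ["She runs the whole department."]}

training_data = [
    (sentence, framenet_sense)
    for other_sense, framenet_sense in alignment.items()
    for sentence in examples.get(other_sense, [])
]
print(training_data)
# [('She runs the whole department.', 'framenet:Operating_a_system.run')]
```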


2019, Vol. 9 (2), pp. 3985-3989
Author(s): P. Sharma, N. Joshi

The purpose of word sense disambiguation (WSD) is to determine computationally the proper meaning of a lexeme in its context, in the problem area at hand, along with the relationships between lexical items. This is done using natural language processing (NLP) techniques and serves applications such as machine translation (MT), which automatically translates text from one natural language into another. Application areas for WSD include information retrieval (IR), lexicography, MT, text processing, speech processing, etc. In this article we investigate Hindi WSD using a knowledge-based technique, which incorporates word knowledge from external knowledge resources to resolve the ambiguity of words. In this experiment, we developed a WSD tool based on a knowledge-based approach with the Hindi WordNet. The tool uses the knowledge-based Lesk algorithm for Hindi WSD. Our proposed system gives an accuracy of about 71.4%.
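
For illustration, a simplified Lesk sketch in the spirit of the described tool: pick the sense whose gloss shares the most words with the context (the glosses below are English stand-ins invented for this example; the paper uses the Hindi WordNet):

```python
# Simplified Lesk: score each sense by word overlap between its gloss and
# the context, and return the best-scoring sense.
def simplified_lesk(context_words, sense_glosses):
    context = set(context_words)
    return max(sense_glosses,
               key=lambda sense: len(context & set(sense_glosses[sense].split())))

glosses = {
    "sona_metal": "a precious yellow metal used in jewellery and coins",
    "sona_sleep": "to rest the body and mind by closing the eyes",
}
sentence = "she bought a metal chain made of yellow gold".split()
print(simplified_lesk(sentence, glosses))  # sona_metal
```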


2016, Vol. 13 (10), pp. 6929-6934
Author(s): Junting Chen, Liyun Zhong, Caiyun Cai

Word sense disambiguation (WSD) in natural language text is a fundamental semantic understanding task at the lexical level in natural language processing (NLP) applications. Kernel methods such as the support vector machine (SVM) have been successfully applied to WSD, mainly due to their relatively high classification accuracy and their ability to handle high-dimensional, sparse data. A significant challenge in WSD is to reduce the need for labeled training data while maintaining acceptable performance. In this paper, we present a semi-supervised technique using the exponential kernel for WSD. Specifically, the semantic similarities between terms are first determined, with both labeled and unlabeled training data, by means of a diffusion process on a graph defined by lexicon and co-occurrence information, and the exponential kernel is then constructed from the learned semantic similarity. Finally, the SVM classifier trains a model for each class during the training phase, and this model is then applied to all examples in the test phase. The main feature of this approach is that it exploits the exponential kernel to reveal the semantic similarities between terms in an unsupervised manner, which provides a kernel framework for semi-supervised learning. Experiments on several SENSEVAL benchmark data sets demonstrate that the proposed approach is sound and effective.
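
A compact sketch of the exponential-kernel construction, assuming a toy term co-occurrence graph (the adjacency matrix, the β value, and the bag-of-terms lifting are illustrative; the paper learns the graph from labeled and unlabeled data):

```python
# Exponential (diffusion) kernel over a term graph, handed to an SVM.
import numpy as np
from scipy.linalg import expm
from sklearn.svm import SVC

# Adjacency of a tiny, symmetric term co-occurrence graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

beta = 0.5
K_terms = expm(beta * A)          # exponential kernel: exp(beta * A)

# Represent each example as a bag-of-terms vector and lift the term kernel
# to an example kernel: K_ex = X K_terms X^T (a common construction).
X = np.array([[1, 0, 1, 0],       # example 1 uses terms 0 and 2
              [0, 1, 0, 1],       # example 2 uses terms 1 and 3
              [1, 1, 0, 0]])      # example 3 uses terms 0 and 1
K_examples = X @ K_terms @ X.T

y = [0, 1, 0]
clf = SVC(kernel="precomputed").fit(K_examples, y)
print(clf.predict(K_examples))    # predictions on the training examples
```

The diffusion step is what makes the method semi-supervised: unlabeled text shapes the term graph, and hence the kernel, before any labels are used.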


Author(s): Prashant Y. Itankar, Nikhat Raza

Natural language processing (NLP) is much needed in today's world to enhance human-machine interaction: it parses texts and provides information to the machine for further processing, making it possible to obtain useful and meaningful information from textual data. The computational process of identifying the meaning (sense) of a word in a particular context is complicated by ambiguity, where the meaning of a word in context is not clear and may point to multiple senses. Ambiguity in understanding the correct meaning of texts hampers growth and development in various NLP applications such as machine translation and human-machine interfaces. The process of finding the correct meaning of ambiguous text in a given context is called word sense disambiguation (WSD). WSD is perceived as one of the most challenging problems in the NLP community and is still unsolved. Different kinds of ambiguity exist in natural languages, and researchers are contributing to resolving the problem in different languages. These ambiguities must be resolved in order to understand the meaning of the text. Our objective is to investigate how WSD can be used to alleviate ambiguities and automatically determine the correct meaning of ambiguous text, thereby boosting NLP processing and applications. Resolving ambiguity for translation involves working with various NLP techniques to investigate the structure of the languages, the availability of lexical resources, etc. This paper focuses on an in-depth analysis of such ambiguity, the issues it raises in language translation, how WSD resolves the ambiguity, and how it contributes towards building a framework.


2008, Vol. 02 (03), pp. 365-380
Author(s): Dmitriy Dligach, Martha Palmer

Word Sense Disambiguation (WSD) is an important problem in Natural Language Processing. Supervised WSD involves assigning a sense from some sense inventory to each occurrence of an ambiguous word. Verb sense distinctions often depend on distinctions in the semantics of the target verb's arguments; therefore, some method of capturing argument semantics is crucial to the success of a verb sense disambiguation (VSD) system. In this paper we propose a novel approach to encoding the semantics of the noun arguments of a verb, which involves extracting various semantic properties of those arguments from a large text corpus. We contrast our approach with traditional methods and show that it performs better, while the only resources it requires are a large corpus and a dependency parser.
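
A small sketch of one corpus-based way to capture noun-argument semantics, representing each noun by the verbs that govern it in parsed text (the triples and relation label are invented for illustration; the paper's feature set is richer):

```python
# Corpus-derived argument semantics: profile each noun by the verbs that
# take it as a direct object, counted from (hypothetical) dependency parses.
from collections import Counter, defaultdict

parsed_triples = [            # (verb, relation, noun) from a dependency parser
    ("drink", "dobj", "coffee"), ("brew", "dobj", "coffee"),
    ("drink", "dobj", "tea"),    ("file", "dobj", "lawsuit"),
    ("settle", "dobj", "lawsuit"),
]

noun_profile = defaultdict(Counter)
for verb, rel, noun in parsed_triples:
    if rel == "dobj":
        noun_profile[noun][verb] += 1

# "coffee" and "tea" share verb contexts, hinting at similar argument
# semantics that a verb sense classifier can exploit as features.
print(noun_profile["coffee"], noun_profile["tea"])
```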


2014, Vol. 40 (1), pp. 57-84
Author(s): Eneko Agirre, Oier López de Lacalle, Aitor Soroa

Word Sense Disambiguation (WSD) systems automatically choose the intended meaning of a word in context. In this article we present a WSD algorithm based on random walks over large Lexical Knowledge Bases (LKB). We show that our algorithm performs better than other graph-based methods when run on a graph built from WordNet and eXtended WordNet. Our algorithm and LKB combination compares favorably to other knowledge-based approaches in the literature that use similar knowledge, on a variety of English data sets and a Spanish data set. We include a detailed analysis of the factors that affect the algorithm. The algorithm and the LKBs used are publicly available, and the results are easily reproducible.
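
A minimal sketch of the whole-context random-walk variant, assuming a toy LKB: a single Personalized PageRank run, with restart mass spread over the candidate senses of all context words, ranks every word's candidates jointly (the sense names are invented; the actual system runs over WordNet-scale graphs):

```python
# One Personalized PageRank run over a toy lexical knowledge base
# disambiguates all content words in the context at once.
import networkx as nx

lkb = nx.Graph([
    ("plant.n.01", "factory.n.01"), ("plant.n.02", "organism.n.01"),
    ("worker.n.01", "factory.n.01"), ("leaf.n.01", "organism.n.01"),
])
candidates = {"plant": ["plant.n.01", "plant.n.02"], "worker": ["worker.n.01"]}

# Restart distribution spread uniformly over all candidate senses in context.
restart = {s: 1.0 for senses in candidates.values() for s in senses}
ranks = nx.pagerank(lkb, alpha=0.85, personalization=restart)

chosen = {w: max(senses, key=lambda s: ranks.get(s, 0.0))
          for w, senses in candidates.items()}
print(chosen)  # expect plant -> plant.n.01 in this worker/factory context
```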


This paper discusses various techniques for word sense disambiguation. In WSD we identify the correct sense of a target word present in the text. WSD is a challenging field in natural language processing; it helps in information retrieval, information extraction, and machine learning. There are two approaches to WSD: the machine learning approach and the knowledge-based approach. In the knowledge-based approach an external resource is used to support the disambiguation process, whereas in the machine learning approach a corpus is used, whether annotated, unannotated, or both.


Author(s): Tommaso Pasini

Word Sense Disambiguation (WSD) is the task of identifying the meaning of a word in a given context. It lies at the base of Natural Language Processing as it provides semantic information for words. In the last decade, great strides have been made in this field and much effort has been devoted to mitigating the knowledge acquisition bottleneck, i.e., the problem of semantically annotating texts at a large scale and in different languages. This issue is ubiquitous in WSD as it hinders the creation of both multilingual knowledge bases and manually curated training sets. In this work, we first introduce the reader to the task of WSD through a short historical digression and then take stock of the advancements made in alleviating the knowledge acquisition bottleneck. We survey the literature on manual, semi-automatic and automatic approaches to creating English and multilingual corpora tagged with sense annotations, and present a clear overview of supervised models for WSD. Finally, we provide our view of the future directions that we foresee for the field.

