Using Exponential Kernel for Semi-Supervised Word Sense Disambiguation

Word sense disambiguation (WSD) in natural language text is a fundamental lexical-level semantic understanding task in natural language processing (NLP) applications. Kernel methods such as support vector machines (SVMs) have been successfully applied to WSD, mainly because of their relatively high classification accuracy and their ability to handle high-dimensional, sparse data. A significant challenge in WSD is to reduce the need for labeled training data while maintaining acceptable performance. In this paper, we present a semi-supervised technique using the exponential kernel for WSD. Specifically, the semantic similarities between terms are first determined from both labeled and unlabeled training data by means of a diffusion process on a graph defined by lexicon and co-occurrence information, and the exponential kernel is then constructed from the learned semantic similarity. Finally, the SVM classifier trains a model for each class during the training phase, and this model is then applied to all test examples in the test phase. The main feature of this approach is that it uses the exponential kernel to reveal the semantic similarities between terms in an unsupervised manner, which provides a kernel framework for semi-supervised learning. Experiments on several SENSEVAL benchmark data sets demonstrate that the proposed approach is sound and effective.
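The diffusion step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes the common form exp(beta * A) applied to a symmetric term co-occurrence matrix A, and a context kernel of the form D S D^T. The toy vocabulary, the contexts, the value of `beta`, and the function names are all hypothetical.

```python
import numpy as np


def exponential_kernel(adjacency: np.ndarray, beta: float = 0.5) -> np.ndarray:
    """Return exp(beta * A) for a symmetric term-term matrix A."""
    # A symmetric matrix has a real eigendecomposition A = V diag(w) V^T,
    # so exp(beta * A) = V diag(exp(beta * w)) V^T.
    eigvals, eigvecs = np.linalg.eigh(adjacency)
    return (eigvecs * np.exp(beta * eigvals)) @ eigvecs.T


def context_kernel(doc_term: np.ndarray, term_similarity: np.ndarray) -> np.ndarray:
    """Kernel between contexts: D S D^T, with S the term similarity matrix."""
    return doc_term @ term_similarity @ doc_term.T


# Hypothetical toy data: 4 contexts (labeled or unlabeled) over 4 terms.
terms = ["bank", "money", "river", "loan"]
contexts = np.array(
    [
        [1, 1, 0, 0],  # bank, money
        [1, 0, 1, 0],  # bank, river
        [0, 1, 0, 1],  # money, loan
        [0, 0, 0, 1],  # loan
    ],
    dtype=float,
)

# Term graph from co-occurrence counts; self-loops are removed.
cooc = contexts.T @ contexts
np.fill_diagonal(cooc, 0.0)

S = exponential_kernel(cooc, beta=0.5)  # diffused term similarities
K = context_kernel(contexts, S)         # semantic kernel between contexts
linear = contexts @ contexts.T          # plain bag-of-words kernel, for comparison

print("linear k(c0, c3):   ", linear[0, 3])
print("diffusion k(c0, c3):", round(float(K[0, 3]), 4))
```

Contexts 0 and 3 share no terms, so the plain bag-of-words kernel gives them zero similarity, while the diffused kernel relates them through the money-loan edge in the term graph. A matrix like `K` could then be passed to any SVM implementation that accepts a precomputed kernel.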

A critical analysis and explication of word sense disambiguation as approached by natural language processing

Lingua ◽

10.1016/j.lingua.2020.102896 ◽

2020 ◽

Vol 243 ◽

pp. 102896

Author(s):

Julie Mennes ◽

Stephan van der Waart van Gulik

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Critical Analysis ◽

Word Sense Disambiguation ◽

Word Sense ◽

Sense Disambiguation

deepBioWSD: effective deep neural word sense disambiguation of biomedical text data

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy189 ◽

2019 ◽

Vol 26 (5) ◽

pp. 438-446 ◽

Cited By ~ 3

Author(s):

Ahmad Pesaranghader ◽

Stan Matwin ◽

Marina Sokolova ◽

Ali Pesaranghader

Keyword(s):

Language Processing ◽

Short Term Memory ◽

Word Sense Disambiguation ◽

Training Data ◽

Biomedical Text ◽

Word Sense ◽

Vocabulary Size ◽

Unified Medical Language System ◽

Knowledge Based ◽

Sense Disambiguation

Abstract. Objective: In biomedicine, there is a wealth of information hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm prevents downstream difficulties in the natural language processing applications pipeline. Supervised WSD algorithms largely outperform un- or semisupervised and knowledge-based methods; however, they train 1 separate classifier for each ambiguous term, necessitating a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable. Materials and Methods: Built on recent advances in deep learning, our deepBioWSD model leverages 1 single bidirectional long short-term memory network that makes sense predictions for any ambiguous term. In the model, the Unified Medical Language System sense embeddings are first computed from their text definitions; then, after the network is initialized with these embeddings, it is trained on all (available) training data collectively. This method also considers a novel technique for automatic collection of training data from PubMed to (pre)train the network in an unsupervised manner. Results: We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracies employed as evaluation metrics. deepBioWSD outperforms existing models in biomedical text WSD by achieving the state-of-the-art performance of 96.82% for macro accuracy. Conclusions: Apart from the disambiguation improvement and unsupervised training, deepBioWSD depends on considerably fewer expert-labeled data, as it learns the target and the context terms jointly. These merits make deepBioWSD conveniently deployable in real-time biomedical applications.
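The abstract reports macro and micro accuracy without defining them. The sketch below assumes the usual convention: micro accuracy pools all test instances, while macro accuracy averages the per-term accuracies so that every ambiguous term counts equally. The records and the function name are hypothetical and are not taken from the MSH WSD dataset.

```python
from collections import defaultdict


def micro_macro_accuracy(records):
    """records: iterable of (ambiguous_term, gold_sense, predicted_sense)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for term, gold, predicted in records:
        total[term] += 1
        correct[term] += int(gold == predicted)
    # Micro: pool every instance, so frequent terms weigh more.
    micro = sum(correct.values()) / sum(total.values())
    # Macro: average the per-term accuracies, so each term weighs the same.
    macro = sum(correct[t] / total[t] for t in total) / len(total)
    return micro, macro


# Hypothetical predictions for two ambiguous terms.
records = [
    ("cold", "temperature", "temperature"),
    ("cold", "illness", "illness"),
    ("cold", "illness", "temperature"),
    ("cold", "illness", "illness"),
    ("BAT", "animal", "animal"),
    ("BAT", "tool", "animal"),
]

micro, macro = micro_macro_accuracy(records)
print(f"micro = {micro:.4f}, macro = {macro:.4f}")
```

Here "cold" is disambiguated correctly in 3 of 4 instances and "BAT" in 1 of 2, so the micro accuracy is 4/6 and the macro accuracy is (0.75 + 0.5) / 2 = 0.625.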

Comparing supervised learning algorithms for Spatial Nominal Entity recognition

AGILE: GIScience Series ◽

10.5194/agile-giss-1-15-2020 ◽

2020 ◽

Vol 1 ◽

pp. 1-18

Author(s):

Amine Medad ◽

Mauro Gaio ◽

Ludovic Moncla ◽

Sébastien Mustière ◽

Yannick Le Nir

Keyword(s):

Natural Language ◽

Supervised Learning ◽

Language Processing ◽

Word Sense Disambiguation ◽

Learning Algorithms ◽

Named Entity Recognition ◽

Entity Recognition ◽

Word Sense ◽

Sense Disambiguation ◽

Supervised Learning Algorithms

Abstract. Discourse may contain both named and nominal entities. Most common nouns or nominal mentions in natural language do not have a single, simple meaning but rather a number of related meanings. This form of ambiguity led to the development of a task in natural language processing known as Word Sense Disambiguation. Recognition and categorisation of named and nominal entities is an essential step for Word Sense Disambiguation methods. Up to now, named entity recognition and categorisation systems have mainly focused on the annotation, categorisation and identification of named entities. This paper focuses on the annotation and the identification of spatial nominal entities. We explore the combination of the Transfer Learning principle and supervised learning algorithms in order to build a system to detect spatial nominal entities. For this purpose, different supervised learning algorithms are evaluated with three different context sizes on two manually annotated datasets built from Wikipedia articles and hiking description texts. The studied algorithms have been selected for one or more of their specific properties potentially useful in solving our problem. The results of the first phase of experiments reveal that the selected algorithms perform similarly in their ability to detect spatial nominal entities. The study also confirms the importance of the size of the window used to describe the context when the word-embedding principle is used to represent the semantics of each word.

Knowledge-Based Method for Word Sense Disambiguation by Using Hindi WordNet

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.2596 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3985-3989 ◽

Cited By ~ 1

Author(s):

P. Sharma ◽

N. Joshi

Keyword(s):

Natural Language ◽

Language Processing ◽

Speech Processing ◽

Text Processing ◽

Word Sense Disambiguation ◽

Problem Area ◽

Word Sense ◽

Knowledge Resources ◽

Knowledge Based ◽

Sense Disambiguation

The purpose of word sense disambiguation (WSD) is to determine, with the help of a computer, the proper meaning of a lexeme in its available context within the problem area, together with the relationship between lexicons. This is done using natural language processing (NLP) techniques, which involve queries from machine translation (MT), NLP-specific documents, or output text. MT automatically translates text from one natural language into another. Application areas for WSD include information retrieval (IR), lexicography, MT, text processing, and speech processing. In this article, we investigate Hindi WSD using a knowledge-based technique, which incorporates word knowledge from external knowledge resources to resolve the ambiguity of words. In this experiment, we tried to develop a WSD tool by taking a knowledge-based approach with the Hindi WordNet. The tool uses the knowledge-based Lesk algorithm for Hindi WSD. Our proposed system achieves an accuracy of about 71.4%.
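The abstract names the Lesk algorithm but does not describe the variant used. The sketch below shows the simplified Lesk idea only: choose the sense whose dictionary gloss shares the most words with the context. It is not the authors' tool. The English glosses, sense labels, and stopword list are hypothetical stand-ins for entries that would come from a resource such as the Hindi WordNet.

```python
def simplified_lesk(context_words, sense_glosses, stopwords=frozenset()):
    """Return the sense whose gloss overlaps most with the context words."""
    context = {w.lower() for w in context_words} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_words = set(gloss.lower().split()) - stopwords
        overlap = len(context & gloss_words)
        # Strict comparison: on a tie, the sense listed first is kept.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical toy sense inventory for the word "bank".
glosses = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a river or other body of water",
}
stop = frozenset({"a", "an", "and", "the", "of", "on", "or", "that", "at"})

river_sentence = "she sat on the bank of the river and watched the water".split()
money_sentence = "he deposits money at the bank".split()

river_choice = simplified_lesk(river_sentence, glosses, stop)
money_choice = simplified_lesk(money_sentence, glosses, stop)
print(river_choice, money_choice)
```

The first sentence shares "river" and "water" with the second gloss, and the second sentence shares "deposits" and "money" with the first gloss, so each context selects a different sense of the same word.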

Normalized Statistical Algorithm for Afaan Oromo Word Sense Disambiguation

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2021.06.04 ◽

2021 ◽

Vol 13 (6) ◽

pp. 40-50

Author(s):

Abdo Ababor Abafogi

Keyword(s):

Natural Language ◽

Language Processing ◽

Morphological Analysis ◽

Word Sense Disambiguation ◽

Word Sense ◽

Statistical Algorithm ◽

Sense Disambiguation ◽

F Measure ◽

Overall Effectiveness ◽

Standing Problem

Language is the main means of communication used by humans. The same word can have different meanings depending on how it is used in a particular sentence, which makes it challenging for a computer to understand text at a human level. Word Sense Disambiguation (WSD), which aims to identify the correct sense of a given ambiguous word, is a long-standing problem in natural language processing (NLP). Because the major aim of WSD is to accurately understand the sense of a word in a particular context, it can be used for the correct labeling of words in natural language applications. In this paper, I propose a normalized statistical algorithm that performs WSD for the Afaan Oromo language despite morphological analysis. The proposed algorithm can discriminate an ambiguous word’s sense without considering window size, without predefined rules, and without using an annotated dataset for training, which reduces a key challenge for under-resourced languages. The proposed system was tested on 249 sentences using precision, recall, and F-measure. The overall effectiveness of the system is 80.76% in F-measure, which suggests that the proposed system is promising for Afaan Oromo, one of the under-resourced languages spoken in East Africa. The algorithm can be extended to semantic text similarity with little or no modification. Furthermore, the future directions put forward could improve the performance of the proposed algorithm.

A Knowledge-Based Sense Disambiguation Method to Semantically Enhanced NL Question for Restricted Domain

Information ◽

10.3390/info12110452 ◽

2021 ◽

Vol 12 (11) ◽

pp. 452

Author(s):

Ammar Arbaaeen ◽

Asadullah Shah

Keyword(s):

Natural Language ◽

Language Processing ◽

Question Answering ◽

Word Sense Disambiguation ◽

Knowledge Bases ◽

Word Sense ◽

Intended Meaning ◽

Lexical Semantic ◽

Knowledge Based ◽

Sense Disambiguation

Within the space of question answering (QA) systems, the most critical module for improving overall performance is question analysis processing. Extracting the lexical semantics of a Natural Language (NL) question presents challenges at the syntactic and semantic levels for most QA systems. This is due to the difference between the words posed by a user and the terms presently stored in the knowledge bases. Many studies have achieved encouraging results in lexical semantic resolution on the topic of word sense disambiguation (WSD), and several other works consider these challenges in the context of QA applications. Additionally, few scholars have examined the role of WSD in returning potential answers corresponding to particular questions. However, natural language processing (NLP) still faces several challenges in determining the precise meaning of various ambiguities. Therefore, the motivation of this work is to propose a novel knowledge-based sense disambiguation (KSD) method for resolving the problem of lexical ambiguity associated with questions posed in QA systems. The major contribution is the proposed innovative method, which incorporates multiple knowledge sources, including the question’s metadata (date/GPS), context knowledge, and domain ontology, into shallow NLP. The proposed KSD method is developed into a unique tool for a mobile QA application that aims to determine the intended meaning of questions expressed by pilgrims. The experimental results reveal that our method obtained comparable or better accuracy than the baselines in the context of the pilgrimage domain.

A COMPARATIVE STUDY OF STATISTICAL AND NATURAL LANGUAGE PROCESSING TECHNIQUES FOR SENTIMENT ANALYSIS

Jurnal Teknologi ◽

10.11113/jt.v77.6502 ◽

2015 ◽

Vol 77 (18) ◽

Cited By ~ 1

Author(s):

Wai-Howe Khong ◽

Lay-Ki Soon ◽

Hui-Ngo Goh

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Word Sense Disambiguation ◽

Statistical Technique ◽

Machine Learning Algorithms ◽

Support Vector ◽

Word Sense ◽

Pos Tagging

Sentiment analysis has emerged as one of the most powerful tools in business intelligence. With the aim of proposing an effective sentiment analysis technique, we have performed experiments on analyzing the sentiments of 3,424 tweets using both statistical and natural language processing (NLP) techniques as part of our background study. For the statistical technique, machine learning algorithms such as Support Vector Machines (SVMs), decision trees and Naïve Bayes have been explored. The results show that SVM consistently outperformed the rest in both classifications. For sentiment analysis using NLP techniques, we used two different tagging methods for part-of-speech (POS) tagging. Subsequently, the output is used for word sense disambiguation (WSD) using WordNet, followed by sentiment identification using SentiWordNet. Our experimental results indicate that adjectives and adverbs are sufficient to infer the sentiment of tweets compared to other combinations. Comparatively, the statistical approach records higher accuracy than the NLP approach by approximately 17%.

Computational Treatment of Multiword Expressions

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.56 ◽

2018 ◽

Author(s):

Carlos Ramisch ◽

Aline Villavicencio

Keyword(s):

Natural Language ◽

Language Processing ◽

Word Sense Disambiguation ◽

Word Sense ◽

Language Generation ◽

Multiword Expressions ◽

Language Technology ◽

Sense Disambiguation ◽

Technology Applications ◽

Nominal Compounds

In natural-language processing, multiword expressions (MWEs) have been the focus of much attention in their many forms, including idioms, nominal compounds, verbal expressions, and collocations. In addition to their relevance for lexicographic and terminographic work, their ubiquity in language affects the performance of tasks like parsing, word sense disambiguation, and natural-language generation. They lend a mark of naturalness and fluency to applications that can deal with them, ranging from machine translation to information retrieval. This chapter presents an overview of their linguistic characteristics and discusses a variety of proposals for incorporating them into language technology, covering type-based discovery, token-based identification, and MWE-aware language technology applications.

Word Sense Disambiguation for Chinese Based on Semantics Calculation

Mathematical Problems in Engineering ◽

10.1155/2015/235096 ◽

2015 ◽

Vol 2015 ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

Yuntong Liu ◽

Hua Sun

Keyword(s):

Natural Language Processing ◽

Computational Complexity ◽

Natural Language ◽

Language Processing ◽

Word Sense Disambiguation ◽

Word Sense ◽

Semantic Model ◽

Sense Disambiguation

To use semantics more effectively in natural language processing, a word sense disambiguation method for Chinese based on semantics calculation was proposed. Word sense disambiguation for a Chinese clause could be achieved by solving the semantic model of the natural language; each step of the word sense disambiguation process was discussed in detail, and the computational complexity of the process was analyzed. Finally, experiments were conducted to verify the effectiveness of the method.

A Multi-Strategy Word Sense Disambiguation Method

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.182-183.2109 ◽

2012 ◽

Vol 182-183 ◽

pp. 2109-2112

Author(s):

Lin Lin Yu ◽

Deng Feng Xu ◽

Li Fang Song ◽

Guo Jie Li ◽

Xu Dong Song

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Word Sense Disambiguation ◽

Word Sense ◽

Large Area ◽

Research Areas ◽

Sense Disambiguation ◽

Difficult Issue

Word sense disambiguation (WSD) is a critical and difficult issue in natural language processing (NLP), and it is of great significance across many research areas of NLP. This paper presents a multi-strategy word sense disambiguation method. The method combines an approach based on matching against a word corpus with an approach based on similarity and relevance. The calculation of similarity and relevance makes full use of the sememe-tree information from HowNet. The experiments show that the proposed WSD method can obtain better results.
