Co-occurrence graph-based context adaptation: a new unsupervised approach to word sense disambiguation

Ambiguous Word ◽

Experimental Results ◽

Word Sense ◽

Structure And Properties ◽

Similarity Functions ◽

Challenging Tasks ◽

Sense Disambiguation ◽

Unsupervised Approach

Abstract Word sense disambiguation (WSD) is the task of selecting the correct sense for an ambiguous word in its context. Since WSD is one of the most challenging tasks in various text processing systems, improving its accuracy can be very beneficial. In this article, we propose a new unsupervised method based on a co-occurrence graph created from a monolingual corpus, without any dependency on the structure and properties of the language itself. In the proposed method, the context of an ambiguous word is represented as a sub-graph extracted from a large word co-occurrence graph built from a corpus. Most of the words are connected in this graph. To clarify the exact sense of an ambiguous word, its senses and relations are added to the context graph, and various similarity functions are employed based on the senses and context graph. In the disambiguation process, we select the senses with the highest similarity to the context graph. In contrast to other WSD methods, the proposed method does not use any language-dependent resources (e.g. WordNet); it uses only a monolingual corpus. Therefore, the proposed method can be employed for other languages. Moreover, by increasing the size of the corpus, it is possible to enhance the accuracy of WSD. Experimental results on English and Persian datasets show that the proposed method is competitive with existing supervised and unsupervised WSD approaches.
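The abstract above does not give the graph construction or the similarity functions, so the following is only a minimal sketch of the general idea under stated assumptions: a co-occurrence graph is counted from a toy, pre-tokenized corpus, each sense is represented by a few hypothetical seed words, and the similarity is simply the summed edge weight between a sense's words and the context words. The corpus, the sense labels, the window size and all function names are illustrative and not taken from the paper.

```python
from collections import Counter

def build_cooccurrence_graph(sentences, window=3):
    """Count how often two distinct words occur within `window` tokens of each other."""
    graph = Counter()
    for tokens in sentences:
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    graph[frozenset((left, right))] += 1
    return graph

def sense_score(graph, sense_words, context_words):
    """Assumed similarity: total edge weight linking sense words to context words."""
    return sum(
        graph[frozenset((s, c))]
        for s in sense_words
        for c in context_words
        if s != c
    )

def disambiguate(graph, senses, context, target):
    """Pick the sense whose words are most strongly connected to the context."""
    context_words = [w for w in context if w != target]
    return max(senses, key=lambda name: sense_score(graph, senses[name], context_words))

# Toy corpus of content words only (hypothetical data).
corpus = [
    ["bank", "approved", "loan", "money"],
    ["deposit", "money", "bank", "account"],
    ["river", "bank", "water", "mud"],
    ["fishing", "river", "water", "boat"],
]
graph = build_cooccurrence_graph(corpus)

# Hypothetical sense inventory: each sense is described by a few related words.
senses = {"bank/finance": ["money", "loan"], "bank/river": ["river", "water"]}

river_choice = disambiguate(graph, senses, ["boat", "drifted", "bank", "fishing"], "bank")
finance_choice = disambiguate(graph, senses, ["loan", "account", "bank"], "bank")
```

With this toy data the first context is assigned the river sense and the second the finance sense. A larger corpus would change the edge weights, which is in line with the abstract's remark that corpus size affects accuracy.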

A Word Sense Disambiguation Method Based on Reconstruction of Context by Correlation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.135-136.160 ◽

2011 ◽

Vol 135-136 ◽

pp. 160-166 ◽

Cited By ~ 1

Author(s):

Xin Hua Fan ◽

Bing Jun Zhang ◽

Dong Zhou

Keyword(s):

Information Entropy ◽

Average Distance ◽

Ambiguous Word ◽

Experimental Results ◽

Occurrence Frequency ◽

Word Sense ◽

Occurrence Data ◽

Ambiguous Words ◽

This paper presents a word sense disambiguation method that reconstructs the context using the correlation between words. First, we compute the relevance between words from corpus statistics (co-occurrence frequency, average distance, and information entropy). Second, we treat the context words whose correlation with the ambiguous word is larger than that of the other words as the important words, use them to reconstruct the context, and take the reconstructed context as the new context of the ambiguous word. Finally, we apply the method of sememe co-occurrence data [10] for word sense disambiguation. The experimental results demonstrate the feasibility of this method.
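The abstract names the statistics but not how they are combined, so the sketch below is an assumption-laden illustration only: it computes co-occurrence frequency and average token distance to the ambiguous word, scores each context word as frequency divided by average distance (a hypothetical stand-in for the paper's correlation value), and keeps the top-scoring words as the reconstructed context. Information entropy is left out, and the corpus and function names are invented for the example.

```python
from collections import defaultdict

def pair_statistics(sentences, target):
    """Return {word: (co-occurrence frequency, average distance to target)}."""
    freq = defaultdict(int)
    total_distance = defaultdict(int)
    for tokens in sentences:
        positions = [i for i, tok in enumerate(tokens) if tok == target]
        if not positions:
            continue  # sentence does not contain the ambiguous word
        for j, word in enumerate(tokens):
            if word == target:
                continue
            freq[word] += 1
            # distance to the nearest occurrence of the target in this sentence
            total_distance[word] += min(abs(j - i) for i in positions)
    return {w: (freq[w], total_distance[w] / freq[w]) for w in freq}

def reconstruct_context(context, target, stats, keep=2):
    """Keep the `keep` context words with the highest assumed correlation score."""
    def score(word):
        count, avg_dist = stats.get(word, (0, 1.0))
        return count / avg_dist  # hypothetical combination, not the paper's formula
    candidates = [w for w in context if w != target]
    return sorted(candidates, key=score, reverse=True)[:keep]

# Toy corpus (hypothetical data).
sentences = [
    ["bank", "loan", "interest", "today"],
    ["today", "bank", "loan", "rate"],
    ["weather", "today", "near", "bank"],
]
stats = pair_statistics(sentences, "bank")
new_context = reconstruct_context(
    ["today", "bank", "loan", "weather", "interest"], "bank", stats
)
```

Here "loan" co-occurs twice at distance 1 and "today" three times at an average distance of 2, so these two words form the reconstructed context.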

Word sense disambiguation using implicit information

Natural Language Engineering ◽

10.1017/s1351324919000421 ◽

2019 ◽

Vol 26 (4) ◽

pp. 413-432 ◽

Cited By ~ 1

Author(s):

Goonjan Jain ◽

D.K. Lobiyal

Keyword(s):

Ambiguous Word ◽

Word Sense ◽

Implicit Information ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Novel Method ◽

Unsupervised Approach ◽

Polysemous Words ◽

Better Than

Abstract Humans proficiently interpret the true sense of an ambiguous word by establishing associations among the words in a sentence. The complete sense of a text is also based on implicit information, which is not explicitly mentioned. The absence of this implicit information is a significant problem for a computer program that attempts to determine the correct sense of ambiguous words. In this paper, we propose a novel method to uncover the implicit information that links the words of a sentence. We reveal this implicit information using a graph, which is then used to disambiguate the ambiguous word. The experiments show that the proposed algorithm interprets the correct sense for both homonyms and polysemous words. Our proposed algorithm has performed better than the approaches presented in the SemEval-2013 task for word sense disambiguation and has shown an accuracy of 79.6 percent, which is 2.5 percent better than the best unsupervised approach in SemEval-2007.

Proceedings of the First International Conference on Intelligent Human Computer Interaction ◽

An Unsupervised Approach to Hindi Word Sense Disambiguation

10.1007/978-81-8489-203-1_32 ◽

2009 ◽

pp. 327-335 ◽

Cited By ~ 11

Author(s):

Neetu Mishra ◽

Shashi Yadav ◽

Tanveer J. Siddiqui

Keyword(s):

Word Sense ◽

Sense Disambiguation ◽

Unsupervised Approach

2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT) ◽

Word sense disambiguation in Bengali: An unsupervised approach

10.1109/icecct.2017.8117901 ◽

2017 ◽

Cited By ~ 2

Author(s):

Alok Ranjan Pal ◽

Diganta Saha

Keyword(s):

Word Sense ◽

Sense Disambiguation ◽

Unsupervised Approach

A Novel Approach to Word Sense Disambiguation Based on Topical and Semantic Association

The Scientific World JOURNAL ◽

10.1155/2013/586327 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Xin Wang ◽

Wanli Zuo ◽

Ying Wang

Keyword(s):

Language Processing ◽

Fundamental Problem ◽

Ambiguous Word ◽

Semantic Features ◽

Word Sense ◽

Semantic Association ◽

Data Set ◽

Novel Approach ◽

Word sense disambiguation (WSD) is a fundamental problem in natural language processing, the objective of which is to identify the most proper sense for an ambiguous word in a given context. Although WSD has been researched over the years, the performance of existing algorithms in terms of accuracy and recall is still unsatisfactory. In this paper, we propose a novel approach to word sense disambiguation based on topical and semantic association. For a given document, supposing that its topic category is accurately discriminated, the correct sense of the ambiguous term is identified through the corresponding topic and semantic contexts. We first extract topic discriminative terms from the document and construct a topical graph based on topic span intervals to implement topic identification. We then exploit syntactic features, topic span features, and semantic features to disambiguate nouns and verbs in the context of the ambiguous word. Finally, we conduct experiments on the standard data set SemCor to evaluate the performance of the proposed method, and the results indicate that our approach achieves relatively better performance than existing approaches.

Generating Sense Inventories for Ambiguous Arabic Words

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/3a/8 ◽

2021 ◽

Author(s):

Marwah Alian ◽

Arafat Awajan

Keyword(s):

Word Sense ◽

Arabic Word ◽

Word Sense Induction ◽

Ambiguous Words ◽

Sense Disambiguation ◽

Sentence Similarity ◽

Unsupervised Approach ◽

Similar Accuracy

The process of selecting the appropriate meaning of an ambiguous word according to its context is known as word sense disambiguation. In this research, we generate a number of Arabic sense inventories based on an unsupervised approach and different pre-trained embeddings, such as Aravec, Fast text, and Arabic-News embeddings. The resulting inventories are evaluated to investigate their efficiency in Arabic word sense disambiguation and sentence similarity. The sense inventories are generated using an unsupervised approach that is based on a graph-based word sense induction algorithm. Results show that the Aravec-Twitter inventory achieves the best accuracy of 0.47 for 50 neighbors and an accuracy close to that of the Fast text inventory for 200 neighbors, while it provides similar accuracy to the Arabic-News inventory for 100 neighbors. The experiment of replacing ambiguous words with their sense vectors is tested for sentence similarity using all sense inventories, and the results show that using the Aravec-Twitter sense inventory provides a better correlation value.

Detection and Analysis of Stress using Machine Learning Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8573.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 335-342

Keyword(s):

Social Media ◽

Social Networking ◽

Sentiment Analysis ◽

Social Networking Sites ◽

Ambiguous Word ◽

Text Messages ◽

Word Sense ◽

Severe Problem ◽

Every year tens of millions of people suffer from depression, and few of them get proper treatment on time. It is therefore crucial to detect human stress and relaxation automatically via social media on a timely basis, and to detect and manage stress before it becomes a severe problem. A huge number of informal messages are posted every day on social networking sites, blogs and discussion forums. This paper describes an approach to detect stress using information from social networking sites such as Twitter. It presents a method to detect expressions of stress and relaxation in a Twitter dataset, i.e. it applies sentiment analysis to find emotions or feelings about daily life. Sentiment analysis is the automatic extraction of sentiment-related information from text. Here the TensiStrength framework for sentiment strength detection on social networking sites is used to extract sentiment strength from informal English text. TensiStrength is a system to detect the strength of stress and relaxation expressed in social media text messages. It uses a lexical approach and a set of rules to detect direct and indirect expressions of stress or relaxation, and it classifies both positive and negative emotions on a strength scale from -5 to +5. Stressed sentences from the conversation are considered and categorised into stress and relaxation. TensiStrength is robust and can be applied to a wide variety of different social web contexts. The effectiveness of TensiStrength depends on the nature of the tweets. Human beings have an inborn capability to differentiate the multiple senses of an ambiguous word in a particular context, but a machine executes only according to its instructions. The major drawback of machine translation is Word Sense Disambiguation (WSD). A single word can have multiple meanings or "senses." In pre-processing, part-of-speech disambiguation is analysed, and the proposed method overcomes the drawback of WSD by using unigrams, bigrams and trigrams to give better results on ambiguous words. Here, SVM with n-grams gives better results: precision is 65% and recall is 67%. The main objective of this technique, however, is to find the explicit and implicit amounts of stress and relaxation expressed in tweets. Keywords: Stress Detection, Data Mining, TensiStrength, word sense disambiguation.
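The abstract mentions unigram, bigram and trigram features without describing how they are built. As a hedged illustration, the sketch below extracts n-gram counts from a whitespace-tokenized message; the tokenization, the function names and the sample text are assumptions, and the SVM classifier and TensiStrength scoring are not shown.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token tuples in order of appearance."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(text, max_n=3):
    """Bag of unigram, bigram and trigram counts for one message."""
    tokens = text.lower().split()
    return Counter(g for n in range(1, max_n + 1) for g in ngrams(tokens, n))

features = ngram_features("So much work today")
```

Such counts could then be turned into a feature vector for a classifier; a bigram like ("much", "work") carries local context that a single ambiguous token does not.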

Knowledge-Based Method for Word Sense Disambiguation by Using Hindi WordNet

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.2596 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3985-3989 ◽

Cited By ~ 1

Author(s):

P. Sharma ◽

N. Joshi

Keyword(s):

Natural Language ◽

Language Processing ◽

Speech Processing ◽

Text Processing ◽

Problem Area ◽

Word Sense ◽

Knowledge Resources ◽

Knowledge Based ◽

The purpose of word sense disambiguation (WSD) is to find, with the help of a computer, the proper meaning of a lexeme in the available context in the problem area, together with the relationship between lexicons. This is done using natural language processing (NLP) techniques, which involve queries from machine translation (MT), NLP-specific documents or output text. MT automatically translates text from one natural language into another. Application areas for WSD include information retrieval (IR), lexicography, MT, text processing, speech processing, etc. In this article, we investigate Hindi WSD using a knowledge-based technique. It involves incorporating word knowledge from external knowledge resources to remove the equivocalness of words. In this experiment, we tried to develop a WSD tool by considering a knowledge-based approach with the Hindi WordNet. The tool uses the knowledge-based Lesk algorithm for Hindi WSD. Our proposed system gives an accuracy of about 71.4%.
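For readers unfamiliar with the Lesk algorithm named above, here is a minimal sketch of its simplified variant: the sense whose dictionary gloss shares the most words with the context is chosen. The glosses, the stopword list and the English example are hypothetical stand-ins; the paper's tool works on Hindi with the Hindi WordNet, and its exact variant of Lesk is not described in the abstract.

```python
def simplified_lesk(context_words, sense_glosses, stopwords=frozenset()):
    """Return (best sense, overlap size) by gloss/context word overlap."""
    context = {w.lower() for w in context_words} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        gloss_words = {w.lower() for w in gloss.split()} - stopwords
        overlap = len(context & gloss_words)
        if overlap > best_overlap:  # ties keep the first sense encountered
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap

# Hypothetical glosses for the English word "bank".
glosses = {
    "financial_institution": "an institution that accepts deposits and lends money",
    "river_side": "sloping land beside a body of water such as a river",
}
stop = frozenset({"a", "an", "the", "in", "of", "and", "that", "as", "he", "such"})

sense, overlap = simplified_lesk("he deposited money in the bank".split(), glosses, stop)
```

Only "money" overlaps here because no stemming is applied ("deposited" does not match "deposits"), which shows why morphological normalization matters for this family of methods.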

Capsule Network Improved Multi-Head Attention for Word Sense Disambiguation

Applied Sciences ◽

10.3390/app11062488 ◽

2021 ◽

Vol 11 (6) ◽

pp. 2488

Author(s):

Jinfeng Cheng ◽

Weiqin Tong ◽

Weian Yan

Keyword(s):

Neural Networks ◽

Language Processing ◽

Semantic Representation ◽

Ambiguous Word ◽

Word Sense ◽

Great Contribution ◽

Specific Context ◽

The Core ◽

Word sense disambiguation (WSD) is one of the core problems in natural language processing (NLP); its goal is to map an ambiguous word to its correct meaning in a specific context. Recent studies have shown a lively interest in incorporating sense definitions (glosses) into neural networks, which has contributed greatly to improving the performance of WSD. However, disambiguating polysemes with rare senses is still hard. In this paper, while taking glosses into consideration, we further improve the performance of the WSD system from the perspective of semantic representation. We encode the context and the sense glosses of the target polyseme independently using encoders with the same structure. To obtain a better representation in each encoder, we leverage the capsule network to capture the different important information contained in multi-head attention. We finally choose the gloss representation closest to the context representation of the target word as its correct sense. We conduct experiments on the English all-words WSD task. Experimental results show that our method achieves good performance and is especially effective at disambiguating words with rare senses.
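The final selection step described above (choosing the gloss representation closest to the context representation) can be sketched as follows. The encoders, the capsule network and the paper's distance measure are not reproduced: the vectors are small hand-written placeholders, and cosine similarity is an assumed choice of closeness.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_gloss(context_vec, gloss_vecs):
    """Return the sense whose gloss vector is most similar to the context vector."""
    return max(gloss_vecs, key=lambda sense: cosine(context_vec, gloss_vecs[sense]))

# Placeholder vectors standing in for encoder outputs (hypothetical values).
context_vec = [0.9, 0.1, 0.2]
gloss_vecs = {
    "sense_a": [1.0, 0.0, 0.1],
    "sense_b": [0.0, 1.0, 0.3],
}
predicted = closest_gloss(context_vec, gloss_vecs)
```

Because the context and the glosses are encoded independently, the gloss vectors could in principle be computed once and reused across contexts.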

GENERATING TRAINING DATA FOR WORD SENSE DISAMBIGUATION IN RUSSIAN

Computational Linguistics and Intellectual Technologies ◽

10.28995/2075-7182-2020-19-119-132 ◽

2020 ◽

Author(s):

A. S. Bolshina ◽

N. V. Loukachevitch

Keyword(s):

Nearest Neighbor ◽

Ambiguous Word ◽

Russian Language ◽

Training Data ◽

Word Sense ◽

Training Samples ◽

Sense Disambiguation ◽

News Corpus ◽

Neighbor Classification

The best approaches in Word Sense Disambiguation (WSD) are supervised and rely on large amounts of hand-labelled data, which are not always available and are costly to create. For the Russian language there is no sense-tagged resource of sufficient size to train supervised word sense disambiguation algorithms. In our work we describe an approach that is used to create an automatically labelled collection based on monosemous relatives (related unambiguous entries). The main contribution of our work is that we extracted monosemous relatives that can be located at relatively long distances from a target ambiguous word and ranked them according to a similarity measure to the target sense. The selected candidates are then used to extract training samples from the news corpus. We evaluated word sense disambiguation models based on nearest neighbor classification over BERT and ELMo embeddings. Our work relies on the Russian wordnet RuWordNet.
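The labelling idea can be illustrated with a small sketch, under clear assumptions: each sense of the ambiguous word has a list of unambiguous related words, and any corpus sentence containing relatives of exactly one sense is taken as a training sample for that sense. The English relatives and sentences below are invented for illustration; the paper works on Russian with RuWordNet, and its ranking of relatives by similarity and its embedding-based classifier are not shown.

```python
def label_by_monosemous_relatives(sentences, relatives_by_sense):
    """Return (sentence, sense) pairs for sentences matching exactly one sense."""
    relative_to_sense = {
        relative: sense
        for sense, relatives in relatives_by_sense.items()
        for relative in relatives
    }
    samples = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        matched = {relative_to_sense[t] for t in tokens if t in relative_to_sense}
        if len(matched) == 1:  # skip sentences with no match or conflicting matches
            samples.append((sentence, matched.pop()))
    return samples

# Hypothetical relatives for two senses of "bank".
relatives = {
    "bank/finance": ["lender", "creditor"],
    "bank/river": ["riverbank", "shoreline"],
}
corpus = [
    "The lender raised its rates",
    "We walked along the shoreline",
    "No relatives appear here",
    "The creditor met us at the riverbank",
]
samples = label_by_monosemous_relatives(corpus, relatives)
```

The last sentence is discarded because it matches relatives of both senses, a simple guard against noisy labels in this toy setting.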