DISTRIBUTIONAL ANALYSIS OF RELATED SYNSETS IN WordNet FOR A WORD SENSE DISAMBIGUATION TASK

2005 ◽  
Vol 14 (06) ◽  
pp. 919-934 ◽  
Author(s):  
KOSTAS FRAGOS ◽  
YANIS MAISTROS

This work presents a new method for unsupervised word sense disambiguation using WordNet semantic relations. In this method we expand the context of the word being disambiguated with related synsets drawn from the available WordNet relations and study, within this expanded set, the distribution of the related synsets that correspond to each sense of the target word. A single-sample Pearson chi-square goodness-of-fit hypothesis test is used to determine whether the composite null hypothesis of normality is a reasonable assumption for the set of related synsets corresponding to a sense. The p-value calculated from this test is the criterion for deciding the correct sense: the target word is assigned the sense whose related synsets are distributed most "abnormally" relative to the sets of the other senses. Our algorithm is evaluated on English lexical sample data from the Senseval-2 word sense disambiguation competition. Three WordNet relations, antonymy, hyponymy and hypernymy, yield a distributional set of related synsets for the context that proved to be quite a good word sense discriminator, achieving results comparable with those of the best-performing system among the competing participants.
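
A minimal sketch of the statistical idea in Python follows, assuming NLTK's WordNet corpus and SciPy. The per-synset count statistic, the binning, and the fitted normal baseline are illustrative assumptions, not the authors' exact procedure; what the sketch shows is the selection rule itself: each sense is scored by the chi-square goodness-of-fit p-value of its related synsets' context-occurrence counts, and the sense with the most "abnormal" distribution (smallest p-value) wins.

# Sketch: chi-square goodness-of-fit sense scoring over WordNet relations.
from nltk.corpus import wordnet as wn
import numpy as np
from scipy import stats

def related_synsets(sense):
    # Related synsets via the three relations named in the abstract:
    # hyponymy, hypernymy, and antonymy.
    related = list(sense.hyponyms()) + list(sense.hypernyms())
    for lemma in sense.lemmas():
        related += [ant.synset() for ant in lemma.antonyms()]
    return related

def sense_p_value(sense, context_tokens):
    # One observation per related synset: how many of its lemma names
    # occur in the (expanded) context window. This count statistic is
    # an assumption made for the sketch.
    counts = []
    for syn in related_synsets(sense):
        names = {n.lower().replace('_', ' ') for n in syn.lemma_names()}
        counts.append(sum(tok in names for tok in context_tokens))
    if len(counts) < 5:
        return 1.0  # too little evidence; treat as "normal"
    counts = np.array(counts, dtype=float)
    # Bin the counts and compare with frequencies expected under a
    # normal fitted to the data (Pearson chi-square goodness of fit).
    observed, edges = np.histogram(counts, bins=5)
    mu, sigma = counts.mean(), counts.std() or 1.0
    cdf = stats.norm.cdf(edges, mu, sigma)
    expected = len(counts) * np.diff(cdf)
    expected = np.clip(expected, 1e-6, None)
    expected *= observed.sum() / expected.sum()  # match totals
    _, p = stats.chisquare(observed, expected)
    return p

def disambiguate(word, context_tokens):
    # Smallest p-value = most "abnormal" distribution = chosen sense.
    return min(wn.synsets(word), key=lambda s: sense_p_value(s, context_tokens))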

2013 ◽  
Vol 411-414 ◽  
pp. 287-290
Author(s):  
Nantapong Keandoungchun ◽  
Nithinant Thammakoranonta

This paper proposes a novel approach to word sense disambiguation (WSD) in English-to-Thai translation. The approach generates a knowledge base that stores local-context information and then applies this information to estimate the probabilities of the possible meanings of a target word. The meaning with the maximum probability is taken as the Thai translation of the English target word. The approach was evaluated by measuring the accuracy of target-word translation in each paper and comparing it against Google Translate. The experimental results indicate that the proposed approach is more accurate than Google Translate, with a paired t-test statistic of 6.628 (sig. = 0.00 < 0.05).
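
A minimal sketch of the selection step follows, under the assumption that the knowledge base stores local-context counts per candidate meaning and that the most probable meaning is chosen in a naïve-Bayes fashion with add-one smoothing; the class structure and the smoothing are illustrative, not the paper's exact design.

# Sketch: pick the Thai meaning with maximum probability given local context.
from collections import defaultdict
import math

class ContextKB:
    def __init__(self):
        # counts[meaning][context_word] plus a frequency per candidate meaning
        self.counts = defaultdict(lambda: defaultdict(int))
        self.meaning_freq = defaultdict(int)

    def observe(self, meaning, context_words):
        # Build the knowledge base from disambiguated training contexts.
        self.meaning_freq[meaning] += 1
        for w in context_words:
            self.counts[meaning][w] += 1

    def best_meaning(self, context_words):
        # Maximum-probability meaning given the local context.
        def log_prob(meaning):
            total = sum(self.counts[meaning].values())
            vocab = len(self.counts[meaning]) + 1
            score = math.log(self.meaning_freq[meaning])
            for w in context_words:
                score += math.log((self.counts[meaning][w] + 1) / (total + vocab))
            return score
        return max(self.meaning_freq, key=log_prob)

# Hypothetical usage: Thai meanings of the ambiguous English word "bank".
kb = ContextKB()
kb.observe("ธนาคาร", ["money", "deposit", "loan"])  # financial institution
kb.observe("ตลิ่ง", ["river", "water", "shore"])     # river bank
print(kb.best_meaning(["loan", "money"]))            # -> "ธนาคาร"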


2013 ◽  
Vol 22 (02) ◽  
pp. 1350003 ◽  
Author(s):  
KOSTAS FRAGOS

In this work, we propose a new measure of semantic relatedness between concepts, applied to word sense disambiguation. Using the overlaps between WordNet definitions of concepts (glosses) and a goodness-of-fit statistical test, we establish a formal mechanism for quantifying and estimating the semantic relatedness between concepts. More concretely, we model WordNet gloss overlaps by making a theoretical assumption about their distribution and then quantify the discrepancy between the theoretical and the actual distribution. This discrepancy is used to measure the relatedness between the input concepts. The experimental results showed very good performance on Senseval-2 lexical sample data for word sense disambiguation.
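
The following rough sketch illustrates the mechanism with NLTK's WordNet and SciPy. The Poisson baseline is a stand-in for the paper's theoretical distribution (which the abstract does not name), and expanding each concept with its direct hypernyms/hyponyms is likewise an assumption; the point is the shape of the computation: collect gloss-overlap sizes, then use the goodness-of-fit discrepancy as the relatedness score.

# Sketch: gloss-overlap relatedness as a goodness-of-fit discrepancy.
from nltk.corpus import wordnet as wn
import numpy as np
from scipy import stats

def gloss_tokens(synset):
    return set(synset.definition().lower().split())

def overlap_counts(s1, s2):
    # Overlap sizes between the glosses of the two concepts and the
    # glosses of their directly related synsets (an assumed expansion).
    g1 = [s1] + s1.hypernyms() + s1.hyponyms()
    g2 = [s2] + s2.hypernyms() + s2.hyponyms()
    return [len(gloss_tokens(a) & gloss_tokens(b)) for a in g1 for b in g2]

def relatedness(s1, s2):
    counts = np.array(overlap_counts(s1, s2))
    if counts.sum() == 0:
        return 0.0
    # Discrepancy between the observed overlap distribution and a
    # Poisson baseline fitted to the mean: the larger the chi-square
    # statistic, the more "surprising" (related) the overlap pattern.
    values = np.arange(counts.max() + 1)
    observed = np.array([(counts == v).sum() for v in values])
    expected = len(counts) * stats.poisson.pmf(values, counts.mean())
    expected = np.clip(expected, 1e-6, None)
    expected *= observed.sum() / expected.sum()  # match totals
    chi2, _ = stats.chisquare(observed, expected)
    return chi2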


2021 ◽  
Vol 11 (6) ◽  
pp. 2567
Author(s):  
Mohammed El-Razzaz ◽  
Mohamed Waleed Fakhr ◽  
Fahima A. Maghraby

Word Sense Disambiguation (WSD) aims to predict the correct sense of a word given its context. This problem is of extreme importance in Arabic, as written words can be highly ambiguous: 43% of diacritized words have multiple interpretations, and the percentage rises to 72% for non-diacritized words. Nevertheless, most Arabic written text does not have diacritical marks. Gloss-based WSD methods measure the semantic similarity or the overlap between the context of a target word that needs to be disambiguated and the dictionary definition of that word (the gloss of the word). Arabic gloss WSD suffers from a lack of context-gloss datasets. In this paper, we present an Arabic gloss-based WSD technique. We utilize Bidirectional Encoder Representations from Transformers (BERT) to build two models that can efficiently perform Arabic WSD. These models can be trained with only a few training samples, since they utilize BERT models that were pretrained on a large Arabic corpus. Our experimental results show that our models outperform two of the most recent gloss-based WSD systems when tested against the same test data used to evaluate our model. Additionally, our model achieves an F1-score of 89%, compared to the best reported F1-score of 85% for knowledge-based Arabic WSD. Another contribution of this paper is the introduction of a context-gloss benchmark that may help to overcome the lack of a standardized benchmark for Arabic gloss-based WSD.
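
A minimal sketch of gloss-based WSD as context-gloss pair scoring with Hugging Face Transformers follows. The AraBERT checkpoint name and the classification head are assumptions, not the authors' released model, and the head shown here is untrained: in practice the pair classifier would first be fine-tuned on labeled context-gloss pairs, so this only shows the wiring.

# Sketch: score each sense by pairing the context with the sense's gloss.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "aubmindlab/bert-base-arabertv02"  # assumed Arabic BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

def score_sense(context: str, gloss: str) -> float:
    # Probability that `gloss` is the correct sense for `context`.
    # BERT encodes the two texts as a sentence pair (context [SEP] gloss).
    inputs = tokenizer(context, gloss, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def disambiguate(context: str, glosses: list[str]) -> str:
    # The sense whose gloss pairs best with the context wins.
    return max(glosses, key=lambda g: score_sense(context, g))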


Author(s):  
Oleg Kalinin

The article dwells on a modern cognitive and discourse study of metaphors. Drawing on the analysis and synthesis of foreign and domestic research, the author examines its classification from the ontological, axiological and epistemological points of view. The ontological level breaks down into two basic approaches, namely the metaphorical nature of discourse and the discursive nature of metaphors. The former analyses metaphors to fathom characteristics of discourse, while the latter provides for the study of metaphorical features in the context of discursive communication. The axiological aspect covers critical and descriptive studies, and the epistemological angle comprises quantitative and qualitative methods in metaphor studies. Other issues covered in the paper include a thorough review of methods for the identification of metaphors, including computer-assisted solutions (Word Sense Disambiguation, Categorisation, Metaphor Clusters) and numerical analysis of the metaphorical nature of discourse: descriptor analysis, the metaphor power index, cluster analysis, and complex metaphor power analysis. On the one hand, the conceptualization of research papers boils down to the major features of the discursive approach to metaphors; on the other, multiple studies of metaphors in the context of discourse pave the way for a discursive trend in cognitive metaphorology.


Telugu (తెలుగు) is one of the Dravidian languages, which are morphologically rich. Like other languages, it contains ambiguous words that have different meanings in different contexts. Such words are referred to as polysemous words, i.e. words having multiple senses. A knowledge-based approach is proposed for disambiguating Telugu polysemous words using the computational linguistics tool IndoWordNet. The task of WSD (word sense disambiguation) requires finding the similarity between the target word and a nearby word. In this approach, the similarity is calculated either by counting the common words (intersection) between the glosses (definitions) of the target and nearby words, or by checking whether the nearby word's sense occurs in the hierarchy (hypernyms/hyponyms) of the target word's senses. These parameters are then extended by computing the intersection not only over the glosses but also over the related words. Additionally, a third parameter, 'distance', measures the distance between the target and nearby words. The proposed method thus uses several parameters for calculating similarity: it scores the senses based on the overall effect of the parameters, i.e. intersection, hierarchy and distance, and then chooses the sense with the best score. The correct meaning of a Telugu polysemous word can be identified with this technique.
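
The three-parameter scoring can be sketched as follows. NLTK's English WordNet stands in for IndoWordNet (whose Telugu synsets would be used in practice), and the hierarchy bonus, the distance discount, and the combination of the scores are illustrative assumptions.

# Sketch: score senses by gloss intersection, hierarchy, and distance.
from nltk.corpus import wordnet as wn

def gloss_words(synset):
    # Gloss tokens extended with the related words (lemma names).
    words = set(synset.definition().lower().split())
    words.update(n.lower() for n in synset.lemma_names())
    return words

def score_sense(sense, nearby_word, distance):
    score = 0.0
    for ns in wn.synsets(nearby_word):
        # Parameter 1: gloss (plus related-word) intersection.
        score += len(gloss_words(sense) & gloss_words(ns))
        # Parameter 2: does the nearby sense occur in the target sense's
        # hierarchy? (Direct hypernyms/hyponyms here; the full closure
        # could be used instead.)
        if ns in sense.hypernyms() or ns in sense.hyponyms():
            score += 2.0
    # Parameter 3: nearer context words count for more.
    return score / (1 + distance)

def disambiguate(target, context_words):
    # context_words: list of (word, distance-from-target) pairs;
    # the sense with the best combined score is chosen.
    def total(sense):
        return sum(score_sense(sense, w, d) for w, d in context_words)
    return max(wn.synsets(target), key=total)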


Author(s):  
Sebastian Weigelt

Systems such as Alexa, Cortana, and Siri appear rather smart. However, they only react to predefined wordings and do not actually grasp the user's intent. To overcome this limitation, a system must understand the topics the user is talking about. Therefore, we apply unsupervised multi-topic labeling to spoken utterances. Although topic labeling is a well-studied task on textual documents, its potential for spoken input is almost unexplored. Our approach for topic labeling is tailored to spoken utterances; it copes with short and ungrammatical input. The approach is two-tiered. First, we disambiguate word senses. We utilize Wikipedia as a pre-labeled corpus to train a naïve Bayes classifier. Second, we build topic graphs based on DBpedia relations. We use two strategies to determine central terms in the graphs, i.e. the shared topics. One focuses on the dominant senses in the utterance and the other covers as many distinct senses as possible. Our approach creates multiple distinct topics per utterance and ranks the results. The evaluation shows that the approach is feasible; the word sense disambiguation achieves a recall of 0.799. Concerning topic labeling, in a user study subjects assessed that in 90.9% of the cases at least one proposed topic label among the first four is a good fit. With regard to precision, the subjects judged that 77.2% of the top-ranked labels are a good fit or good but somewhat too broad (Fleiss' kappa κ = 0.27). We illustrate areas of application of topic labeling in the field of programming in spoken language. With topic labeling applied to the spoken input, as well as ontologies that model the situational context, we are able to select the most appropriate ontologies with an F1-score of 0.907.
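
The second tier can be sketched with networkx as follows. The dbpedia_neighbors lookup is a stand-in for real DBpedia relation queries, and degree centrality is an assumed stand-in for the paper's two centrality strategies; the sketch shows how central terms shared by several disambiguated senses surface as topic labels.

# Sketch: topic graph over disambiguated senses, ranked by centrality.
import networkx as nx

def build_topic_graph(senses, dbpedia_neighbors):
    # senses: DBpedia resource names for the disambiguated words.
    # dbpedia_neighbors: dict mapping a resource to related resources.
    g = nx.Graph()
    for s in senses:
        g.add_node(s, seed=True)  # mark utterance senses as seeds
        for neighbor in dbpedia_neighbors.get(s, []):
            g.add_edge(s, neighbor)
    return g

def top_topic_labels(graph, k=4):
    # Central non-seed terms shared by many senses make good topic labels.
    centrality = nx.degree_centrality(graph)
    non_seeds = [n for n in graph if not graph.nodes[n].get("seed")]
    return sorted(non_seeds, key=centrality.get, reverse=True)[:k]

# Hypothetical usage for "play some jazz music in the kitchen":
neighbors = {
    "Jazz": ["Music", "Music_genre"],
    "Music": ["Art", "Entertainment"],
    "Kitchen": ["Room", "Home"],
}
g = build_topic_graph(["Jazz", "Music", "Kitchen"], neighbors)
print(top_topic_labels(g))  # e.g. ['Music_genre', 'Art', 'Entertainment', 'Room']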

