SEMANTIC IDENTIFICATION AND VISUALIZATION OF SIGNIFICANT WORDS WITHIN DOCUMENTS - Approach to Visualize Relevant Words within Documents to a Search Query by Word Similarity Computation

2013 ◽  
Vol 336-338 ◽  
pp. 2115-2118
Author(s):  
Pei Ying Zhang

Word similarity computation is widely used in many applications, such as information retrieval, information extraction, text categorization, word sense disambiguation, and example-based machine translation. The main obstacle lies in developing a computational algorithm capable of generating results that agree with human perception. This paper proposes an approach to word similarity computation that combines WordNet and HowNet. Experiments on Chinese word pairs show that our method comes closest to human similarity judgments when compared with the major state-of-the-art methods.
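
As a rough illustration of the kind of combination the abstract describes, the sketch below mixes an NLTK WordNet path similarity with a toy HowNet-style sememe overlap through a weighted average. The sememe dictionary, the Jaccard overlap, and the weight alpha are illustrative assumptions; the paper's actual combination formula is not given in the abstract.

```python
# Minimal sketch: combine a WordNet-based and a HowNet-style similarity
# with a weighted average. Requires nltk and the 'wordnet' corpus
# (nltk.download('wordnet')). The sememe table and alpha are assumptions.
from nltk.corpus import wordnet as wn

# Hypothetical HowNet-style sememe dictionary: word -> set of sememes.
SEMEME_DICT = {
    "apple":  {"fruit", "edible", "plant"},
    "orange": {"fruit", "edible", "plant", "color"},
}

def wordnet_similarity(w1, w2):
    """Maximum path similarity over all synset pairs (0 if none found)."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1)
              for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def hownet_similarity(w1, w2):
    """Jaccard overlap of sememe sets (a stand-in for a HowNet measure)."""
    a, b = SEMEME_DICT.get(w1, set()), SEMEME_DICT.get(w2, set())
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_similarity(w1, w2, alpha=0.5):
    """Weighted combination of the two lexical resources."""
    return alpha * wordnet_similarity(w1, w2) + (1 - alpha) * hownet_similarity(w1, w2)

print(combined_similarity("apple", "orange"))
```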


2009 ◽  
Vol 29 (1) ◽  
pp. 217-220 ◽  
Author(s):  
Li LIN ◽  
Fang XUE ◽  
Zhong-sheng REN

2019 ◽  
Vol 5 (2) ◽  
pp. 164
Author(s):  
Herry Sujaini

We describe the results of a study to determine the best features for the EWSB (Extended Word Similarity Based) algorithm. EWSB is a word clustering algorithm that can be applied to any language with a common feature set. We propose four alternative features for word similarity computation and experiment on the Indonesian language to determine the best feature format for that language. We found that the best feature for EWSB on Indonesian is the t w w' format (3-gram) with zero word relations. Moreover, 3-grams outperform 4-grams for all proposed features: the average recall of 3-grams is 83.50%, while that of 4-grams is 57.25%.
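
The abstract does not spell out the EWSB scoring itself, but the t w w' (3-gram) feature format can be illustrated as below: each target word is represented by the counts of its (left word, right word) contexts, and two words are compared by cosine similarity over those counts. The toy corpus and the cosine measure are assumptions for illustration only, not the paper's procedure.

```python
# Sketch of the 3-gram "t w w'" context feature idea: for each target word,
# collect (left word, right word) contexts from a corpus and compare words
# by cosine similarity over the context counts.
from collections import Counter
from math import sqrt

def context_vectors(tokens):
    """Map each word to a Counter of its (left, right) 3-gram contexts."""
    vectors = {}
    for i in range(1, len(tokens) - 1):
        left, word, right = tokens[i - 1], tokens[i], tokens[i + 1]
        vectors.setdefault(word, Counter())[(left, right)] += 1
    return vectors

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0

# Toy Indonesian corpus (illustrative only).
corpus = "saya makan nasi goreng dan dia makan mie goreng setiap hari".split()
vecs = context_vectors(corpus)
print(cosine(vecs.get("nasi", Counter()), vecs.get("mie", Counter())))
```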


2012 ◽  
Vol 2012 ◽  
pp. 1-11 ◽  
Author(s):  
Peng Jin ◽  
John Carroll ◽  
Yunfang Wu ◽  
Diana McCarthy

Distributional similarity has attracted considerable attention in the field of natural language processing as an automatic means of countering the ubiquitous problem of sparse data. As a logographic language, Chinese words consist of characters, and each character is composed of one or more radicals. The meanings of characters are usually highly related to the words that contain them. Likewise, radicals often make a predictable contribution to the meaning of a character: characters that share the same components tend to have similar or related meanings. In this paper, we exploit these properties of the Chinese language to improve Chinese word similarity computation. Given a content word, we first extract similar words from a large corpus and rank them by a similarity score. This rank is then adjusted according to the characters and components shared between the similar word and the target word. Experiments on two gold standard datasets show that the adjusted rank is superior and closer to human judgments than the original rank. In addition to quantitative evaluation, we examine the reasons behind errors, drawing on linguistic phenomena for our explanations.
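
A minimal sketch of this re-ranking idea, under the assumption that the adjustment is a simple additive bonus for shared characters and shared radicals; the paper's actual adjustment formula is not specified in the abstract, and the radical table and bonus weights below are hypothetical.

```python
# Sketch: re-rank distributionally similar candidates by boosting those that
# share characters or radicals with the target word. Weights and the radical
# table are illustrative assumptions.

# Hypothetical character -> radical table (normally taken from a dictionary).
RADICALS = {"湖": "氵", "河": "氵", "江": "氵", "树": "木", "林": "木"}

def shared_chars(w1, w2):
    """Number of characters the two words have in common."""
    return len(set(w1) & set(w2))

def shared_radicals(w1, w2):
    """Number of radicals shared by the characters of the two words."""
    r1 = {RADICALS.get(c) for c in w1} - {None}
    r2 = {RADICALS.get(c) for c in w2} - {None}
    return len(r1 & r2)

def rerank(target, candidates, char_bonus=0.2, radical_bonus=0.1):
    """candidates: list of (word, distributional_score) pairs."""
    adjusted = [
        (w, score
            + char_bonus * shared_chars(target, w)
            + radical_bonus * shared_radicals(target, w))
        for w, score in candidates
    ]
    return sorted(adjusted, key=lambda x: x[1], reverse=True)

# Toy example: candidates for the target word "湖泊" (lake); the water-radical
# words 河流 and 江河 move above 树林 after the adjustment.
print(rerank("湖泊", [("树林", 0.55), ("河流", 0.50), ("江河", 0.48)]))
```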

