Chinese Word Similarity Computation based on Automatically Acquired Knowledge

Author(s):  
Hao Wu ◽  
Wenpeng Lu ◽  
Yuteng Zhang
Author(s):  
Fulian Yin ◽  
Yanyan Wang ◽  
Jianbo Liu ◽  
Marco Tosato

Abstract: The word similarity task computes the similarity of any pair of words and is a basic technology of natural language processing (NLP). Existing methods are based on word embeddings, which fail to capture polysemy and are strongly influenced by the quality of the corpus. In this paper, we propose a multi-prototype Chinese word representation model (MP-CWR) for word similarity based on a synonym knowledge base, comprising a knowledge representation module and a word similarity module. For the first module, we propose a dual attention mechanism that combines semantic information to jointly learn word knowledge representations. The MP-CWR model uses synonyms as prior knowledge to supplement the relationships between words, which helps address the challenge of semantic expression caused by insufficient data. For the word similarity module, we propose a multi-prototype representation for each word, then calculate and fuse the conceptual similarities of the two words to obtain the final result. Finally, we verify the effectiveness of our model on three public data sets against other baseline models. In addition, the experiments demonstrate the stability and scalability of the MP-CWR model under different corpora.
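
The idea of scoring two words through several prototypes each, then fusing the pairwise scores, can be illustrated with a minimal sketch. This is not the MP-CWR model: the abstract does not specify how prototypes are built or how the conceptual similarities are fused. The function names, the use of cosine similarity, and the `alpha` weighting between the best-matching pair and the average over all pairs are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multi_prototype_similarity(protos_a, protos_b, alpha=0.5):
    """Fuse pairwise prototype similarities into one word-level score.

    protos_a, protos_b: lists of prototype (sense) vectors, one list per word.
    alpha: hypothetical fusion weight; 1.0 keeps only the best-matching
           prototype pair, 0.0 keeps only the average over all pairs.
    """
    sims = [cosine(p, q) for p in protos_a for q in protos_b]
    if not sims:
        raise ValueError("each word needs at least one prototype vector")
    max_sim = max(sims)
    avg_sim = sum(sims) / len(sims)
    return alpha * max_sim + (1.0 - alpha) * avg_sim


# Toy example: a two-sense word compared with a single-sense word.
word_a = [[1.0, 0.0], [0.0, 1.0]]
word_b = [[1.0, 0.0]]
score = multi_prototype_similarity(word_a, word_b, alpha=0.5)
```

With a single prototype per word, the score reduces to plain cosine similarity between two embeddings, which is the setting the abstract describes as unable to capture polysemy.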


Author(s):  
Qinjuan Yang ◽  
Haoran Xie ◽  
Gary Cheng ◽  
Fu Lee Wang ◽  
Yanghui Rao

Abstract: Chinese word embeddings have recently garnered considerable attention. Chinese characters and their sub-character components, which contain rich semantic information, are incorporated to learn Chinese word embeddings. Chinese characters can represent a combination of meaning, structure, and pronunciation. However, existing embedding learning methods focus only on the structure and meaning of Chinese characters. In this study, we aim to develop an embedding learning method that makes full use of the information represented by Chinese characters, including phonology, morphology, and semantics. Specifically, we propose a pronunciation-enhanced Chinese word embedding learning method, in which the pronunciations of context characters and target characters are simultaneously encoded into the embeddings. Evaluations on word similarity, word analogy reasoning, text classification, and sentiment analysis validate the effectiveness of our proposed method.


2013 ◽  
Vol 336-338 ◽  
pp. 2115-2118
Author(s):  
Pei Ying Zhang

Word similarity computation is broadly used in many applications, such as information retrieval, information extraction, text categorization, word sense disambiguation, and example-based machine translation. The main obstacle in word similarity computation lies in developing a computational algorithm capable of generating results close to human perception. This paper proposes an approach to word similarity computation that combines WordNet and HowNet. Experiments on Chinese word pairs show that our method is closest to human similarity judgments when compared with the major state-of-the-art methods.

