Overview of the NLPCC-ICCPOL 2016 Shared Task: Chinese Word Similarity Measurement

Author(s): Yunfang Wu, Wei Li

Author(s): Zi Lin, Yang Liu

Previously, researchers have paid little attention to creating unambiguous morpheme embeddings independent of the corpus, yet such information plays an important role in expressing the exact meanings of words in parataxis languages like Chinese. In this paper, after constructing a Chinese lexical and semantic ontology based on word-formation, we propose a novel approach to implanting structured rational knowledge into distributed representations at the morpheme level, naturally avoiding heavy disambiguation in the corpus. We design a template that creates instances as pseudo-sentences solely from the morpheme knowledge built into the lexicon. To exploit hierarchical information and tackle data sparseness, an instance proliferation technique based on similarity is applied to expand the collection of pseudo-sentences. The distributed representations of morphemes are then trained on these pseudo-sentences using word2vec. For evaluation, we validate the paradigmatic and syntagmatic relations of the morpheme embeddings and apply them to word similarity measurement, achieving significant improvements over classical models of more than 5 Spearman points, or 8 percentage points, which shows very promising prospects for adopting this new source of knowledge.
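A minimal sketch of the word-similarity side of this idea (names and vector values are hypothetical; the paper itself trains morpheme embeddings with word2vec on generated pseudo-sentences): compose each word's vector from its morpheme embeddings, then compare words with cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def word_vector(word, morpheme_vecs):
    """Compose a word vector by averaging its morpheme embeddings."""
    vecs = [morpheme_vecs[m] for m in word if m in morpheme_vecs]
    dim = len(next(iter(morpheme_vecs.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Toy morpheme embeddings (illustrative values, not trained ones).
morphemes = {
    "电": [0.90, 0.10, 0.00],
    "脑": [0.80, 0.20, 0.10],
    "计": [0.70, 0.30, 0.20],
    "算": [0.75, 0.25, 0.15],
    "机": [0.85, 0.10, 0.05],
}

# "电脑" and "计算机" both mean "computer" and share no characters,
# yet their morpheme-composed vectors end up close.
sim = cosine(word_vector("电脑", morphemes), word_vector("计算机", morphemes))
```

In practice, the ranked similarity scores produced this way would be compared against human ratings with Spearman correlation, which is the metric the abstract reports.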


Author(s): Fulian Yin, Yanyan Wang, Jianbo Liu, Marco Tosato

Abstract: The word similarity task calculates the similarity of any pair of words and is a basic technology of natural language processing (NLP). Existing methods are based on word embeddings, which fail to capture polysemy and are heavily influenced by corpus quality. In this paper, we propose a multi-prototype Chinese word representation model (MP-CWR) for word similarity based on a synonym knowledge base, comprising a knowledge representation module and a word similarity module. For the first module, we propose a dual-attention mechanism that combines semantic information for jointly learning word knowledge representations. The MP-CWR model uses synonyms as prior knowledge to supplement the relationships between words, which helps address the challenge of semantic expression under insufficient data. In the word similarity module, we propose a multi-prototype representation for each word, then calculate and fuse the conceptual similarities of the two words to obtain the final result. Finally, we verify the effectiveness of our model against baseline models on three public data sets. The experiments also demonstrate the stability and scalability of MP-CWR under different corpora.
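The multi-prototype step can be sketched as follows (a hypothetical simplification: MP-CWR learns its representations and fusion, whereas here each sense is a toy vector and the pairwise sense similarities are fused with a plain `max`):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def multi_prototype_similarity(protos_a, protos_b, fuse=max):
    """Score every prototype (sense) pair, then fuse into one number."""
    return fuse(cosine(a, b) for a in protos_a for b in protos_b)

# Toy sense prototypes: "苹果" (apple) has a fruit sense and a company
# sense; "香蕉" (banana) has only a fruit sense. Values are illustrative.
apple = [[0.90, 0.10], [0.10, 0.90]]
banana = [[0.85, 0.20]]

sim = multi_prototype_similarity(apple, banana)
```

Keeping one vector per sense lets the fruit sense of "苹果" match "香蕉" closely even though the company sense does not, which is exactly the polysemy problem a single-vector model cannot express.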


2000, Vol. 36 (5), pp. 717-736
Author(s): El-Sayed Atlam, Masao Fuketa, Kazuhiro Morita, Jun-ichi Aoe

Author(s): Qinjuan Yang, Haoran Xie, Gary Cheng, Fu Lee Wang, Yanghui Rao

Abstract: Chinese word embeddings have recently garnered considerable attention. Chinese characters and their sub-character components, which carry rich semantic information, are incorporated to learn Chinese word embeddings. A Chinese character represents a combination of meaning, structure, and pronunciation, yet existing embedding learning methods focus only on structure and meaning. In this study, we aim to develop an embedding learning method that makes full use of the information a Chinese character represents, including phonology, morphology, and semantics. Specifically, we propose a pronunciation-enhanced Chinese word embedding learning method in which the pronunciations of both context characters and target characters are encoded into the embeddings. Evaluations on word similarity, word analogy reasoning, text classification, and sentiment analysis validate the effectiveness of the proposed method.
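As a rough, hypothetical illustration of the pronunciation-enhancement idea (the paper encodes pronunciations inside the training objective; here we simply concatenate a character's semantic vector with a pinyin-plus-tone vector, with all values invented for the example):

```python
# Toy lookup tables; real systems would learn these jointly during training.
char_vecs = {"妈": [0.5, 0.2], "马": [0.4, 0.3]}
pinyin_vecs = {"ma1": [1.0, 0.0], "ma3": [0.0, 1.0]}
char_to_pinyin = {"妈": "ma1", "马": "ma3"}

def enhanced_vector(ch):
    """Concatenate the semantic and pronunciation parts of a character."""
    return char_vecs[ch] + pinyin_vecs[char_to_pinyin[ch]]

# "妈" (mā) and "马" (mǎ) share the syllable but differ in tone, so the
# pronunciation half of the vector keeps them distinguishable.
ma1 = enhanced_vector("妈")
ma3 = enhanced_vector("马")
```

The tone distinction carried in the pronunciation half is exactly the kind of phonological signal the abstract argues structure-and-meaning-only methods discard.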

