Scaling Conditional Random Field with Application to Chinese Word Segmentation

Third International Conference on Natural Computation (ICNC 2007) ◽

10.1109/icnc.2007.648 ◽

2007 ◽

Author(s):

Hai Zhao ◽

Chunyu Kit

Keyword(s):

Random Field ◽

Conditional Random Field ◽

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation


Out-domain Chinese new word detection with statistics-based character embedding

Natural Language Engineering ◽

10.1017/s1351324918000463 ◽

2019 ◽

Vol 25 (2) ◽

pp. 239-255

Author(s):

Yuzhi Liang ◽

Min Yang ◽

Jia Zhu ◽

S. M. Yiu

Keyword(s):

Short Term Memory ◽

Conditional Random Field ◽

Word Segmentation ◽

Training Data ◽

Chinese Word ◽

Chinese Word Segmentation ◽

High Quality ◽

Pos Tagging ◽

Part Of Speech ◽

Abstract:

Unlike English and other Western languages, many Asian languages such as Chinese and Japanese do not delimit words with spaces. Word segmentation and new word detection are therefore key steps in processing these languages. Chinese word segmentation can be treated as a part-of-speech (POS) tagging problem: a corpus is segmented by assigning each character a label that indicates its position in a word (e.g., “B” for the beginning of a word and “E” for the end). Chinese word segmentation seems to be well studied. Machine learning models such as conditional random fields (CRF) and bi-directional long short-term memory (LSTM) networks have shown outstanding performance on this task. However, segmentation accuracy drops significantly when the same approaches are applied to out-domain cases, in which high-quality in-domain training data are not available. One example of an out-domain application is new word detection in Chinese microblogs, for which high-quality corpora are scarce. In this paper, we focus on out-domain Chinese new word detection. We first design a new method, Edge Likelihood (EL), for Chinese word boundary detection. We then propose a domain-independent Chinese new word detector (DICND), in which each Chinese character is represented as a low-dimensional vector whose values are segmentation-related features of the character.
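The character-labeling view of segmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names only the “B” and “E” labels, so the four-tag B/M/E/S scheme used here (with “M” for a word-internal character and “S” for a single-character word) is an assumption, and the function names `words_to_tags` and `tags_to_words` are hypothetical.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one position tag per character.

    Assumed tag set: B = word beginning, M = word middle,
    E = word end, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))
            tags.append("E")
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the word sequence from characters and their tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


segmented = ["我", "喜欢", "自然语言"]
tags = words_to_tags(segmented)
print(tags)  # ['S', 'B', 'E', 'B', 'M', 'M', 'E']
print(tags_to_words("".join(segmented), tags))  # ['我', '喜欢', '自然语言']
```

A tagger such as a CRF or a bi-directional LSTM would predict the tag sequence from the raw characters; decoding the predicted tags back into words then works as in `tags_to_words`.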


Gated Recursive Neural Network for Chinese Word Segmentation

10.3115/v1/p15-1168 ◽

2015 ◽

Author(s):

Xinchi Chen ◽

Xipeng Qiu ◽

Chenxi Zhu ◽

Xuanjing Huang

Keyword(s):

Neural Network ◽

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation ◽

Recursive Neural Network


A mixed approach for Chinese word segmentation

10.3115/v1/w14-6829 ◽

2014 ◽

Author(s):

Zhen Wang

Keyword(s):

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation


Revised DBLC Model for Chinese Word Segmentation

Proceedings of the 2019 International Conference on Artificial Intelligence and Computer Science - AICS 2019 ◽

10.1145/3349341.3349402 ◽

2019 ◽

Author(s):

Ziyu Liu ◽

Hehe Yang

Keyword(s):

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation


Corpus Annotation System Based on HanLP Chinese Word Segmentation

The 2nd International Conference on Computing and Data Science ◽

10.1145/3448734.3450845 ◽

2021 ◽

Author(s):

Xuanjun Liu ◽

Zheyu Zhu ◽

Tengyan Fu ◽

Jiaxuan Chen ◽

Ying Jiang

Keyword(s):

Word Segmentation ◽

Chinese Word ◽

Corpus Annotation ◽

Chinese Word Segmentation ◽

Annotation System


Combining Multi-knowledge for Chinese Word Segmentation Disambiguation

Sixth International Conference on Intelligent Systems Design and Applications ◽

10.1109/isda.2006.124 ◽

2006 ◽

Author(s):

Ying Qin ◽

Suxiang Zhang ◽

Xiaojie Wang

Keyword(s):

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation


Improved source-channel models for Chinese word segmentation

10.3115/1075096.1075131 ◽

2003 ◽

Author(s):

Jianfeng Gao ◽

Mu Li ◽

Chang-Ning Huang

Keyword(s):

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation


Attention Is All You Need for Chinese Word Segmentation

10.18653/v1/2020.emnlp-main.317 ◽

2020 ◽

Author(s):

Sufeng Duan ◽

Hai Zhao

Keyword(s):

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation


RethinkCWS: Is Chinese Word Segmentation a Solved Task?

10.18653/v1/2020.emnlp-main.457 ◽

2020 ◽

Author(s):

Jinlan Fu ◽

Pengfei Liu ◽

Qi Zhang ◽

Xuanjing Huang

Keyword(s):

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation


Synthetic Word Parsing Improves Chinese Word Segmentation

10.3115/v1/p15-2043 ◽

2015 ◽

Author(s):

Fei Cheng ◽

Kevin Duh ◽

Yuji Matsumoto

Keyword(s):

Word Segmentation ◽

Chinese Word ◽

Chinese Word Segmentation
