Research on Chinese Word Segmentation Based on Conditional Random Fields

Author(s):  
Chao Fan ◽  
Yu Li
2014 ◽  
Vol 556-562 ◽  
pp. 4376-4379
Author(s):  
Kun Zhi Gui ◽  
Yong Ren ◽  
Zhao Meng Peng

Chinese word segmentation is a fundamental problem in natural language processing. A conditional random field (CRF) is an undirected graphical model that can combine a wide variety of features and thus make full use of the information in the text. This article therefore adopts a CRF-based approach to Chinese word segmentation. It first gives the definition of the CRF model, together with its parameter learning methods and inference algorithms. It then introduces the word tagging scheme that is widely used in Chinese word segmentation. The experiments use the Bakeoff 2005 corpora and give strong results on both the MSRA and PKU corpora: the F-measures are 0.964 and 0.943, and the R_OOV (out-of-vocabulary recall) values are 0.705 and 0.765, respectively.
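To make the tagging formulation and the reported metric concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which tag set the paper uses, so the four-tag B/M/E/S scheme (begin, middle, end, single-character word) is an assumption here. The function names and the example sentence are hypothetical and are not taken from the Bakeoff corpora. The F-measure is computed in the usual way, as the harmonic mean of word-level precision and recall.

```python
from typing import List, Set, Tuple


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one tag per character (assumed B/M/E/S scheme)."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character, any interior characters, last character
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the segmentation from a character sequence and its tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Represent each word by its (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f_measure(gold: List[str], predicted: List[str]) -> float:
    """Harmonic mean of word-level precision and recall."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence, for illustration only.
gold = ["我", "喜欢", "北京大学"]
predicted = ["我", "喜欢", "北京", "大学"]
tags = words_to_tags(gold)
restored = tags_to_words("".join(gold), tags)
score = f_measure(gold, predicted)
```

In this formulation a CRF assigns one tag to each character, and the segmentation is then read off the tag sequence as in `tags_to_words`. Scoring compares character-offset spans rather than word strings, so a word is counted as correct only when both of its boundaries match the gold standard.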

