Enhancing LSTM-based Word Segmentation Using Unlabeled Data

Author(s): Bo Zheng, Wanxiang Che, Jiang Guo, Ting Liu

Author(s): Xiaobin Wang, Deng Cai, Linlin Li, Guangwei Xu, Hai Zhao, ...

To further improve Chinese word segmentation by exploiting unlabeled data, this work makes the first attempt to add unsupervised segmentation information to a neural supervised segmenter. We survey several effective strategies for leveraging unsupervised information derived from abundant unlabeled data, including extending the character embedding, augmenting the word score, and applying multi-task learning. Experiments on standard data sets show that these strategies improve the recall of out-of-vocabulary words and thus boost segmentation accuracy. Moreover, the model enhanced by the proposed methods outperforms state-of-the-art models in the closed test and shows a promising improvement trend under all three strategies when a large unlabeled data set is used. Our thorough empirical study finally verifies that the proposed approach makes more effective use of abundant, freely available unlabeled data than the widely used pre-training approach.
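
One of the strategies named above is extending the character embedding with unsupervised segmentation information. The sketch below illustrates that idea under assumptions the abstract does not state: the unsupervised measure is accessor variety (AV), in a simplified variant that collapses all sentence boundaries into one neighbor symbol; each character receives the log-scaled AV of the 2- to 4-grams starting at it; and `embed` is a hypothetical lookup returning a list of floats. The helper names (`accessor_variety`, `av_features`, `extend_embeddings`) are illustrative only and are not taken from the paper.

```python
import math
from collections import defaultdict

BOUNDARY = "<s>"  # hypothetical symbol standing in for any sentence start or end


def accessor_variety(corpus, max_n=4):
    """Simplified accessor variety: AV(g) = min(#distinct left neighbors, #distinct right neighbors).

    Assumption: all sentence boundaries collapse into the single BOUNDARY symbol.
    """
    left = defaultdict(set)
    right = defaultdict(set)
    for sent in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                gram = sent[i:i + n]
                left[gram].add(sent[i - 1] if i > 0 else BOUNDARY)
                right[gram].add(sent[i + n] if i + n < len(sent) else BOUNDARY)
    return {g: min(len(left[g]), len(right[g])) for g in left}


def av_features(sentence, av, max_n=4):
    """For each character, log2(1 + AV) of the n-grams (n = 2..max_n) that start there.

    N-grams that run past the sentence end or never occur in the corpus get 0.
    """
    feats = []
    for i in range(len(sentence)):
        row = []
        for n in range(2, max_n + 1):
            gram = sentence[i:i + n]
            value = av.get(gram, 0) if len(gram) == n else 0
            row.append(math.log2(1 + value))
        feats.append(row)
    return feats


def extend_embeddings(sentence, embed, av, max_n=4):
    """Concatenate each character's embedding (a list of floats) with its AV features."""
    return [embed(ch) + f for ch, f in zip(sentence, av_features(sentence, av, max_n))]


def toy_embed(ch):
    # Hypothetical stand-in for a learned character embedding table.
    return [float(ord(ch) % 7), 1.0]


if __name__ == "__main__":
    av = accessor_variety(["abab", "abc"])
    for row in extend_embeddings("abc", toy_embed, av):
        print(row)
```

For example, with the unlabeled corpus `["abab", "abc"]`, the bigram `"ab"` has two distinct left neighbors (the boundary and `b`) and three distinct right neighbors (`a`, the boundary, and `c`), so its AV is 2. Concatenation leaves the learned embedding untouched and lets the downstream network decide how much weight the unsupervised signal receives; the other strategies (augmenting word scores, multi-task learning) would consume such statistics in different places.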

Author(s): Yanna Zhang, Jinan Xu, Guoyi Miao, Yufeng Chen, Yujie Zhang

2007
Author(s): Joseph D. W. Stephens, Mark A. Pitt

Author(s): Jyotsna Vaid, Hsin-Chin Chen, Francisco E. Martinez, Chaitra Rao

2015
Author(s): Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Xuanjing Huang