Three-way decision with co-training for partially labeled data

2021, Vol 544, pp. 500-518
Author(s): Can Gao, Jie Zhou, Duoqian Miao, Jiajun Wen, Xiaodong Yue

2018, Vol 5 (2), pp. 239-250
Author(s): Keyu Liu, Eric C. C. Tsang, Jingjing Song, Hualong Yu, Xiangjian Chen, ...

Author(s): Lujun Zhao, Qi Zhang, Peng Wang, Xiaoyu Liu

Most existing Chinese word segmentation (CWS) methods are supervised and therefore need large-scale annotated, domain-specific datasets for training. In this paper, we address CWS for resource-poor domains that lack annotated data. We propose a novel neural network model that incorporates both unlabeled and partially labeled data. To make use of unlabeled data, we combine a bidirectional LSTM segmentation model with two character-level language models through a gate mechanism; these language models capture co-occurrence information. To make use of partially labeled data, we modify the original cross-entropy loss function of the RNN. Experimental results demonstrate that the method performs well on CWS tasks across a series of domains.
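
The abstract does not say how the cross-entropy loss is modified for partially labeled data. One common way to handle partial annotation, sketched below purely as an assumption and not as the paper's formulation, is to sum the predicted probability over every tag that the partial annotation still allows and to skip characters with no annotation at all. The BMES tag set and the names `TAGS`, `softmax`, and `partial_label_loss` are hypothetical and chosen only for illustration.

```python
import math

# Assumed tag set for character-level segmentation (Begin, Middle, End, Single).
TAGS = ["B", "M", "E", "S"]


def softmax(scores):
    """Convert raw per-tag scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def partial_label_loss(logits, allowed):
    """Average negative log of the probability mass on the allowed tags.

    logits:  one list of raw scores per character, ordered as in TAGS.
    allowed: one set of permitted tag indices per character; an empty set
             marks an unannotated character, which is skipped.
    """
    total, counted = 0.0, 0
    for scores, ok in zip(logits, allowed):
        if not ok:
            continue  # no annotation for this character: no loss term
        probs = softmax(scores)
        mass = sum(probs[i] for i in ok)
        total += -math.log(mass)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    # Three characters: fully labeled as "B", known only to be "B" or "S",
    # and unannotated.
    logits = [[2.0, 0.5, 0.1, -1.0], [0.3, 0.2, 0.1, 0.4], [0.0, 0.0, 0.0, 0.0]]
    allowed = [{0}, {0, 3}, set()]
    print(partial_label_loss(logits, allowed))
```

When every character has exactly one allowed tag, this reduces to the ordinary cross-entropy loss, so fully labeled and partially labeled sentences can share the same objective.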

