A Robust Morpheme Sequence and Convolutional Neural Network-Based Uyghur and Kazakh Short Text Classification

Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 387 ◽  
Author(s):  
Sardar Parhat ◽  
Mijit Ablimit ◽  
Askar Hamdulla

In this paper, we study short text classification for two similar low-resource languages, Uyghur and Kazakh, using a multilingual morphological analyzer. The online linguistic resources for these languages are generally noisy, so preprocessing is necessary and can significantly improve accuracy. Uyghur and Kazakh are languages with derivational morphology, in which words are formed by concatenating stems with suffixes. In these languages, terms are usually used to represent text content, while functional parts are excluded as stop words. By extracting stems, we can collect the necessary terms and exclude stop words. The morpheme segmentation tool splits text into morphemes with a high reliability of 95%. After preparing both word-based and morpheme-based training corpora, we apply a convolutional neural network (CNN) as the feature selection and text classification algorithm. Experimental results show that the morpheme-based approach outperforms the word-based approach. Word embedding is frequently used for text representation, both within neural network frameworks and as a value expression; it maps language units into a sequential vector space based on context, and it is a natural way to extract and predict out-of-vocabulary (OOV) words from context information. Multilingual morphological analysis provides a convenient way to handle processing tasks for low-resource languages such as Uyghur and Kazakh.
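
The stem-extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: it assumes the morphological analyzer has already segmented each word into morphemes joined by `+`, that the stem is the first morpheme (stems followed by suffixes), and that a stop-word list is available. The function name `extract_stems`, the `+` separator, and the toy tokens are all hypothetical.

```python
from typing import List, Set


def extract_stems(segmented_tokens: List[str],
                  stop_stems: Set[str],
                  sep: str = "+") -> List[str]:
    """Keep the stem of each segmented token and drop stop-word stems.

    Assumption: each token looks like "stem+suffix+suffix", so the stem
    is the first morpheme. A real analyzer's output format may differ.
    """
    stems = []
    for token in segmented_tokens:
        stem = token.split(sep)[0]
        if stem and stem not in stop_stems:
            stems.append(stem)
    return stems


# Toy, hand-written input in a hypothetical segmentation format;
# it is not output from the analyzer used in the paper.
tokens = ["mektep+ler+de", "we", "kitab+lar+im"]
stop_stems = {"we"}

stem_sequence = extract_stems(tokens, stop_stems)
print(stem_sequence)  # ['mektep', 'kitab']
```

The resulting stem sequence, rather than the raw word sequence, would then serve as the input text for embedding and CNN classification.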

Author(s):  
Jingyun Xu ◽  
Yi Cai

Some text classification methods do not work well on short texts because of data sparsity. Moreover, they do not fully exploit context-relevant knowledge. To tackle these problems, we propose a neural network that incorporates context-relevant knowledge into a convolutional neural network for short text classification. Our model consists of two modules. The first module uses two layers to extract concept and context features, respectively, and then employs an attention layer to extract the context-relevant concepts. The second module uses a convolutional neural network to extract high-level features from the word features and the context-relevant concept features. Experimental results on three datasets show that our proposed model outperforms state-of-the-art models.
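
The idea of using attention to pick out context-relevant concepts can be illustrated with a small sketch. The abstract does not specify the scoring function, so dot-product scoring with a softmax is an assumption here, and the names `attend`, `context`, and `concepts` as well as the toy vectors are hypothetical.

```python
import math
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(context: List[float],
           concepts: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Weight concept vectors by their similarity to the context vector.

    Assumption: relevance is scored with a plain dot product; the model
    in the paper may use a different (learned) scoring function.
    """
    weights = softmax([dot(context, c) for c in concepts])
    dim = len(context)
    pooled = [sum(w * c[i] for w, c in zip(weights, concepts))
              for i in range(dim)]
    return weights, pooled


# Toy vectors: the first concept aligns with the context, the last opposes it.
context = [1.0, 0.0]
concepts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

weights, pooled = attend(context, concepts)
print(weights)  # largest weight goes to the first concept
print(pooled)   # attention-weighted concept feature
```

In a full model, the pooled concept feature would be combined with word features before the convolutional layers.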


2019 ◽  
Author(s):  
Haidong Zhang ◽  
Wancheng Ni ◽  
Meijing Zhao ◽  
Ziqi Lin

2016 ◽  
Vol 174 ◽  
pp. 806-814 ◽  
Author(s):  
Peng Wang ◽  
Bo Xu ◽  
Jiaming Xu ◽  
Guanhua Tian ◽  
Cheng-Lin Liu ◽  
...  
