Generic framework for multilingual short text categorization using convolutional neural network

Author(s):  
Liriam Enamoto ◽  
Li Weigang ◽  
Geraldo P. Rocha Filho
2015 ◽  
Author(s):  
Peng Wang ◽  
Jiaming Xu ◽  
Bo Xu ◽  
Chenglin Liu ◽  
Heng Zhang ◽  
...  

2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Siyuan Zhao ◽  
Zhiwei Xu ◽  
Limin Liu ◽  
Mengjie Guo ◽  
Jing Yun

Convolutional neural networks (CNNs) have revolutionized the field of natural language processing and are considerably efficient at the semantic analysis that underlies difficult natural language processing problems in a variety of domains. Deceptive opinion detection is an important application of existing CNN models. Detection mechanisms based on CNN models have better self-adaptability and can effectively identify all kinds of deceptive opinions. Online opinions are quite short and vary in type and content. To identify deceptive opinions effectively, we need to study their characteristics comprehensively and explore novel characteristics beyond the textual semantics and emotional polarity that have been widely used in text analysis. In this paper, we optimize the convolutional neural network model by embedding word order characteristics in its convolution layer and pooling layer, which makes the network more suitable for short text classification and deceptive opinion detection. TensorFlow-based experiments demonstrate that the proposed detection mechanism achieves more accurate deceptive opinion detection results.
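The abstract does not specify how word order is embedded in the convolution and pooling layers. As an illustration only, the sketch below shows one common way a text CNN can stay sensitive to order: a convolution over consecutive word windows, followed by k-max pooling that keeps the strongest activations in their original left-to-right order. The function names, the toy embeddings, and the choice of k-max pooling are assumptions made for this example, not the authors' design.

```python
from typing import List

Vector = List[float]


def conv1d(embeddings: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide a kernel spanning len(kernel) consecutive words over the sentence.

    Each kernel row is matched against one position in the window, so the
    response depends on the order of the words inside the window.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        total = 0.0
        for offset in range(width):
            word_vec = embeddings[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], word_vec))
        feature_map.append(max(0.0, total))  # ReLU activation
    return feature_map


def k_max_pool(feature_map: List[float], k: int) -> List[float]:
    """Keep the k largest activations, preserving their original order."""
    top = sorted(range(len(feature_map)),
                 key=lambda i: feature_map[i], reverse=True)[:k]
    return [feature_map[i] for i in sorted(top)]


# Toy 2-dimensional embeddings for a 4-word sentence (made-up values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # window of two words

forward = conv1d(sentence, kernel)
backward = conv1d(list(reversed(sentence)), kernel)

print(max(forward), max(backward))                      # 2.0 2.0
print(k_max_pool(forward, 2), k_max_pool(backward, 2))  # [2.0, 1.0] [1.0, 2.0]
```

Plain max pooling returns the same value for the sentence and its reversal, while the order-preserving pooling returns different vectors, which is the kind of positional signal a short text classifier can exploit.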


Author(s):  
Jingyun Xu ◽  
Yi Cai

Some text classification methods do not work well on short texts because of data sparsity. Moreover, they do not fully exploit context-relevant knowledge. To tackle these problems, we propose a neural network that incorporates context-relevant knowledge into a convolutional neural network for short text classification. Our model consists of two modules. The first module uses two layers to extract concept and context features, respectively, and then employs an attention layer to extract the context-relevant concepts. The second module uses a convolutional neural network to extract high-level features from the word features and the context-relevant concept features. Experimental results on three datasets show that our proposed model outperforms state-of-the-art models.
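The attention step in the first module can be pictured with a small sketch. The version below assumes dot-product scoring followed by a softmax, which is one standard form of attention; the abstract does not state which scoring function the model uses, and the names `softmax`, `attend`, and the toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(context: Vector, concepts: List[Vector]) -> Tuple[List[float], Vector]:
    """Weight each concept vector by its dot-product match with the context.

    Returns the attention weights and the weighted sum of concept vectors.
    """
    scores = [sum(c * x for c, x in zip(context, concept)) for concept in concepts]
    weights = softmax(scores)
    dim = len(concepts[0])
    pooled = [sum(w * concept[d] for w, concept in zip(weights, concepts))
              for d in range(dim)]
    return weights, pooled


# Made-up vectors: the first concept aligns with the context, the second does not.
context = [1.0, 0.0]
concepts = [[2.0, 0.0], [0.0, 2.0]]

weights, pooled = attend(context, concepts)
print(weights)  # the first weight is the larger one
print(pooled)
```

Under this assumption, the pooled vector is dominated by the concepts that match the surrounding context, and it could then be concatenated with the word features before the convolutional module.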


Information ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 387 ◽  
Author(s):  
Sardar Parhat ◽  
Mijit Ablimit ◽  
Askar Hamdulla

In this paper, based on a multilingual morphological analyzer, we study short text classification for two similar low-resource languages, Uyghur and Kazakh. The online linguistic resources of these languages are generally noisy, so preprocessing is necessary and can significantly improve accuracy. Uyghur and Kazakh are languages with derivational morphology, in which words are formed by concatenating stems with suffixes. In these languages, terms are usually used to represent the text content, while functional parts are excluded as stop words. By extracting stems, we can collect the necessary terms and exclude stop words. The morpheme segmentation tool can split text into morphemes with high reliability (95%). After preparing both word-based and morpheme-based training corpora, we apply a convolutional neural network (CNN) as a feature selection and text classification algorithm. Experimental results show that the morpheme-based approach outperforms the word-based approach. Word embedding is frequently used for text representation, both within neural network frameworks and as a value expression. It can map language units into a sequential vector space based on context, and it is a natural way to extract and predict out-of-vocabulary (OOV) words from context information. Multilingual morphological analysis provides a convenient way to process low-resource languages such as Uyghur and Kazakh.
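To make the stem-extraction idea concrete, the sketch below strips suffixes from the end of a word until none match. It is a toy stand-in for the morphological analyzer described above, not a reimplementation of it: the suffix list is a small illustrative sample in Latin script that has not been linguistically validated, and `TOY_SUFFIXES`, `segment`, `stems`, and the `min_stem` threshold are all hypothetical names and parameters.

```python
from typing import List, Sequence, Tuple

# Illustrative suffix sample only; a real analyzer uses a full morpheme inventory.
TOY_SUFFIXES = ("lar", "ler", "ni", "da")


def segment(word: str,
            suffixes: Sequence[str] = TOY_SUFFIXES,
            min_stem: int = 3) -> Tuple[str, List[str]]:
    """Split a word into (stem, suffixes) by repeatedly stripping a known suffix.

    Longer suffixes are tried first, and the stem is never cut shorter than
    min_stem characters.
    """
    stripped: List[str] = []
    changed = True
    while changed:
        changed = False
        for suffix in sorted(suffixes, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                stripped.insert(0, suffix)  # keep suffixes in surface order
                changed = True
                break
    return word, stripped


def stems(text: str) -> List[str]:
    """Map every whitespace-separated token to its stem."""
    return [segment(token)[0] for token in text.lower().split()]


print(segment("kitablarni"))        # ('kitab', ['lar', 'ni'])
print(stems("kitablar kitabni"))    # ['kitab', 'kitab']
```

Two different surface forms collapse to one stem here, which shrinks the vocabulary. This is a plausible reason for a morpheme-based corpus to be less sparse than a word-based one.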

