Text Classification with Topic-based Word Embedding and Convolutional Neural Networks

Author(s):  
Haotian Xu ◽  
Ming Dong ◽  
Dongxiao Zhu ◽  
Alexander Kotov ◽  
April Idalski Carcone ◽  
...  
Information ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 52
Author(s):  
Awet Fesseha ◽  
Shengwu Xiong ◽  
Eshete Derb Emiru ◽  
Moussa Diallo ◽  
Abdelghani Dahou

This article studies convolutional neural networks for Tigrinya (also referred to as Tigrigna), which is a family of Semitic languages spoken in Eritrea and northern Ethiopia. Tigrinya is a “low-resource” language and is notable in terms of the absence of comprehensive and free data. Furthermore, it is characterized as one of the most semantically and syntactically complex languages in the world, similar to other Semitic languages. To the best of our knowledge, no previous research has been conducted on the state-of-the-art embedding technique that is shown here. We investigate which word representation methods perform better in terms of learning for single-label text classification problems, which are common when dealing with morphologically rich and complex languages. Manually annotated datasets are used here, where one contains 30,000 Tigrinya news texts from various sources with six categories of “sport”, “agriculture”, “politics”, “religion”, “education”, and “health” and one unannotated corpus that contains more than six million words. In this paper, we explore pretrained word embedding architectures using various convolutional neural networks (CNNs) to predict class labels. We construct a CNN with a continuous bag-of-words (CBOW) method, a CNN with a skip-gram method, and CNNs with and without word2vec and FastText to evaluate Tigrinya news articles. We also compare the CNN results with traditional machine learning models and evaluate the results in terms of the accuracy, precision, recall, and F1 scoring techniques. The CBOW CNN with word2vec achieves the best accuracy with 93.41%, significantly improving the accuracy for Tigrinya news classification.


Author(s):  
Zhipeng Tan ◽  
Jing Chen ◽  
Qi Kang ◽  
MengChu Zhou ◽  
Abdullah Abusorrah ◽  
...  

2020 ◽  
Vol 386 ◽  
pp. 42-53 ◽  
Author(s):  
Jingyun Xu ◽  
Yi Cai ◽  
Xin Wu ◽  
Xue Lei ◽  
Qingbao Huang ◽  
...  

2020 ◽  
Vol 34 (10) ◽  
pp. 13967-13968
Author(s):  
Yuxiang Xie ◽  
Hua Xu ◽  
Congcong Yang ◽  
Kai Gao

The distant supervised (DS) method has improved the performance of relation classification (RC) by means of extending the dataset. However, DS also brings the problem of wrong labeling. Contrary to DS, the few-shot method relies on few supervised data to predict the unseen classes. In this paper, we use word embedding and position embedding to construct multi-channel vector representation and use the multi-channel convolutional method to extract features of sentences. Moreover, in order to alleviate few-shot learning to be sensitive to overfitting, we introduce adversarial learning for training a robust model. Experiments on the FewRel dataset show that our model achieves significant and consistent improvements on few-shot RC as compared with baselines.


2019 ◽  
Vol 9 (11) ◽  
pp. 2347 ◽  
Author(s):  
Hannah Kim ◽  
Young-Seob Jeong

As the number of textual data is exponentially increasing, it becomes more important to develop models to analyze the text data automatically. The texts may contain various labels such as gender, age, country, sentiment, and so forth. Using such labels may bring benefits to some industrial fields, so many studies of text classification have appeared. Recently, the Convolutional Neural Network (CNN) has been adopted for the task of text classification and has shown quite successful results. In this paper, we propose convolutional neural networks for the task of sentiment classification. Through experiments with three well-known datasets, we show that employing consecutive convolutional layers is effective for relatively longer texts, and our networks are better than other state-of-the-art deep learning models.


Sign in / Sign up

Export Citation Format

Share Document