Deep Learning Methods with Pre-Trained Word Embeddings and Pre-Trained Transformers for Extreme Multi-Label Text Classification

People routinely share opinions on the Internet about the products and services they use in daily life, which has generated a large quantity of comment data with great business value. A comment sentence often covers several aspects, and the sentiment toward each aspect can differ, so a single overall sentiment polarity for the sentence is meaningless. In this paper, we introduce the Attention-based Aspect-level Recurrent Convolutional Neural Network (AARCNN) to analyze remarks at the aspect level. The model integrates an attention mechanism with target information analysis, which lets it concentrate on the important parts of the sentence and make full use of the target information. It uses a bidirectional LSTM (Bi-LSTM) to build a memory of the sentence, then applies a CNN to extract attention from that memory and obtain an attentive sentence representation. It then uses an aspect embedding to analyze the target information in the representation, and finally outputs the sentiment polarity through a softmax layer. The model was tested on multi-language datasets and performed better than conventional deep learning methods.
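The attention-then-softmax flow described above can be illustrated with a small sketch. This is not the AARCNN implementation: the abstract says attention is extracted with a CNN, whereas the sketch swaps in plain dot-product attention against the aspect vector to keep it short. All function names, vectors, and weights below are hypothetical toy values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attentive_representation(memory, aspect):
    """Weight each memory vector by its softmax-normalised similarity to the aspect.

    Assumption: dot-product attention stands in for the CNN-based attention
    that the abstract mentions.
    """
    weights = softmax([dot(h, aspect) for h in memory])
    dim = len(memory[0])
    rep = [sum(w * h[i] for w, h in zip(weights, memory)) for i in range(dim)]
    return rep, weights

def polarity_distribution(rep, class_weights):
    """Linear layer followed by softmax: one probability per polarity class."""
    return softmax([dot(w, rep) for w in class_weights])

# Toy values (hypothetical): three token states of size 2 and one aspect vector.
memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
aspect = [2.0, 0.0]
rep, weights = attentive_representation(memory, aspect)

# Three rows, one per polarity class (e.g. negative, neutral, positive).
probs = polarity_distribution(rep, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

In this toy input the first token state is most similar to the aspect vector, so it receives the largest attention weight, and `probs` is a valid probability distribution over the three polarity classes.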


Identifying Emotion on Indonesian Tweets using Convolutional Neural Networks

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽ 10.29207/resti.v5i3.3137 ◽ 2021 ◽ Vol 5 (3) ◽ pp. 584-593

Author(s): Naufal Hilmiaji ◽ Kemas Muslim Lhaksmana ◽ Mahendra Dwifebri Purbolaksono

Keyword(s): Neural Network ◽ Neural Networks ◽ Performance Evaluation ◽ Deep Learning ◽ Convolutional Neural Network ◽ Convolutional Neural Networks ◽ Text Classification ◽ Classification Model ◽ Learning Methods ◽ Expected Performance

especially with the advancement of deep learning methods for text classification. Despite some effort to identify emotion in Indonesian tweets, the reported performance has not reached acceptable levels. To solve this problem, this paper implements a classification model using a convolutional neural network (CNN), which has demonstrated the expected performance in text classification. To allow direct comparison with previous research, the classification is performed on the same dataset, which consists of 4,403 tweets in Indonesian labeled with five emotion classes: anger, fear, joy, love, and sadness. The model achieves precision, recall, and F1-score of 90.1%, 90.3%, and 90.2%, respectively, and a highest accuracy of 89.8%. These results outperform previous research on the same classification task and dataset.
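For readers unfamiliar with the reported metrics, the sketch below computes precision, recall, F1-score, and accuracy for a five-class emotion task. The abstract does not say whether the scores are macro-averaged or weighted, so macro-averaging is an assumption here, and the toy labels and predictions are invented for illustration only.

```python
def per_class_counts(y_true, y_pred, label):
    """Return (true positives, false positives, false negatives) for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn

def macro_scores(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1 (assumed averaging scheme)."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        p = tp / (tp + fp) if tp + fp else 0.0  # no predictions for this class
        r = tp / (tp + fn) if tp + fn else 0.0  # no true examples of this class
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

LABELS = ["anger", "fear", "joy", "love", "sadness"]

# Hypothetical toy data: one "love" tweet is misclassified as "joy".
y_true = ["anger", "fear", "joy", "love", "sadness", "joy"]
y_pred = ["anger", "fear", "joy", "joy", "sadness", "joy"]

precision, recall, f1 = macro_scores(y_true, y_pred, LABELS)
acc = accuracy(y_true, y_pred)
```

With macro-averaging, the single missed "love" example pulls that class's scores to zero and lowers every average, which is why per-class balance matters on a five-class dataset.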


A Comprehensive study on Text Classification: Application of Convolutional Neural Networks and Deep Learning methods

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽ 10.32628/cseit217695 ◽ 2021 ◽ pp. 364-371

Author(s): Tiyasa Chatterjee ◽ Asoke Nath

Keyword(s): Neural Networks ◽ Deep Learning ◽ Convolutional Neural Networks ◽ Text Classification ◽ Learning Methods ◽ Comprehensive Study


Multilabel Text Classification in News Articles Using Long-Term Memory with Word2Vec

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽ 10.29207/resti.v4i2.1655 ◽ 2020 ◽ Vol 4 (2) ◽ pp. 276-285

Author(s): Winda Kurnia Sari ◽ Dian Palupi Rini ◽ Reza Firsandaya Malik ◽ Iman Saladin B. Azhar

Keyword(s): Neural Networks ◽ Deep Learning ◽ Text Classification ◽ Large Scale ◽ Short Term Memory ◽ Short Term ◽ Learning Methods ◽ Processing Variable ◽ Term Memory ◽ Long Short Term Memory

Multilabel text classification is the task of assigning a text to one or more categories. As with other machine learning tasks, multilabel classification performance is limited when labeled data are scarce, which makes semantic relationships difficult to capture. This study requires a multilabel text classification technique that can group news articles under four labels. Deep learning is proposed as a way to solve problems in multilabel text classification. Deep learning methods used for text classification include Convolutional Neural Networks, Autoencoders, Deep Belief Networks, and Recurrent Neural Networks (RNN). RNN is one of the most popular architectures in natural language processing (NLP) because its recurrent structure is well suited to processing variable-length text. The deep learning method proposed in this study is an RNN with the Long Short-Term Memory (LSTM) architecture. The models are trained through trial-and-error experiments using LSTM and 300-dimensional word embedding features from Word2Vec. By tuning the parameters and comparing eight proposed LSTM models on a large-scale dataset, the study shows that LSTM with Word2Vec features can achieve good performance in text classification. The highest accuracy, 95.38, is obtained by the fifth model, with an average precision, recall, and F1-score of 95. In addition, the graphs for the seventh and eighth LSTM models with Word2Vec features are close to a good fit.
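Assigning "one or more categories" differs from single-label classification at the output stage. A minimal sketch follows, assuming the common design of one independent sigmoid per label with a fixed threshold; the abstract does not state which output layer or threshold the study uses, and the label names and logits here are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, label_names, threshold=0.5):
    """Select every label whose independent probability reaches the threshold.

    Assumption: if no label passes, fall back to the single most probable one
    so that each text receives at least one category.
    """
    probs = [sigmoid(z) for z in logits]
    chosen = [name for name, p in zip(label_names, probs) if p >= threshold]
    if not chosen:
        best = max(range(len(probs)), key=probs.__getitem__)
        chosen = [label_names[best]]
    return chosen

# Four placeholder labels, matching the four-label setting in the abstract.
LABELS = ["label_a", "label_b", "label_c", "label_d"]

multi = predict_labels([2.0, -1.0, 0.3, -3.0], LABELS)     # two labels pass
fallback = predict_labels([-2.0, -1.0, -0.5, -3.0], LABELS)  # none pass
```

Because each label is scored independently, the probabilities do not need to sum to one, which is what allows a single article to receive several labels at once.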


Deep Learning methods for Subject Text Classification of Articles

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems ◽ 10.15439/2017f414 ◽ 2017 ◽ Cited By ~ 2

Author(s): Piotr Semberecki ◽ Henryk Maciejewski

Keyword(s): Deep Learning ◽ Text Classification ◽ Learning Methods


Deep Learning- and Word Embedding-Based Heterogeneous Classifier Ensembles for Text Classification

Complexity ◽ 10.1155/2018/7130146 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-10 ◽ Cited By ~ 14

Author(s): Zeynep H. Kilimci ◽ Selim Akyokus

Keyword(s): Machine Learning ◽ Deep Learning ◽ Ensemble Learning ◽ Text Classification ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Support Vector ◽ Word Embeddings ◽ Classifier Ensembles ◽ Document Representations

Ensemble learning, deep learning, and effective document representation methods are currently among the most common ways to improve the overall accuracy of a text classification/categorization system. Ensemble learning raises the overall accuracy of a classification system by combining multiple classifiers. Deep learning-based methods provide better results in many applications than conventional machine learning algorithms. Word embeddings represent words learned from a corpus as vectors, so that words with similar meanings have similar representations. In this study, we use different document representations built on word embeddings, together with an ensemble of base classifiers, for text classification. The ensemble of base classifiers includes traditional machine learning algorithms such as naïve Bayes, support vector machine, and random forest, and a deep learning-based convolutional network classifier. We analysed the classification accuracy of different document representations by employing an ensemble of classifiers on eight different datasets. Experimental results demonstrate that using heterogeneous ensembles together with deep learning methods and word embeddings enhances text classification performance.
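The idea of combining heterogeneous base classifiers can be sketched with majority voting. The abstract does not say how the base classifiers' outputs are combined, so majority voting is an assumption chosen for illustration. The three toy classifiers below are hypothetical stand-ins for the naïve Bayes, support vector machine, random forest, and network models named above.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest-listed classifier."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

def ensemble_predict(classifiers, document):
    """Ask every base classifier for a label, then combine by majority vote."""
    return majority_vote([clf(document) for clf in classifiers])

# Hypothetical stand-ins for trained base classifiers (rule-based for brevity).
def keyword_clf(doc):
    return "sports" if "match" in doc else "politics"

def length_clf(doc):
    return "sports" if len(doc.split()) < 6 else "politics"

def constant_clf(doc):
    return "politics"

base_classifiers = [keyword_clf, length_clf, constant_clf]
label = ensemble_predict(base_classifiers, "the match ended early")
```

The ensemble only helps when the base classifiers make different kinds of errors, which is the motivation for mixing traditional and deep learning models rather than repeating one model type.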


Deep Learning for text in limited data settings

10.36227/techrxiv.12100692 ◽ 2020

Author(s): Pathikkumar Patel ◽ Bhargav Lad ◽ Jinan Fiaidhi

Keyword(s): Machine Learning ◽ Time Series ◽ Deep Learning ◽ Sentiment Analysis ◽ Transfer Learning ◽ Text Classification ◽ State Of The Art ◽ Time Series Forecasting ◽ Text Data ◽ Performance Levels

Over the last few years, RNN models have been used extensively and have proven well suited to sequence and text data. RNNs have achieved state-of-the-art performance in several applications, such as text classification, sequence-to-sequence modelling, and time series forecasting. In this article, we review different machine learning and deep learning approaches for text data and look at the results obtained with these methods. This work also explores the use of transfer learning in NLP and how it affects model performance on a specific application, sentiment analysis.
