A Comparative Study on Word Embeddings in Deep Learning for Text Classification

Sentiment analysis is one of the newer and more engaging areas of natural language processing, and it emerged with the rise of community sites on the web. Taking advantage of the amount of information now available, research and industry have been seeking ways to automatically analyze the sentiments expressed in texts. The challenges for this task are the ambiguity of human language and the lack of labeled data. To address these challenges, deep learning has been applied to sentiment analysis, since deep learning models are effective thanks to their ability to learn automatically. In this paper, we provide a comparative study on the IMDB movie review dataset: we compare word embeddings and deep learning models for sentiment analysis, and give broad empirical results for those keen on applying deep learning to sentiment analysis in real-world settings.

A comparative study of automated legal text classification using random forests and deep learning

Information Processing & Management ◽ 10.1016/j.ipm.2021.102798 ◽ 2022 ◽ Vol 59 (2) ◽ pp. 102798

Author(s): Haihua Chen ◽ Lei Wu ◽ Jiangping Chen ◽ Wei Lu ◽ Junhua Ding

Keyword(s): Deep Learning ◽ Comparative Study ◽ Random Forests ◽ Text Classification ◽ Legal Text

A comparative study on various preprocessing techniques and deep learning algorithms for text classification

International Journal of Cloud Computing ◽ 10.1504/ijcc.2022.10031639 ◽ 2022 ◽ Vol 11 (1) ◽ pp. 1

Author(s): NagarajaRao A ◽ Bhuvaneshwari Petchimuthu

Keyword(s): Deep Learning ◽ Comparative Study ◽ Text Classification ◽ Learning Algorithms

Comparative Study between Traditional Machine Learning and Deep Learning Approaches for Text Classification

Proceedings of the ACM Symposium on Document Engineering 2018 - DocEng '18 ◽ 10.1145/3209280.3209526 ◽ 2018 ◽ Cited By ~ 5

Author(s): Cannannore Nidhi Kamath ◽ Syed Saqib Bukhari ◽ Andreas Dengel

Keyword(s): Machine Learning ◽ Deep Learning ◽ Comparative Study ◽ Text Classification ◽ Learning Approaches

Deep Learning Methods with Pre-Trained Word Embeddings and Pre-Trained Transformers for Extreme Multi-Label Text Classification

10.1109/ubmk52708.2021.9558977 ◽ 2021

Author(s): Necdet Eren Erciyes ◽ Abdul Kadir Gorur

Keyword(s): Deep Learning ◽ Text Classification ◽ Word Embeddings ◽ Learning Methods

Deep Learning- and Word Embedding-Based Heterogeneous Classifier Ensembles for Text Classification

Complexity ◽ 10.1155/2018/7130146 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-10 ◽ Cited By ~ 14

Author(s): Zeynep H. Kilimci ◽ Selim Akyokus

Keyword(s): Machine Learning ◽ Deep Learning ◽ Ensemble Learning ◽ Text Classification ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Support Vector ◽ Word Embeddings ◽ Classifier Ensembles ◽ Document Representations

Ensemble learning, deep learning, and effective document representation methods are currently among the most common ways to improve the overall accuracy of a text classification/categorization system. Ensemble learning raises the overall accuracy of a classification system by combining multiple classifiers. Deep learning-based methods provide better results in many applications than other conventional machine learning algorithms. Word embeddings represent words learned from a corpus as vectors, so that words with similar meanings have similar representations. In this study, we use different document representations based on word embeddings, together with an ensemble of base classifiers, for text classification. The ensemble of base classifiers includes traditional machine learning algorithms such as naïve Bayes, support vector machine, and random forest, and a deep learning-based conventional network classifier. We analysed the classification accuracy of different document representations by employing an ensemble of classifiers on eight different datasets. Experimental results demonstrate that using heterogeneous ensembles together with deep learning methods and word embeddings enhances text classification performance.
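The idea in this abstract can be illustrated with a minimal sketch: a document is represented as the average of its word embeddings, and several different base classifiers vote on the label. Everything in the block is an assumption made for illustration. The two-dimensional `EMBEDDINGS` table, the three toy base classifiers, and the majority-vote combination rule are hypothetical stand-ins; the abstract does not state how the authors combine their naïve Bayes, support vector machine, random forest, and deep network classifiers.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = List[float]

# Toy 2-dimensional embeddings (hypothetical); real ones are learned from a corpus.
EMBEDDINGS: Dict[str, Vector] = {
    "goal": [0.9, 0.1],
    "match": [0.8, 0.2],
    "team": [0.7, 0.3],
    "stock": [0.1, 0.9],
    "market": [0.2, 0.8],
    "bank": [0.3, 0.7],
}

# Hypothetical class centroids in the same embedding space.
CENTROIDS: Dict[str, Vector] = {"sports": [0.8, 0.2], "finance": [0.2, 0.8]}


def document_vector(tokens: Sequence[str]) -> Vector:
    """Average the embeddings of known tokens (zero vector if none are known)."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]


def nearest_centroid(vec: Vector) -> str:
    """Base classifier 1: pick the class whose centroid is closest."""
    def sq_dist(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, CENTROIDS[label]))
    return min(CENTROIDS, key=sq_dist)


def first_dimension_threshold(vec: Vector) -> str:
    """Base classifier 2: threshold on the first embedding dimension."""
    return "sports" if vec[0] >= 0.5 else "finance"


def always_finance(vec: Vector) -> str:
    """Base classifier 3: a deliberately weak classifier that ignores its input."""
    return "finance"


BASE_CLASSIFIERS: List[Callable[[Vector], str]] = [
    nearest_centroid,
    first_dimension_threshold,
    always_finance,
]


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most common label; ties go to the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(tokens: Sequence[str]) -> str:
    """Embed the document once, then let every base classifier vote."""
    vec = document_vector(tokens)
    return majority_vote([clf(vec) for clf in BASE_CLASSIFIERS])


if __name__ == "__main__":
    print(ensemble_predict(["goal", "team", "match"]))  # sports
    print(ensemble_predict(["stock", "market"]))        # finance
```

In the first call the weak classifier votes "finance" but is outvoted by the other two, which is the basic motivation for combining heterogeneous classifiers.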

Deep Learning for text in limited data settings

10.36227/techrxiv.12100692 ◽ 2020

Author(s): Pathikkumar Patel ◽ Bhargav Lad ◽ Jinan Fiaidhi

Keyword(s): Machine Learning ◽ Time Series ◽ Deep Learning ◽ Sentiment Analysis ◽ Transfer Learning ◽ Text Classification ◽ State Of The Art ◽ Time Series Forecasting ◽ Text Data ◽ Performance Levels

During the last few years, RNN models have been used extensively and have proven effective for sequence and text data. RNNs have achieved state-of-the-art performance levels in several applications such as text classification, sequence-to-sequence modelling, and time series forecasting. In this article, we review different machine learning and deep learning based approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects model performance on a specific application, sentiment analysis.

On the cost-effectiveness of neural and non-neural approaches and representations for text classification: A comprehensive comparative study

Information Processing & Management ◽ 10.1016/j.ipm.2020.102481 ◽ 2021 ◽ Vol 58 (3) ◽ pp. 102481

Author(s): Washington Cunha ◽ Vítor Mangaravite ◽ Christian Gomes ◽ Sérgio Canuto ◽ Elaine Resende ◽ ...

Keyword(s): Cost Effectiveness ◽ Comparative Study ◽ Text Classification ◽ The Cost

Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal Of Big Data ◽ 10.1186/s40537-021-00488-w ◽ 2021 ◽ Vol 8 (1)

Author(s): Yahya Albalawi ◽ Jim Buckley ◽ Nikola S. Nikolov

Keyword(s): Social Media ◽ Deep Learning ◽ Comprehensive Evaluation ◽ Classification Problem ◽ Data Sets ◽ Word Embeddings ◽ Data Set ◽ Lower Accuracy ◽ Health Related ◽ The Impact

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four of the 26 pre-processing techniques improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model, with an F1 score of 75.2% and accuracy of 90.7%, compared to an F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with a lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.
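The abstract reports both F1 score and accuracy, and the two can rank models differently. As a minimal sketch of how the two metrics are computed for a binary task, the block below derives both from the same predictions. The function name `binary_metrics` and the toy labels are hypothetical, and class imbalance is only one common reason the metrics diverge; the abstract does not say whether it explains the reported figures.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy, imbalanced labels (hypothetical): 2 positives out of 10 examples.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    # The classifier finds one of the two positives and raises no false alarms.
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

On this toy input accuracy is 0.9 while F1 is about 0.667, because F1 ignores the many true negatives and penalizes the missed positive.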

Comparative Study on Telugu text Classification using Machine Learning and Deep Learning models

Comparative study of deep learning models for sentiment analysis
