Performance comparison of TF-IDF and Word2Vec models for emotion text classification

Emotion is the feeling a person experiences when communicating with others or reacting to everyday events. Emotion classification is needed to recognize human emotions from text. This study compares the performance of the TF-IDF and Word2Vec models for representing features in emotional text classification. We use the support vector machine (SVM) and Multinomial Naïve Bayes (MNB) methods to classify emotional text in Commuter Line and Transjakarta tweet data. The emotion classification in this study has two steps. The first step classifies whether the data contain emotion or not. The second step classifies the data that contain emotion into five types: happy, angry, sad, scared, and surprised. This study uses three scenarios: SVM with TF-IDF, SVM with Word2Vec, and MNB with TF-IDF. SVM with TF-IDF yields the highest accuracy in both the first and second classification steps, followed by MNB with TF-IDF and then SVM with Word2Vec. Evaluation using precision, recall, and F1-measure likewise shows that SVM with TF-IDF is the best method overall. This study shows that TF-IDF modeling performs better than Word2Vec modeling, and it improves on the classification performance reported in previous studies.
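As a rough illustration of the TF-IDF representation compared in this abstract, the following sketch computes term weights for a few toy tweets using only the Python standard library. The weighting variant (relative term frequency times log(N/df)), the function name `tf_idf`, and the sample tweets are assumptions made for illustration; the abstract does not state which TF-IDF variant or preprocessing the study used.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Weight = relative term frequency * log(N / document frequency).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy tweets, already lowercased and tokenized.
tweets = [
    ["train", "late", "again", "angry"],
    ["train", "on", "time", "happy"],
    ["bus", "late", "sad"],
]
vectors = tf_idf(tweets)
```

In this toy corpus a word that occurs in a single tweet, such as "angry", receives a higher weight than a word shared by several tweets, such as "train". Sparse vectors of this kind are what a classifier such as SVM or MNB would take as input in a TF-IDF scenario.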

Sentiment polarity classification of tweets using a extended dictionary

INTELIGENCIA ARTIFICIAL ◽

10.4114/intartif.vol21iss62pp1-12 ◽

2018 ◽

Vol 21 (62) ◽

pp. 1

Author(s):

Jorge E. Camargo ◽

Vladimir Vargas-Calderon ◽

Nelson Vargas ◽

Liliana Calderón-Benavides

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Classification Performance ◽

Semantic Relations ◽

Support Vector ◽

The Real ◽

Polarity Classification ◽

Real Academia ◽

Word Definitions

With the purpose of classifying text based on its sentiment polarity (positive or negative), we proposed an extension of a corpus of 68,000 tweets through the inclusion of word definitions from a dictionary of the Real Academia Española de la Lengua (RAE). A set of 28,000 combinations of 6 Word2Vec and support vector machine parameters was considered in order to evaluate how much the inclusion of the RAE dictionary definitions would improve classification performance. We found that such a corpus extension significantly improves the classification accuracy. Therefore, we conclude that the inclusion of the RAE dictionary increases the semantic relations learned by Word2Vec, allowing better classification accuracy.

Classification of human emotions from electroencephalogram using support vector machine

2015 International Conference on Information Processing (ICIP) ◽

10.1109/infop.2015.7489416 ◽

2015 ◽

Cited By ~ 2

Author(s):

Anita Patil ◽

Ashish Panat ◽

Supriya Ambadas Ragade

Keyword(s):

Support Vector Machine ◽

Support Vector ◽

Human Emotions

Comparison of Random Forest and Support Vector Machine for Indonesian Tweet Complaint Classification

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195628 ◽

2019 ◽

pp. 202-207 ◽

Cited By ~ 1

Author(s):

Desi Ramayanti

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Text Classification ◽

Research Area ◽

Computational Time ◽

Support Vector ◽

Svm Classifier ◽

Text Documents ◽

Case Organization

In digital business, managers commonly need to process text so that it can be used to support decision-making. The number of text documents containing ideas and opinions keeps growing, and they are difficult to review one by one. If the data are processed and correctly rendered using machine learning, however, they can quickly present a general overview of a particular case, organization, or object. Numerous studies have been carried out in this research area; nevertheless, most of them concentrated on English text classification. Every language has its own techniques or methods for classifying text, depending on the characteristics of its grammar. Classification results may differ among languages even when the same algorithm is used. Given the importance of text classification, the algorithms that can be applied include the support vector machine (SVM) and Random Forest (RF). Based on this background, this research aims to determine the performance of the support vector machine and random forest algorithms in the classification of Indonesian text. The SVM classifier with 10-fold cross-validation achieved the best accuracy, 0.9648, with a computational time of 40.118 seconds. The RF classifier with the parameter values 'bootstrap': False, 'min_samples_leaf': 1, 'n_estimators': 10, 'min_samples_split': 3, 'criterion': 'entropy', 'max_features': 3, 'max_depth': None achieved an accuracy of 0.9561 with a computational time of 109.399 seconds.
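The 10-fold cross-validation mentioned in this abstract can be sketched in plain Python as follows. This is only an illustrative outline: the helper names `k_fold_indices`, `cross_val_accuracy`, and `majority_baseline`, the unshuffled contiguous folds, and the toy labels are assumptions, and the majority-class baseline merely stands in for the SVM and RF classifiers that the study evaluated.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds.

    Assumes n_samples >= k so that no fold is empty.
    """
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_val_accuracy(fit_predict, X, y, k=10):
    """Average accuracy over k folds.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    trains any classifier and returns predicted labels for X_test.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def majority_baseline(X_train, y_train, X_test):
    """Predict the most frequent training label for every test sample."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)

# Hypothetical toy data: 15 "complaint" tweets and 5 "other" tweets.
X = list(range(20))
y = ["complaint"] * 15 + ["other"] * 5
accuracy = cross_val_accuracy(majority_baseline, X, y, k=10)
```

Passing the classifier as a callback keeps the fold logic independent of the model, so the same loop could time and score two different classifiers on identical splits.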

Text Classification of British English and American English Using Support Vector Machine

2019 7th International Conference on Information and Communication Technology (ICoICT) ◽

10.1109/icoict.2019.8835256 ◽

2019 ◽

Author(s):

Muhammad Romi Ario Utomo ◽

Yuliant Sibaroni

Keyword(s):

Support Vector Machine ◽

Text Classification ◽

American English ◽

Support Vector ◽

British English

On the Performance of Variational Mode Decomposition-Based Radio Frequency Fingerprinting of Bluetooth Devices

Sensors ◽

10.3390/s20061704 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1704 ◽

Cited By ~ 4

Author(s):

Alghannai Aghnaiya ◽

Yaser Dalveren ◽

Ali Kara

Keyword(s):

Radio Frequency ◽

Classification Performance ◽

Performance Comparison ◽

Support Vector ◽

Performance Bounds ◽

Variational Mode Decomposition ◽

Transient Signals ◽

Mode Decomposition ◽

Band Limited

Radio frequency fingerprinting (RFF) is a communication network security technique based on identifying the unique features of RF transient signals. However, extracting these features can be difficult because of the nonstationary nature of transient signals, which may in turn reduce the accuracy of device identification. Recently, it has been shown that the use of variational mode decomposition (VMD) in extracting features from Bluetooth (BT) transient signals offers an efficient way to improve the classification accuracy. To do this, VMD has been used to decompose transient signals into a series of band-limited modes, and higher order statistical (HOS) features are extracted from reconstructed transient signals. In this study, the performance bounds of VMD in RFF implementation are scrutinized. Firstly, HOS features are extracted from the band-limited modes, and then from the reconstructed transient signals directly. A performance comparison of the two HOS feature sets is presented. Moreover, the lower SNR bound within which VMD can achieve acceptable accuracy in the classification of BT devices is determined. The approach has been tested experimentally with BT devices by employing a Linear Support Vector Machine (LSVM) classifier. According to the classification results, a higher classification performance (~4% higher) is achieved at lower SNR levels (−5–5 dB) when HOS features are extracted from the band-limited modes in the implementation of VMD in RFF of BT devices.
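To make the idea of extracting statistical features from band-limited modes more concrete, here is a minimal sketch in plain Python. It assumes that the features are variance, skewness, and kurtosis, which is a common choice but is not confirmed by the abstract. It also does not implement VMD itself: the two sine waves are hypothetical stand-ins for the modes a VMD routine would return, and the function names are illustrative.

```python
import math

def hos_features(signal):
    """Return (variance, skewness, kurtosis) of a sequence of samples."""
    n = len(signal)
    mean = sum(signal) / n
    # Central moments of order 2, 3, and 4 (population form, divide by n).
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return m2, skewness, kurtosis

def features_from_modes(modes):
    """Concatenate the statistical features of each band-limited mode."""
    features = []
    for mode in modes:
        features.extend(hos_features(mode))
    return features

# Two hypothetical "modes" standing in for the output of a VMD routine.
modes = [
    [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)],
    [0.5 * math.sin(2 * math.pi * 9 * t / 64) for t in range(64)],
]
feature_vector = features_from_modes(modes)
```

Computing the features per mode yields one short vector per transient, which could then be passed to a linear classifier. Applying `hos_features` to a single reconstructed signal instead would correspond to the second feature set the abstract compares.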

Imbalanced learning: Improving classification of diabetic neuropathy from magnetic resonance imaging

PLoS ONE ◽

10.1371/journal.pone.0243907 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243907

Author(s):

Kevin Teh ◽

Paul Armitage ◽

Solomon Tesfaye ◽

Dinesh Selvarajah ◽

Iain D. Wilkinson

Keyword(s):

Magnetic Resonance Imaging ◽

Support Vector Machine ◽

Class Imbalance ◽

Nearest Neighbors ◽

Classification Performance ◽

Support Vector ◽

Imbalanced Learning ◽

Resonance Imaging ◽

K Nearest Neighbors

One of the fundamental challenges when dealing with medical imaging datasets is class imbalance. Class imbalance occurs when the number of instances in the class of interest is relatively low compared to the rest of the data. This study aims to apply oversampling strategies in an attempt to balance the classes and improve classification performance. We evaluated four different classifiers, namely k-nearest neighbors (k-NN), support vector machine (SVM), multilayer perceptron (MLP), and decision trees (DT), with 73 oversampling strategies. In this work, we used imbalanced learning oversampling techniques to improve classification in datasets that are distinctively sparser and clustered. This work reports the best oversampling and classifier combinations and concludes that using oversampling methods always outperforms using no oversampling, thereby improving the classification results.
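The simplest form of oversampling, random duplication of minority-class samples, can be sketched as follows. This is an assumed illustration only: the abstract does not list which of the 73 strategies were used, and the function name `random_oversample`, the class labels, and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Keep every original sample, then draw extras with replacement.
        extras = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extras:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Hypothetical imbalanced toy data: 8 majority samples, 2 minority samples.
X = [[float(i)] for i in range(10)]
y = ["control"] * 8 + ["neuropathy"] * 2
X_bal, y_bal = random_oversample(X, y)
counts = Counter(y_bal)
```

Oversampling should be applied only to the training split. Duplicating samples before splitting lets copies of the same instance appear in both training and test data, which inflates the measured performance.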

A Study on the Emotion Classification of the Speech Signal Using Support Vector Machine

The Journal of Korean Institute of Communications and Information Sciences ◽

10.7840/kics.2021.46.10.1741 ◽

2021 ◽

Vol 46 (10) ◽

pp. 1741-1749

Author(s):

Jeong-seok Yeom ◽

Kwang-Bock You ◽

Kyungnam Jang

Keyword(s):

Support Vector Machine ◽

Speech Signal ◽

Support Vector ◽

Emotion Classification

Using latent Dirichlet allocation to improve text classification performance of support vector machine

2016 IEEE Congress on Evolutionary Computation (CEC) ◽

10.1109/cec.2016.7743935 ◽

2016 ◽

Cited By ~ 1

Author(s):

Yaw-Huei Chen ◽

Shu-Fong Li

Keyword(s):

Support Vector Machine ◽

Text Classification ◽

Latent Dirichlet Allocation ◽

Classification Performance ◽

Support Vector ◽

Dirichlet Allocation

Corrigendum: Sentiment Analysis in the Sales Review of Indonesian Marketplace by Utilizing Support Vector Machine

Journal of Information Systems Engineering and Business Intelligence ◽

10.20473/jisebi.4.2.169 ◽

2018 ◽

Vol 4 (2) ◽

pp. 169

Author(s):

Anang Anggono Lutfi ◽

Adhistya Erna Permanasari ◽

Silmi Fauziati

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Text Classification ◽

Daily Life ◽

Support Vector

In the version of this article initially published, there were some errors in Section III, Methods and Section VI, Conclusions. In Preprocessing of Methods, there is a sentence “The informal words may be in the form of slang words or abbreviations that are often used in daily life like cp at (from “cepat” or fast), blum (from “belum” or not yet), and gak (from “tidak” or no).”. The correct sentence is “The informal words may be in the form of slang words or abbreviations that are often used in daily life like cpat (from “cepat” or fast), blum (from “belum” or not yet), and gak (from “tidak” or no).”. In Text Classification of Methods, there is a sentence “Where P(B|A) is the probability of B appearance when A is known? The value P(A|B) is the probability of an appearance if B is known. P(A) is the probability of an appearance, while P(B) is the probability of B appearance.”. The correct sentence is “Where P(B|A) is the probability of the appearance of B when A is known. The value of P(A|B) is the probability of the appearance of A if B is known. P(A) is the probability of the appearance of A, while P(B) is the probability of the appearance of B.”. In Conclusions, a sentence “The accuracy reaches 93.42%; using 25% features with highest TF-IDF” should be changed to “The accuracy reaches 93.65%; using 25% features with highest TF-IDF” based on the results in Fig. 3. These errors have been corrected in the PDF versions of the article.
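The four quantities defined in the corrected sentence are the terms of Bayes' theorem. The corrigendum excerpt does not reproduce the equation itself, so the standard form is assumed here, written in the same notation:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```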
