Improving of Imbalanced Data in Multiclass Classification for Sentiment Analysis using Supervised Term Weighting

Multiclass classification in cancer diagnostics using DNA or gene expression signatures, as well as classification of bacterial species fingerprints in MALDI-TOF mass spectrometry data, is challenging because the data are imbalanced and the number of dimensions is high relative to the number of instances. In this study, a new oversampling technique called LICIC is presented as a valuable instrument for countering both class imbalance and the well-known “curse of dimensionality” problem. The method preserves non-linearities within the dataset while creating new instances without adding noise. It is compared with other oversampling methods, such as Random Oversampling, SMOTE, Borderline-SMOTE, and ADASYN. F1 scores show the validity of this new technique when used with imbalanced, multiclass, and high-dimensional datasets.
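
For orientation, the simplest of the baselines named above, Random Oversampling, can be sketched as follows. This is not LICIC, whose procedure the abstract does not describe; the function name `random_oversample` and the toy data are hypothetical and serve only to illustrate the idea of duplicating minority-class instances until the classes are balanced.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (illustrative baseline)."""
    rng = random.Random(seed)

    # Group the original rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            X_out.append(rng.choice(rows))  # exact copy, no new information
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: three classes with 2, 3, and 1 instances.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.5, 0.1]]
y = ["a", "a", "b", "b", "b", "c"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Because the added rows are exact copies, this baseline balances class counts without creating genuinely new instances, which is the limitation that synthetic methods such as SMOTE and its variants try to address.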


Supervised term weighting for sentiment analysis

Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics ◽

10.1109/isi.2011.5984056 ◽

2011 ◽

Cited By ~ 4

Author(s):

Tam T. Nguyen ◽

Kuiyu Chang ◽

Siu Cheung Hui

Keyword(s):

Sentiment Analysis ◽

Term Weighting


Comparison Study Of Term Weighting Optimally With SVM In Sentiment Analysis

Proceedings of The 2nd International Conference On Advance And Scientific Innovation, ICASI 2019, 18 July, Banda Aceh, Indonesia ◽

10.4108/eai.18-7-2019.2288508 ◽

2019 ◽

Author(s):

Amril Siregar ◽

Sutan Faisal ◽

Tukino Tukino ◽

Adam Puspabhuana ◽

Manase Simarangkir

Keyword(s):

Sentiment Analysis ◽

Comparison Study ◽

Term Weighting


Ensemble of Classifiers and Term Weighting Schemes for Sentiment Analysis in Turkish

10.52460/src.2021.004 ◽

2021 ◽

Vol 1 (1) ◽

pp. 1-12

Author(s):

Aytuğ Onan ◽

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Nearest Neighbor ◽

Text Messages ◽

Support Vector ◽

K Nearest Neighbor ◽

Term Weighting ◽

Text Documents ◽

Weighting Schemes ◽

Short Text

With the advancement of information and communication technology, social networking and microblogging sites have become a vital source of information. Individuals can express their opinions, grievances, feelings, and attitudes about a variety of topics. Through microblogging platforms, they can express their opinions on current events and products. Sentiment analysis is a significant area of research in natural language processing because it aims to define the orientation of the sentiment contained in source materials. Twitter is one of the most popular microblogging sites on the internet, with millions of users daily publishing over one hundred million text messages (referred to as tweets). Choosing an appropriate term representation scheme for short text messages is critical. Term weighting schemes are critical representation schemes for text documents in the vector space model. We present a comprehensive analysis of Turkish sentiment analysis using nine supervised and unsupervised term weighting schemes in this paper. The predictive efficiency of term weighting schemes is investigated using four supervised learning algorithms (Naive Bayes, support vector machines, the k-nearest neighbor algorithm, and logistic regression) and three ensemble learning methods (AdaBoost, Bagging, and Random Subspace). The empirical evidence suggests that supervised term weighting models can outperform unsupervised term weighting models.
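
The difference between unsupervised and supervised term weighting can be illustrated with a small sketch. The abstract does not name its nine schemes, so the two shown here are assumptions chosen for illustration: a smoothed inverse document frequency (unsupervised, ignores labels) and a delta-IDF-style contrast between classes (supervised, uses labels). The function names, the add-one smoothing, the sign convention, and the toy documents are all hypothetical.

```python
import math


def doc_freq(docs, term):
    """Number of documents (token sets) that contain the term."""
    return sum(1 for doc in docs if term in doc)


def idf(term, docs):
    """Unsupervised weight: smoothed inverse document frequency.
    Class labels are not used at all."""
    return math.log2((len(docs) + 1) / (doc_freq(docs, term) + 1))


def delta_idf(term, pos_docs, neg_docs):
    """Supervised weight: IDF in the negative class minus IDF in the
    positive class. With this sign convention (an assumption), terms
    concentrated in positive documents get a positive weight."""
    return idf(term, neg_docs) - idf(term, pos_docs)


# Hypothetical toy corpus: two positive and two negative documents.
pos_docs = [{"great", "film"}, {"great", "plot"}]
neg_docs = [{"dull", "film"}, {"dull", "plot"}]
all_docs = pos_docs + neg_docs

for term in ("great", "film", "dull"):
    print(term, round(idf(term, all_docs), 3),
          round(delta_idf(term, pos_docs, neg_docs), 3))
```

In this toy corpus the unsupervised weight cannot tell "great" from "film", since both occur in two of four documents, whereas the supervised weight separates the class-specific term from the neutral one.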


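Aspect Category Classification dengan Pendekatan Machine Learning Menggunakan Dataset Bahasa Indonesia [Aspect Category Classification with a Machine Learning Approach Using an Indonesian-Language Dataset]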

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI) ◽

10.22146/jnteti.v10i3.1819 ◽

2021 ◽

Vol 10 (3) ◽

pp. 229-235

Author(s):

Syaifulloh Amien Pandega Perdana ◽

Teguh Bharata Aji ◽

Ridi Ferdiana

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Sentiment Analysis ◽

Support Vector ◽

Term Weighting ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Bahasa Indonesia

Customer reviews are opinions on the quality of goods or services as perceived by consumers. They contain information that is useful to both consumers and providers of goods or services. The large volume of customer reviews available on websites calls for a framework that extracts sentiment automatically. A single customer review often covers many aspects, so Aspect Based Sentiment Analysis (ABSA) must be used to determine the polarity of each aspect. One important task in ABSA is Aspect Category Detection. Machine learning methods for Aspect Category Detection have been applied widely in English-language domains, but only rarely in Indonesian-language domains. This paper compares the performance of three machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on Indonesian-language customer reviews, using Term Frequency–Inverse Document Frequency (TF-IDF) as the term weighting. The results show that RF outperforms NB and SVM in three different domains, namely restaurant, hotel, and e-commerce, with f1-scores of 84.3%, 85.7%, and 89.3%, respectively.
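
As a rough illustration of TF-IDF term weighting, the sketch below uses the textbook form, relative term frequency multiplied by the natural log of N/df. The abstract does not state which TF-IDF variant the paper uses, and library implementations often add smoothing or normalization, so this formula, the function name `tf_idf`, and the toy reviews are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document, using
    weight = (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized reviews from three domains.
docs = [
    ["food", "tasty", "price", "cheap"],      # restaurant
    ["room", "clean", "price", "expensive"],  # hotel
    ["delivery", "fast", "price", "cheap"],   # e-commerce
]

weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every review ("price" here) receives weight zero, while a term unique to one review ("tasty") is weighted more heavily than one shared by two reviews ("cheap").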


Ensemble-support vector machine-random undersampling: Simulation study of multiclass classification for handling high dimensional and imbalanced data

Journal of Physics Conference Series ◽

10.1088/1742-6596/1613/1/012064 ◽

2020 ◽

Vol 1613 ◽

pp. 012064

Author(s):

Nur Silviyah Rahmi

Keyword(s):

Support Vector Machine ◽

Simulation Study ◽

Imbalanced Data ◽

Multiclass Classification ◽

High Dimensional ◽

Support Vector ◽

Random Undersampling


Normalization of Term Weighting Scheme for Sentiment Analysis

Human Language Technology Challenges for Computer Science and Linguistics - Lecture Notes in Computer Science ◽

10.1007/978-3-319-14120-6_10 ◽

2014 ◽

pp. 116-128 ◽

Cited By ~ 1

Author(s):

Alexander Pak ◽

Patrick Paroubek ◽

Amel Fraisse ◽

Gil Francopoulo

Keyword(s):

Sentiment Analysis ◽

Weighting Scheme ◽

Term Weighting


Performance Analysis of Multiple Classifiers using different Term Weighting Schemes for Sentiment Analysis

2019 International Conference on Intelligent Computing and Control Systems (ICCS) ◽

10.1109/iccs45141.2019.9065895 ◽

2019 ◽

Author(s):

Aiman Abdullah Anees ◽

Harsh Prakash Gupta ◽

Aditya Prashant Dalvi ◽

Suhas Gopinath ◽

Biju R Mohan

Keyword(s):

Performance Analysis ◽

Sentiment Analysis ◽

Term Weighting ◽

Weighting Schemes ◽

Multiple Classifiers


Deep Convolutional Arabic Sentiment Analysis with Imbalanced Data

2019 15th International Computer Engineering Conference (ICENCO) ◽

10.1109/icenco48310.2019.9027319 ◽

2019 ◽

Author(s):

Eslam Omara ◽

Mervat Mosa ◽

Nabil Ismail

Keyword(s):

Sentiment Analysis ◽

Imbalanced Data ◽

Arabic Sentiment Analysis


Multiclass Sentiment Analysis of Social Media Data using Neural Networks

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1012.1291s319 ◽

2019 ◽

Vol 9 (1S3) ◽

pp. 57-62

Keyword(s):

Neural Network ◽

Sentiment Analysis ◽

Social Networking Sites ◽

Opinion Mining ◽

Binary Classification ◽

Multiclass Classification ◽

Machine Learning Algorithms ◽

Training Data ◽

Training Dataset ◽

Data Sets

Sentiment analysis, also known as opinion mining, on social networking sites is one of the most active research topics today. Here we use Twitter, one of the biggest web destinations for people to communicate with each other, to perform sentiment analysis and opinion mining on tweets extracted from various users. Users post brief text updates, since Twitter allows only 140 characters per message. Hashtags help to search for tweets dealing with a specified subject. In previous research, binary classification usually relied on sentiment polarity (Positive, Negative, and Neutral). The advantage is that different meanings of the same word may carry different polarity, which can then be identified easily. In multiclass classification, many tweets of one class are classified as if they belong to the others. The Neutral class showed the lowest precision in all prior research in this area. The set of tweets containing text and emoticon data will be classified into 13 classes. From each tweet, we extract a different set of features using one-hot encoding and use machine learning algorithms to perform classification. The full set of tweets will be divided into training and testing datasets. The training dataset will be pre-processed and classified using various artificial neural network algorithms, such as Recurrent Neural Networks and Convolutional Neural Networks. The same procedure will be followed for the text and emoticon data. The developed model will be tested using the testing dataset. Higher accuracy can be obtained with this multiclass classification of text and emoticons. Four key performance indicators will be used to evaluate the effectiveness of the approach.
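
The one-hot encoding step mentioned above can be sketched as follows. The abstract gives no detail on tokenization or vocabulary construction, so whitespace splitting, the helper names `build_vocab` and `one_hot`, and the two toy tweets are assumptions made only for illustration.

```python
def build_vocab(tweets):
    """Assign each distinct whitespace-separated token (words and
    emoticons alike) an index in order of first appearance."""
    vocab = {}
    for tweet in tweets:
        for token in tweet.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(token, vocab):
    """Vector of zeros with a single 1 at the token's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


# Hypothetical tweets containing text and emoticons.
tweets = ["good day :)", "bad day :("]

vocab = build_vocab(tweets)
# Each tweet becomes a sequence of one-hot vectors, one per token.
encoded = [[one_hot(tok, vocab) for tok in tweet.split()] for tweet in tweets]

for tweet, vectors in zip(tweets, encoded):
    print(tweet, vectors)
```

A sequence of such vectors is the kind of input a recurrent or convolutional network could consume; in practice the vocabulary would be built from the training split only.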
