Hybrid Ensemble Learning With Feature Selection for Sentiment Classification in Social Media

Sanur Sharma; Anurag Jain

doi:10.4018/ijirr.2020040103

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2020040103 ◽

2020 ◽

Vol 10 (2) ◽

pp. 40-58 ◽

Cited By ~ 2

Author(s):

Sanur Sharma ◽

Anurag Jain

Keyword(s):

Social Media ◽

Feature Selection ◽

Ensemble Learning ◽

Information Gain ◽

Empirical Evaluation ◽

Ensemble Classifier ◽

Sentiment Classification ◽

Ensemble Classifiers ◽

Chi Squared

This article presents a study of ensemble learning and an empirical evaluation of various ensemble classifiers and ensemble features for sentiment classification of social media data. The data was collected from Twitter in real time using the Twitter API, and text pre-processing and ranking-based feature selection were applied to the textual data. A framework for a hybrid ensemble learning model is presented that combines ensemble features (Information Gain and Chi-Squared) with an ensemble classifier comprising AdaBoost with SMO-SVM and Logistic Regression. The Twitter data is then classified, with sentiment analysis used as a feature. The proposed model improves on state-of-the-art methods, achieving an accuracy of 88.2% with a low error rate.
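The ranking-based feature selection described in this abstract can be illustrated with a minimal sketch. It scores each term by Information Gain and by the chi-squared statistic, then keeps the terms with the best summed ranks. The abstract does not say how the two rankings are combined, so the rank-sum rule, the function names, and the toy tweets are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(present, labels):
    """Entropy reduction from splitting documents on term presence."""
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for p, y in zip(present, labels) if p == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain

def chi_squared(present, labels):
    """Pearson chi-squared statistic of the term-presence x class table."""
    n = len(labels)
    observed = Counter(zip(present, labels))
    term_totals = Counter(present)
    class_totals = Counter(labels)
    stat = 0.0
    for t, t_count in term_totals.items():
        for c, c_count in class_totals.items():
            expected = t_count * c_count / n
            stat += (observed[(t, c)] - expected) ** 2 / expected
    return stat

def rank(scores):
    """Map each term to its rank (0 = best) under the given scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {term: i for i, term in enumerate(ordered)}

def select_features(docs, labels, k):
    """Keep the k terms with the best summed IG and chi-squared ranks.

    The rank-sum combination is an assumption of this sketch.
    """
    vocab = sorted({w for d in docs for w in d.split()})
    presence = {w: [w in d.split() for d in docs] for w in vocab}
    ig = rank({w: information_gain(presence[w], labels) for w in vocab})
    chi = rank({w: chi_squared(presence[w], labels) for w in vocab})
    return sorted(vocab, key=lambda w: (ig[w] + chi[w], w))[:k]

# Hypothetical toy tweets, invented for illustration only.
docs = ["good phone", "good service", "bad phone", "bad service"]
labels = ["pos", "pos", "neg", "neg"]
selected = select_features(docs, labels, k=2)
```

On the toy data the two sentiment-bearing terms are selected, while "phone" and "service" score zero under both measures. The sketch covers only the feature-selection half of the framework; the AdaBoost, SMO-SVM and Logistic Regression classifiers are not reproduced here.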

Classification of Cardiac Arrhythmia using improved Feature Selection methods and Ensemble Classifiers

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012003 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012003

Author(s):

Rajat Jain ◽

Pranam R Betrabet ◽

B Ashwath Rao ◽

N V Subba Reddy

Keyword(s):

Feature Selection ◽

Heart Diseases ◽

Sudden Cardiac Arrest ◽

Principal Component ◽

Ensemble Classifier ◽

Ensemble Classifiers ◽

Life Threatening ◽

Electrocardiogram Ecg ◽

Feature Selection Techniques

Arrhythmia is a life-threatening heart disease that is diagnosed and analyzed using electrocardiogram (ECG) recordings and other symptoms, such as rapid heartbeat or chest pounding, shortness of breath, near-fainting spells, and insufficient pumping of blood from the heart, along with sudden cardiac arrest. Arrhythmia produces a hasty and aberrant ECG. In this implementation, the arrhythmia dataset is collected from the UCI Machine Learning Repository and its records are classified into sixteen stated classes using multiclass classification. The large feature set of the dataset is reduced using improved feature selection techniques such as t-Distributed Stochastic Neighbor Embedding (t-SNE), Principal Component Analysis (PCA), and Uniform Manifold Approximation and Projection (UMAP), and then an ensemble classifier is built to analyse classification accuracy on the arrhythmia dataset and to conclude when and which approach gives optimal results.

SENTIMENT CLASSIFICATION OF THE LOCAL VISITORS' SOCIAL MEDIA REVIEWS

Social Sciences Studies Journal ◽

10.26449/sssj.1032 ◽

2018 ◽

Vol 4 (26) ◽

pp. 5534-5538

Author(s):

Semra AKTAŞ POLAT

Keyword(s):

Social Media ◽

Sentiment Classification

Analysis of Feature Selection and Ensemble Classifier Methods for Intrusion Detection

International Journal of Natural Computing Research ◽

10.4018/ijncr.2018010104 ◽

2018 ◽

Vol 7 (1) ◽

pp. 57-72

Author(s):

H.P. Vinutha ◽

Poornima Basavaraju

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Detection Rate ◽

Information Gain ◽

False Positive Rate ◽

Ensemble Classifier ◽

Ensemble Classification ◽

Chi Square ◽

Traffic Pattern ◽

Data Mining Algorithms

Network security is becoming a more challenging task day by day. Intrusion detection systems (IDSs) are one of the methods used to monitor network activities. Data mining algorithms play a major role in the field of IDS. The NSL-KDD'99 dataset is used to study network traffic patterns, which helps to identify possible attacks taking place on the network. The dataset contains 41 attributes and one class attribute categorized as normal, DoS, Probe, R2L and U2R. The proposed methodology aims to reduce the false positive rate and improve the detection rate by reducing the dimensionality of the dataset, since using all 41 attributes for detection is not good practice. Four feature selection methods (Chi-Square, SU, Gain Ratio and Information Gain) are used to evaluate the attributes, and unimportant features are removed to reduce the dimension of the data. Ensemble classification techniques (Boosting, Bagging, Stacking and Voting) are used to observe the detection rate separately with three base algorithms: Decision Stump, J48 and Random Forest.
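The two evaluation measures named in this abstract, and the Voting ensemble, can be sketched in a few lines. The example below assumes simple majority voting and treats every non-normal class as an attack when computing detection rate and false positive rate. The function names and the toy predictions are hypothetical; the abstract does not specify the exact voting rule or how the five classes were collapsed for evaluation.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per record.

    Ties are broken in favour of the label seen first among the voters.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def detection_and_false_positive_rate(actual, predicted, normal="normal"):
    """Treat every non-normal label as an attack and return (DR, FPR).

    Assumes the data contains at least one attack and one normal record.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a != normal and p != normal for a, p in pairs)
    fn = sum(a != normal and p == normal for a, p in pairs)
    fp = sum(a == normal and p != normal for a, p in pairs)
    tn = sum(a == normal and p == normal for a, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical outputs of three base classifiers on five records.
actual = ["normal", "DoS", "Probe", "normal", "DoS"]
clf_a = ["normal", "DoS", "normal", "normal", "DoS"]
clf_b = ["DoS", "DoS", "Probe", "normal", "normal"]
clf_c = ["normal", "normal", "Probe", "normal", "DoS"]

voted = majority_vote([clf_a, clf_b, clf_c])
ensemble_rates = detection_and_false_positive_rate(actual, voted)
single_rates = detection_and_false_positive_rate(actual, clf_b)
```

In this toy case each base classifier makes at least one mistake, yet the vote recovers every label, which is the effect an ensemble is meant to exploit when the base classifiers err on different records.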

Extensive Survey on Feature Extraction and Feature Selection Techniques for Sentiment Classification in Social Media

2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) ◽

10.1109/icccnt45670.2019.8944391 ◽

2019 ◽

Author(s):

S.Sathish Kumar ◽

Aruchamy Rajini

Keyword(s):

Social Media ◽

Feature Extraction ◽

Feature Selection ◽

Sentiment Classification ◽

Extensive Survey ◽

Feature Selection Techniques

Sentiment Classification of Social Media Text Considering User Attributes

Natural Language Understanding and Intelligent Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-319-50496-4_52 ◽

2016 ◽

pp. 583-594 ◽

Cited By ~ 3

Author(s):

Junjie Li ◽

Haitong Yang ◽

Chengqing Zong

Keyword(s):

Social Media ◽

Sentiment Classification ◽

Social Media Text

An Efficient Ensemble Learning Method for Gene Microarray Classification

BioMed Research International ◽

10.1155/2013/478410 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10 ◽

Cited By ~ 9

Author(s):

Alireza Osareh ◽

Bita Shadgar

Keyword(s):

Feature Selection ◽

Ensemble Learning ◽

Feature Selection Method ◽

Support Vector ◽

Gene Microarray ◽

Ensemble Classifiers ◽

Classifier Ensembles ◽

Rotation Forest ◽

Ensemble Techniques ◽

Effective Diagnosis

Gene microarray analysis and classification have proven to be an effective way to diagnose diseases and cancers. However, it has also been revealed that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble and ensemble techniques, including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging, are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers that outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.
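The AdaBoost component mentioned in this abstract reweights training samples after each weak learner so that later learners focus on earlier mistakes. The sketch below shows one round of the textbook discrete AdaBoost update for two-class problems. It is an illustration only: the Rotation Forest half of RotBoost is not shown, the function name is hypothetical, and the paper's multiclass variant may differ from this form.

```python
import math

def adaboost_round(weights, correct):
    """One discrete AdaBoost round for a two-class problem.

    weights: current sample weights, summing to 1.
    correct: True where the weak learner classified the sample correctly.
    Assumes the weighted error is strictly between 0 and 1.
    Returns (alpha, new_weights).
    """
    # Weighted error of the weak learner.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # Learner weight: larger when the error is smaller.
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink weights of correct samples, grow weights of misclassified ones.
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]

# Four equally weighted samples; the weak learner misses the last one.
weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [True, True, True, False])
```

After the update the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.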

On the Feature Selection and Classification Based on Information Gain for Document Sentiment Analysis

Applied Computational Intelligence and Soft Computing ◽

10.1155/2018/1407817 ◽

2018 ◽

Vol 2018 ◽

pp. 1-5 ◽

Cited By ~ 14

Author(s):

Asriyanti Indah Pratiwi ◽

Adiwijaya

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Classification Scheme ◽

Information Gain ◽

Sentiment Classification ◽

Experimental Results ◽

Enormous Number

Sentiment analysis of movie reviews is a need of today's lifestyle. Unfortunately, an enormous number of features makes sentiment analysis slow and less sensitive. Finding the optimum feature selection and classification is still a challenge. To handle an enormous number of features and provide better sentiment classification, an information gain-based feature selection and classification scheme is proposed. The proposed method reduces unnecessary features by more than 90%, while the proposed classification scheme achieves 96% accuracy in sentiment classification. From the experimental results, it can be concluded that the combination of the proposed feature selection and classification achieves the best performance so far.

Using Feature Selection in Combination with Ensemble Learning Techniques to Improve Tweet Sentiment Classification Performance

2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI) ◽

10.1109/ictai.2015.39 ◽

2015 ◽

Cited By ~ 6

Author(s):

Joseph D. Prusa ◽

Taghi M. Khoshgoftaar ◽

Amri Napolitano

Keyword(s):

Feature Selection ◽

Ensemble Learning ◽

Classification Performance ◽

Sentiment Classification ◽

Learning Techniques

Comparing ELM with SVM in the Field of Sentiment Classification of Social Media Text Data

Proceedings in Adaptation, Learning and Optimization - Proceedings of ELM 2018 ◽

10.1007/978-3-030-23307-5_36 ◽

2019 ◽

pp. 336-344

Author(s):

Zhihuan Chen ◽

Zhaoxia Wang ◽

Zhiping Lin ◽

Ting Yang

Keyword(s):

Social Media ◽

Sentiment Classification ◽

Text Data ◽

Social Media Text

Opinion Mining on Culinary Food Customer Satisfaction Using Naïve Bayes Based-on Hybrid Feature Selection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v15.i1.pp468-475 ◽

2019 ◽

Vol 15 (1) ◽

pp. 468-475 ◽

Cited By ~ 3

Author(s):

Oman Somantri ◽

Dyah Apriliani

Keyword(s):

Feature Selection ◽

Opinion Mining ◽

Naive Bayes ◽

Information Gain ◽

Naïve Bayes ◽

Classification Model ◽

Consumer Ratings ◽

Bayes Algorithm ◽

Restaurant Owners

Assessing consumer sentiment about culinary food from social media gives useful information to anyone who wants it, especially migrants and tourists; on the other hand, that information is very valuable to food stall and restaurant owners for improving food quality. To obtain this information, a sentiment analysis classification model using the naïve Bayes (NB) algorithm was applied. The problem is that the accuracy of classifying consumer ratings of culinary food is still not optimal, because the weight values in the data preprocessing stage are not optimal. This paper proposes a hybrid feature selection model that addresses the suboptimal selection of feature attributes by combining information gain (IG) and a genetic algorithm (GA). The experimental results show that, compared with other algorithms, the proposed approach produces the best accuracy, at 93%.
