Hybrid Ensemble Learning With Feature Selection for Sentiment Classification in Social Media

2020 ◽  
Vol 10 (2) ◽  
pp. 40-58 ◽  
Author(s):  
Sanur Sharma ◽  
Anurag Jain

This article presents a study of ensemble learning and an empirical evaluation of various ensemble classifiers and ensemble features for sentiment classification of social media data. The data were collected from Twitter in real time using the Twitter API, and text pre-processing and ranking-based feature selection were applied to the textual data. A framework for a hybrid ensemble learning model is presented, combining ensemble features (Information Gain and Chi-Squared) with an ensemble classifier that includes AdaBoost with SMO-SVM and Logistic Regression. Twitter data are then classified with sentiment analysis used as a feature. The proposed model improves on state-of-the-art methods, reaching an accuracy of 88.2% with a low error rate.
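The ranking-based feature selection described above can be illustrated with a minimal sketch. The abstract does not say how the Information Gain and Chi-Squared rankings are combined, so averaging the two ranks is an assumption made here, and the function names and toy term-presence data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature/label contingency table."""
    total = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f, f_n in feature_counts.items():
        for l, l_n in label_counts.items():
            expected = f_n * l_n / total
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def rank(scores):
    """Map each feature name to its rank (0 = best) under the given scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered)}


def ensemble_rank(features, labels):
    """Order features by the mean of their two ranks (an assumed combination rule)."""
    ig = rank({n: information_gain(col, labels) for n, col in features.items()})
    chi = rank({n: chi_squared(col, labels) for n, col in features.items()})
    return sorted(features, key=lambda n: (ig[n] + chi[n]) / 2)


# Hypothetical toy data: 1 means the term occurs in the tweet.
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
features = {
    "great": [1, 1, 1, 0, 0, 0],  # separates the two classes perfectly
    "phone": [1, 0, 1, 1, 0, 1],  # distributed identically in both classes
    "slow": [0, 0, 1, 1, 1, 0],   # weakly associated with "neg"
}
ranking = ensemble_rank(features, labels)
```

In a pipeline of this kind, the top-ranked terms would be kept as inputs to the classifier ensemble; how many to keep is a tuning choice the abstract does not specify.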

2022 ◽  
Vol 2161 (1) ◽  
pp. 012003
Author(s):  
Rajat Jain ◽  
Pranam R Betrabet ◽  
B Ashwath Rao ◽  
N V Subba Reddy

Arrhythmia is a life-threatening heart disease that is diagnosed and analyzed using electrocardiogram (ECG) recordings together with symptoms such as rapid heartbeat or chest pounding, shortness of breath, near-fainting spells, insufficient pumping of blood from the heart, and sudden cardiac arrest. An arrhythmia produces a rapid and aberrant ECG. In this implementation, the arrhythmia dataset is taken from the UCI Machine Learning Repository and its records are classified into sixteen stated classes using multiclass classification. The large feature set of the dataset is reduced using techniques such as t-Distributed Stochastic Neighbor Embedding (t-SNE), Principal Component Analysis (PCA), and Uniform Manifold Approximation and Projection (UMAP), and an ensemble classifier is then built to analyse classification accuracy on the arrhythmia dataset and to conclude when and which approach gives optimal results.


2018 ◽  
Vol 7 (1) ◽  
pp. 57-72
Author(s):  
H.P. Vinutha ◽  
Poornima Basavaraju

Network security is becoming a more challenging task day by day. Intrusion detection systems (IDSs) are one of the methods used to monitor network activity, and data mining algorithms play a major role in the field of IDS. The NSL-KDD'99 dataset is used to study network traffic patterns, which helps to identify possible attacks taking place on the network. The dataset contains 41 attributes and one class attribute categorized as normal, DoS, Probe, R2L and U2R. The proposed methodology aims to reduce the false positive rate and improve the detection rate by reducing the dimensionality of the dataset, since using all 41 attributes for detection is not good practice. Four feature selection methods, Chi-Square, Symmetrical Uncertainty (SU), Gain Ratio and Information Gain, are used to evaluate the attributes, and unimportant features are removed to reduce the dimension of the data. Ensemble classification techniques (Boosting, Bagging, Stacking and Voting) are each used to observe the detection rate with three base algorithms: Decision Stump, J48 and Random Forest.
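As an illustration of the Voting ensemble and the two metrics named above, the following is a minimal sketch. It assumes plurality (hard) voting and treats every non-normal class as an attack when computing the detection rate and false positive rate; the abstract does not state either choice. The three prediction lists are hypothetical stand-ins for the outputs of base classifiers, not results from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per record by plurality."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) yields the label with the highest count; on a tie,
        # the label that appears first among the votes wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def detection_and_false_positive_rate(y_true, y_pred, normal="normal"):
    """Return (detection rate, false positive rate), counting any non-normal label as an attack.

    Assumes the data contain at least one attack and one normal record.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t != normal and p != normal for t, p in pairs)
    fn = sum(t != normal and p == normal for t, p in pairs)
    fp = sum(t == normal and p != normal for t, p in pairs)
    tn = sum(t == normal and p == normal for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels and base-classifier outputs for six records.
y_true = ["normal", "DoS", "Probe", "normal", "R2L", "normal"]
clf_a = ["normal", "DoS", "normal", "normal", "R2L", "DoS"]
clf_b = ["normal", "DoS", "Probe", "normal", "normal", "normal"]
clf_c = ["DoS", "DoS", "Probe", "normal", "R2L", "normal"]

voted = majority_vote([clf_a, clf_b, clf_c])
single_rates = detection_and_false_positive_rate(y_true, clf_a)
voted_rates = detection_and_false_positive_rate(y_true, voted)
```

In this toy case each base classifier makes a different mistake, so the vote recovers every label; real base classifiers tend to make correlated errors, and the gain from voting is correspondingly smaller.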


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Alireza Osareh ◽  
Bita Shadgar

Gene microarray analysis and classification have proved to be an effective way to diagnose diseases and cancers. However, it has also been shown that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. At the same time, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification problem using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, five different feature selection algorithms are considered. To assess the efficiency of RotBoost, other non-ensemble and ensemble techniques, including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging, are also deployed. Experimental results reveal that combining the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers that outperform not only the classifiers produced by conventional machine learning but also those generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.


2018 ◽  
Vol 2018 ◽  
pp. 1-5 ◽  
Author(s):  
Asriyanti Indah Pratiwi ◽  
Adiwijaya

Sentiment analysis of movie reviews has become a need of today's lifestyle. Unfortunately, an enormous number of features makes sentiment analysis slow and less sensitive, and finding the optimum feature selection and classification is still a challenge. To handle an enormous number of features and provide better sentiment classification, an information-based feature selection and classification scheme is proposed. The proposed method removes more than 90% of unnecessary features, while the proposed classification scheme achieves 96% accuracy in sentiment classification. From the experimental results, it can be concluded that the combination of the proposed feature selection and classification achieves the best performance so far.


Author(s):  
Oman Somantri ◽  
Dyah Apriliani

Assessing consumer sentiment about culinary food from social media provides useful information for everyone who wants it, especially migrants and tourists. On the other hand, that information is very valuable to food stall and restaurant owners for improving food quality. To obtain this information, a sentiment analysis classification model using the naïve Bayes (NB) algorithm was applied. The problem is that the accuracy of classifying consumer ratings of culinary food is still not optimal, because the weight values in the data preprocessing step are not optimal. This paper proposes a hybrid feature selection model that combines information gain (IG) and a genetic algorithm (GA) to overcome the problem of suboptimal selection of feature attributes. The experimental results, compared with other algorithms, show that the proposed model produces the best accuracy, 93%.

