Comparison of Bagging Ensemble Combination Rules for Imbalanced Text Sentiment Analysis

The wealth of opinions expressed by users on micro-blogging sites can be beneficial for product manufacturers or service providers, as they can gain insights about certain aspects of their products or services. The most common approach to analyzing text opinions is machine learning. However, opinion data are often imbalanced, e.g. the number of positive sentiments heavily outnumbers the number of negative sentiments. An ensemble technique, which combines multiple classification algorithms to make decisions, can be used to tackle imbalanced data by learning from multiple balanced datasets. The decision of the ensemble is obtained by combining the decisions of the individual classifiers using a certain rule. Therefore, rule selection is an important factor in ensemble design. This research aims to investigate the best decision combination rule for imbalanced text data. Multinomial Naïve Bayes, Complement Naïve Bayes, Support Vector Machine, and Softmax Regression are used as base classifiers, and the max, min, product, sum, vote, and meta-classifier rules are considered for decision combination. The experiment is done on several Twitter datasets. The experimental results show that the Softmax Regression ensemble with the meta-classifier combination rule performs best on all but one dataset. However, training the Softmax Regression ensemble requires intensive computational resources.
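The fixed combination rules named in the abstract (max, min, product, sum, vote) can be illustrated with a minimal sketch. It assumes each base classifier outputs a class-probability vector; the function name `combine` and the example numbers are hypothetical and are not taken from the paper. The meta-classifier rule is omitted because it requires training a second-level model.

```python
from collections import Counter


def combine(probas, rule):
    """Combine per-classifier probability vectors into one class index.

    probas[k][c] is the probability classifier k assigns to class c.
    rule is one of "max", "min", "product", "sum", "vote".
    """
    n_classes = len(probas[0])
    classes = range(n_classes)

    if rule == "vote":
        # Each classifier votes for its own most probable class.
        votes = Counter(max(classes, key=p.__getitem__) for p in probas)
        return votes.most_common(1)[0][0]

    if rule == "max":
        scores = [max(p[c] for p in probas) for c in classes]
    elif rule == "min":
        scores = [min(p[c] for p in probas) for c in classes]
    elif rule == "sum":
        scores = [sum(p[c] for p in probas) for c in classes]
    elif rule == "product":
        scores = []
        for c in classes:
            prod = 1.0
            for p in probas:
                prod *= p[c]
            scores.append(prod)
    else:
        raise ValueError(f"unknown rule: {rule}")

    # The ensemble picks the class with the highest combined score.
    return max(classes, key=scores.__getitem__)


# Hypothetical outputs of three classifiers for classes (negative, positive).
example = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
for rule in ("max", "min", "product", "sum", "vote"):
    print(rule, combine(example, rule))
```

In this made-up example the vote rule picks class 1 because two of three classifiers lean that way, while the other four rules pick class 0 because one classifier is very confident. Differences of this kind are why the choice of rule can matter.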

KOMPARASI ALGORITMA KLASIFIKASI PADA ANALISIS REVIEW HOTEL (Comparison of Classification Algorithms in Hotel Review Analysis)

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v14i2.1023 ◽

2018 ◽

Vol 14 (2) ◽

pp. 261

Author(s):

Lila Dini Utami

Keyword(s):

Support Vector Machine ◽

Nearest Neighbor ◽

Naive Bayes ◽

Service Providers ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm ◽

AUC Value

Today it is very easy to express opinions about anything, orally or in writing. Business people can use these opinions to make decisions, especially service providers such as hotels, for whom they are very useful in developing the business. However, the review data must be processed with the right algorithm. This study was therefore conducted to find out which algorithm is most suitable for achieving the highest accuracy. The methods used are Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN). Naïve Bayes achieves an accuracy of 71.50% with an AUC value of 0.500, Support Vector Machine achieves 72.50% with an AUC value of 0.936, and k-Nearest Neighbor achieves 75.00% with an AUC value of 0.500. Using the k-Nearest Neighbor algorithm can help in making more appropriate decisions about hotel reviews.

Sentiment analysis on myindihome user reviews using support vector machine and naïve bayes classifier method

International Journal of Industrial Optimization ◽

10.12928/ijio.v2i2.4437 ◽

2021 ◽

Vol 2 (2) ◽

pp. 151

Author(s):

Sulton Nur Hakim ◽

Andika Julianto Putra ◽

Annisa Uswatun Khasanah

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Naive Bayes ◽

Service Providers ◽

Naïve Bayes ◽

Support Vector ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Total Accuracy

In the era of globalization, the internet has become a human need for doing various things. The large number of internet users is an opportunity for internet service providers such as PT Telekomunikasi Indonesia (Telkom). One of PT Telkom's products is IndiHome. As the only state-owned enterprise engaged in telecommunications, PT Telkom is expected to meet the needs of the Indonesian people. However, the rating of IndiHome in the myIndiHome application on Google Play is 3.5, based on more than 87,000 reviews. The reviews highlight how strongly word-of-mouth affects the choice and use of internet provider products. The review data were collected from November 1, 2020 to December 15, 2020, with a total of 2,539 reviews as a sample. The sentiment analysis shows that 1,160 reviews fall in the negative sentiment class and 1,374 reviews in the positive class, out of a total of 2,539 reviews. The results indicate that service errors in IndiHome services are still quite high, reaching 46.7% as indicated by the number of negative reviews. The classification results show that the average total accuracy of the Support Vector Machine (SVM) method is 86.54%, higher than that of the Naïve Bayes Classifier (NBC) method, which has an average total accuracy of 84.69%. Based on fishbone diagram analysis, there are 12 problems in the negative reviews, classified under the 5P factors: Price, People, Process, Place, and Product.

Fake News Detection from Online media using Machine learning Classifiers

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012027 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012027

Author(s):

Shalini Pandey ◽

Sankeerthi Prabhakaran ◽

N V Subba Reddy ◽

Dinesh Acharya

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Imbalanced Data ◽

Naïve Bayes ◽

Support Vector ◽

Fake News ◽

K Nearest Neighbor

With the advancement of technology, the consumption of news has shifted from print media to social media. Convenience and accessibility are major factors that have contributed to this shift. However, this change has brought about a new challenge in the form of "fake news" being spread with little supervision on the net. In this paper, this challenge is addressed through machine learning. K-Nearest Neighbor, Support Vector Machine, Decision Tree, Naïve Bayes, and Logistic Regression classifiers are used to distinguish fake news from real news in a given dataset, and the efficiency of these algorithms is increased by pre-processing the data to handle the imbalanced data more appropriately. Additionally, a comparison of these classifiers is presented along with the results. In our experiment, the proposed model achieved an accuracy of 89.98% for KNN, 90.46% for Logistic Regression, 86.89% for Naïve Bayes, 73.33% for Decision Tree, and 89.33% for SVM.

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE (SMS Spam Classification Using Support Vector Machine)

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for humans to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate several data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms. Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.

Analysis of Feature Reduction Techniques for Online News Popularity Prediction

SMART MOVES JOURNAL IJOSCIENCE ◽

10.24113/ijo-science.v4i10.165 ◽

2018 ◽

Vol 4 (10) ◽

pp. 6

Author(s):

Shivangi Bhargava ◽

Dr. Shivnath Ghosh

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Particle Swarm Optimization ◽

Naive Bayes ◽

Particle Swarm ◽

Naïve Bayes ◽

Online News ◽

Feature Reduction ◽

Support Vector ◽

Swarm Optimization

News popularity is the maximum growth of attention given to a particular news article. The popularity of online news depends on various factors such as the number of social media, the number of visitor comments, the number of likes, etc. It is therefore necessary to build an automatic decision support system to predict the popularity of news, as it will also help in business intelligence. The work presented in this study aims to find the best model for predicting the popularity of online news using machine learning methods. The analysis is performed by applying three feature reduction techniques: a correlation algorithm, particle swarm optimization, and principal component analysis. For performance evaluation, support vector machine, naïve Bayes, k-nearest neighbor, and neural network classifiers are used to classify the data as popular or unpopular. The experimental results show that support vector machine and naïve Bayes perform better with the correlation algorithm, while k-NN and neural network perform better with particle swarm optimization.
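As a rough illustration of correlation-based feature reduction, the sketch below ranks features by the absolute Pearson correlation between each feature column and the class label. The abstract does not specify which correlation algorithm the authors used, so Pearson correlation, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no linear information about the label.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, labels):
    """Return feature indices ordered from most to least correlated."""
    n_features = len(rows[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scored.append((abs(pearson(column, labels)), j))
    # Sort by descending score; break ties by ascending feature index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in scored]


# Toy data: feature 0 tracks the label, feature 1 is constant,
# feature 2 alternates independently of the label.
rows = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
labels = [0, 0, 1, 1]
print(rank_features(rows, labels))
```

Keeping only the top-ranked indices would then give a reduced feature set to pass to a classifier.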

Algorithm Comparation of Naive Bayes and Support Vector Machine based on Particle Swarm Optimization in Sentiment Analysis of Freight Forwarding Services

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1840 ◽

2020 ◽

Vol 4 (2) ◽

pp. 362-369

Author(s):

Sharazita Dyah Anggita ◽

Ikmah

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

The Public ◽

SVM Algorithm ◽

Bayes Algorithm ◽

Freight Forwarding ◽

Improved Accuracy

Community demand for freight forwarding is increasing with the growth of online marketplaces. User opinions about freight forwarding services are expressed by the public through many channels, one of which is the social media platform Twitter. Sentiment analysis makes it possible to see whether an opinion has a positive or negative tendency. Methods that can be applied to sentiment analysis include the Naive Bayes algorithm and Support Vector Machine (SVM). This research implements the two algorithms, optimized with the Particle Swarm Optimization (PSO) algorithm, for sentiment analysis. Testing is done by setting the PSO parameters for each classifier. The results show an accuracy increase of 15.11% for the PSO-based Naive Bayes algorithm and an accuracy increase of 1.74% for the PSO-based SVM algorithm with the sigmoid kernel.
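The general idea of particle swarm optimization can be sketched as follows. This is a generic, minimal PSO that minimizes an arbitrary objective over bounded parameters; it is not the authors' configuration. The function name `pso_minimize`, the inertia and acceleration values, and the toy objective are assumptions. In a classifier setting, the objective could be, for example, one minus validation accuracy as a function of the classifier's parameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) within bounds [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the bounds, zero initial velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the global best found so far.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Toy objective with its minimum at (3, -1); a stand-in for a real
# "1 - accuracy" surface over two classifier parameters.
def toy(p):
    return (p[0] - 3) ** 2 + (p[1] + 1) ** 2


print(pso_minimize(toy, [(-10, 10), (-10, 10)]))
```

Each particle is pulled toward both its own best position and the swarm's best, which is what lets the search settle on good parameter values without gradient information.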

Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1942 ◽

2020 ◽

Vol 4 (3) ◽

pp. 504-512

Author(s):

Faried Zamachsari ◽

Gabriel Vangeran Saragih ◽

Susafa'ati ◽

Windu Gata

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Selection ◽

Public Opinion ◽

Naive Bayes ◽

Naïve Bayes ◽

Capital City ◽

Support Vector ◽

National Capital ◽

Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult finances are factors in the disapproval of the relocation of the national capital. Twitter, as one of the popular social media platforms, is used by the public to express these opinions. The goals of this study are to determine the tendency of community responses to the relocation of the national capital and to perform sentiment analysis of public opinion on the relocation using Naive Bayes and Support Vector Machine with feature selection to obtain the highest accuracy. The data are Indonesian-language tweets expressing public opinion, collected from Twitter by crawling. The search terms used are #IbuKotaBaru and #PindahIbuKota. The stages of the research are data collection from Twitter, polarity labeling, and preprocessing, which consists of case transformation, cleansing, tokenizing, filtering, and stemming. Feature selection is used to increase accuracy, and the data are then divided into training and testing sets at predetermined ratios. The next step is a comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period studied, 24.26% of the sentiment toward the move to a new capital city was positive and 75.74% was negative. Using RapidMiner software, the best accuracy of Naive Bayes with feature selection is 88.24% at a ratio of 9:1, while the best accuracy of Support Vector Machine with feature selection is 78.77% at a ratio of 5:5.
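The ratios reported above (such as 9:1 and 5:5) describe how the labeled tweets are divided between training and testing. A minimal sketch of such a split is shown below; the function name `split_by_ratio`, the shuffling step, and the fixed seed are assumptions, since the abstract only states that RapidMiner was used.

```python
import random


def split_by_ratio(samples, train_parts, test_parts, seed=0):
    """Shuffle samples and split them train_parts:test_parts."""
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Stand-in data: 100 sample identifiers instead of real tweets.
data = list(range(100))
train, test = split_by_ratio(data, 9, 1)
print(len(train), len(test))
train, test = split_by_ratio(data, 5, 5)
print(len(train), len(test))
```

With 100 samples, the 9:1 ratio yields 90 training and 10 testing samples, and the 5:5 ratio yields 50 of each.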
