Comparison of Machine Learning and Sentiment Analysis in Detection of Suspicious Online Reviewers on Different Type of Data

The article focuses on solving an important problem of detecting suspicious reviewers in online discussions on social networks. We have concentrated on a special type of suspicious authors, on trolls. We have used methods of machine learning for generation of detection models to discriminate a troll reviewer from a common reviewer, but also methods of sentiment analysis to recognize the sentiment typical for troll’s comments. The sentiment analysis can be provided also using machine learning or lexicon-based approach. We have used lexicon-based sentiment analysis for its better ability to detect a dictionary typical for troll authors. We have achieved Accuracy = 0.95 and F1 = 0.80 using sentiment analysis. The best results using machine learning methods were achieved by support vector machine, Accuracy = 0.986 and F1 = 0.988, using a dataset with the set of all selected attributes. We can conclude that detection model based on machine learning is more successful than lexicon-based sentiment analysis, but the difference in accuracy is not so large as in F1 measure.

Download Full-text

Analisis Sentimen Wacana Pemindahan Ibu Kota Indonesia Menggunakan Algoritma Support Vector Machine (SVM)

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.0813944 ◽

2021 ◽

Vol 8 (1) ◽

pp. 147

Author(s):

Primandani Arsi ◽

Retno Waluyo

Keyword(s):

Machine Learning ◽

Social Media ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Public Discourse ◽

Capital City ◽

Support Vector ◽

The Public ◽

Machine Learning Methods ◽

Classification Technique

Dewasa ini, media sosial berkembang pesat di internet, salah satu yang banyak digemari adalah Twitter. Berbagai topik ramai diperbincangkan di Twitter mulai dari ekonomi, politik, sosial, budaya, hukum dan lain-lain. Salah satu topik yang ramai diperbincangkan di Twitter adalah terkait isu pemindahan ibu kota Indonesia. Namun dibalik hal tersebut terdapat kontroversi dari pihak yang merasa pro dan kontra, masing-masing memiiki sudut pandang yang berbeda. Hal ini menyebabkan munculnya fenomena perdebatan khususnya di Twitter yang sebenarnya menunjukkan perhatian kolektif mengenai wacana publik tersebut. Analisis sentimen adalah proses mengekstraksi, memahami dan mengolah data berupa teks yang tidak terstruktur secara otomatis guna mendapatkan informasi sentimen yang terdapat pada sebuah kalimat pendapat atau opini. Dalam penerapan analisis sentimen menggunakan metode machine learning terdapat beberapa metode yang sering digunakan. Dalam penelitian ini diusulkan metode Support Vector Machine (SVM) untuk diterapkan pada tweets topik pemindahan ibu kota Indonesia untuk tujuan klasifikasi kelas sentimen pada media sosial twitter. Teknis klasifikasi dilakukan dengan cara mengklasifikasikan menjadi 2 kelas yakni positif dan negatif. Berdasarkan hasil pengujian yang dilakukan terhadap tweets sentimen pemindahan ibu kota dari media sosial twitter sebanyak 1.236 tweets (404 positif dan 832 negatif) menggunakan SVM diperoleh akurasi =96,68%, precision=95.82%, recall=94.04% dan AUC = 0,979. AbstractToday, social media is growing fast on the internet.One of the most popular social media is Twitter. Many topics are discussed on Twitter such as economic, politic, social, culture, and law. One of the hot topics discussed on Twitter is the issue of relocating Indonesia's capital city. However, there is controversy from supporters and opponents. They have different views. This issue leads to a phenomenon of debate on Twitter that actually shows a collective concern about the public discourse. Sentiment analysis is a process of extracting, understanding and processing unstructured data to get sentiment information which is found in an opinion sentence. Application of sentiment analysis using machine learning methods shows that there are several methods that are often used. In this study, the Support Vector Machine (SVM) method is proposed to be applied to tweets on the topic of relocating Indonesia's capital city for sentiment classification on social media twitter. The classification technique is carried out into 2 classes, namely positive and negative. Based on testing on the sentiment of relocating Indonesia's capital city from social media twitter from 1,116 tweets (404 positive and 832 negative) using SVM obtained accuracy = 96.68%, precision = 95.82%, recall = 94.04% and AUC = 0.979.

Download Full-text

Evaluating Annotated Dataset of Customer Reviews for Aspect Based Sentiment Analysis

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2122 ◽

2021 ◽

Author(s):

Dimple Chehal ◽

Parul Gupta ◽

Payal Gulati

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Supervised Machine Learning ◽

Support Vector ◽

Product Reviews ◽

K Nearest Neighbor ◽

Customer Reviews ◽

Percent Accuracy

Sentiment analysis of product reviews on e-commerce platforms aids in determining the preferences of customers. Aspect-based sentiment analysis (ABSA) assists in identifying the contributing aspects and their corresponding polarity, thereby allowing for a more detailed analysis of the customer’s inclination toward product aspects. This analysis helps in the transition from the traditional rating-based recommendation process to an improved aspect-based process. To automate ABSA, a labelled dataset is required to train a supervised machine learning model. As the availability of such dataset is limited due to the involvement of human efforts, an annotated dataset has been provided here for performing ABSA on customer reviews of mobile phones. The dataset comprising of product reviews of Apple-iPhone11 has been manually annotated with predefined aspect categories and aspect sentiments. The dataset’s accuracy has been validated using state-of-the-art machine learning techniques such as Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, K-Nearest Neighbor and Multi Layer Perceptron, a sequential model built with Keras API. The MLP model built through Keras Sequential API for classifying review text into aspect categories produced the most accurate result with 67.45 percent accuracy. K- nearest neighbor performed the worst with only 49.92 percent accuracy. The Support Vector Machine had the highest accuracy for classifying review text into aspect sentiments with an accuracy of 79.46 percent. The model built with Keras API had the lowest 76.30 percent accuracy. The contribution is beneficial as a benchmark dataset for ABSA of mobile phone reviews.

Download Full-text

Implementation of n-gram Methodology for Rotten Tomatoes Review Dataset Sentiment Analysis

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2017010103 ◽

2017 ◽

Vol 7 (1) ◽

pp. 30-41 ◽

Cited By ~ 12

Author(s):

Prayag Tiwari ◽

Brojo Kishore Mishra ◽

Sachin Kumar ◽

Vivek Kumar

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Maximum Entropy ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

N Gram ◽

F Measure ◽

Blog Posts

Sentiment Analysis intends to get the basic perspective of the content, which may be anything that holds a subjective supposition, for example, an online audit, Comments on Blog posts, film rating and so forth. These surveys and websites might be characterized into various extremity gatherings, for example, negative, positive, and unbiased keeping in mind the end goal to concentrate data from the info dataset. Supervised machine learning strategies group these reviews. In this paper, three distinctive machine learning calculations, for example, Support Vector Machine (SVM), Maximum Entropy (ME) and Naive Bayes (NB), have been considered for the arrangement of human conclusions. The exactness of various strategies is basically inspected keeping in mind the end goal to get to their execution on the premise of parameters, e.g. accuracy, review, f-measure, and precision.

Download Full-text

Fast-forward solver for inhomogeneous media using machine learning methods: artificial neural network, support vector machine and fuzzy logic

Neural Computing and Applications ◽

10.1007/s00521-016-2694-9 ◽

2016 ◽

Vol 29 (12) ◽

pp. 1583-1591 ◽

Cited By ~ 4

Author(s):

Mohammad Abdolrazzaghi ◽

Soheil Hashemy ◽

Ali Abdolali

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Support Vector Machine ◽

Fuzzy Logic ◽

Inhomogeneous Media ◽

Support Vector ◽

Learning Methods ◽

Network Support ◽

Machine Learning Methods

Download Full-text

Prediction of Collapsibility of Loess of Construction Sites in Xining Based on Machine Learning Methods

10.21203/rs.3.rs-307514/v1 ◽

2021 ◽

Author(s):

Qifei Zhao ◽

Xiaojun Li ◽

Yunning Cao ◽

Zhikun Li ◽

Jixin Fan

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Training Data ◽

Support Vector ◽

Engineering Practice ◽

Burial Depth ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

North East

Abstract Collapsibility of loess is a significant factor affecting engineering construction in loess area, and testing the collapsibility of loess is costly. In this study, A total of 4,256 loess samples are collected from the north, east, west and middle regions of Xining. 70% of the samples are used to generate training data set, and the rest are used to generate verification data set, so as to construct and validate the machine learning models. The most important six factors are selected from thirteen factors by using Grey Relational analysis and multicollinearity analysis: burial depth、water content、specific gravity of soil particles、void rate、geostatic stress and plasticity limit. In order to predict the collapsibility of loess, four machine learning methods: Support Vector Machine (SVM), Random Subspace Based Support Vector Machine (RSSVM), Random Forest (RF) and Naïve Bayes Tree (NBTree), are studied and compared. The receiver operating characteristic (ROC) curve indicators, standard error (SD) and 95% confidence interval (CI) are used to verify and compare the models in different research areas. The results show that: RF model is the most efficient in predicting the collapsibility of loess in Xining, and its AUC average is above 80%, which can be used in engineering practice.

Download Full-text

Aspect Category Classification dengan Pendekatan Machine Learning Menggunakan Dataset Bahasa Indonesia

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI) ◽

10.22146/jnteti.v10i3.1819 ◽

2021 ◽

Vol 10 (3) ◽

pp. 229-235

Author(s):

Syaifulloh Amien Pandega Perdana ◽

Teguh Bharata Aji ◽

Ridi Ferdiana

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Sentiment Analysis ◽

Support Vector ◽

Term Weighting ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Bahasa Indonesia

Ulasan pelanggan merupakan opini terhadap kualitas barang atau jasa yang dirasakan konsumen. Ulasan pelanggan mengandung informasi yang berguna bagi konsumen maupun penyedia barang atau jasa. Ketersediaan ulasan pelanggan dalam jumlah besar pada website membutuhkan suatu framework untuk mengekstraksi sentimen secara otomatis. Sebuah ulasan pelanggan sering kali mengandung banyak aspek sehingga Aspect Based Sentiment Analysis (ABSA) harus digunakan untuk mengetahui polaritas masing-masing aspek. Salah satu tugas penting dalam ABSA adalah Aspect Category Detection. Metode machine learning untuk Aspect Category Detection sudah banyak dilakukan pada domain berbahasa Inggris, tetapi pada domain bahasa Indonesia masih sedikit. Makalah ini membandingkan kinerja tiga algoritme machine learning, yaitu Naïve Bayes (NB), Support Vector Machine (SVM), dan Random Forest (RF) pada ulasan pelanggan berbahasa Indonesia menggunakan Term Frequency–Inverse Document Frequency (TF-IDF) sebagai term weighting. Hasil menunjukkan bahwa RF memiliki kinerja paling unggul dibandingkan NB dan SVM pada tiga domain yang berbeda, yaitu restoran, hotel, dan e-commerce, dengan nilai f1-score untuk masing-masing domain adalah 84.3%, 85.7%, dan 89,3%.

Download Full-text

Support Vector Machine Berbasis Feature Selection Untuk Sentiment Analysis Kepuasan Pelanggan Terhadap Pelayanan Warung dan Restoran Kuliner Kota Tegal

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.201855867 ◽

2018 ◽

Vol 5 (5) ◽

pp. 537 ◽

Cited By ~ 1

Author(s):

Oman Somantri ◽

Dyah Apriliani

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Sentiment Analysis ◽

Information Gain ◽

Support Vector ◽

Chi Square ◽

Proposed Model ◽

Chi Squared ◽

The Difference ◽

Increase In Accuracy

Abstrak Setiap pelanggan pasti menginginkan sebuah pendukung keputusan dalam menentukan pilihan ketika akan mengunjungi sebuah tempat makan atau kuliner yang sesuai dengan keinginan salah satu contohnya yaitu di Kota Tegal. Sentiment analysis digunakan untuk memberikan sebuah solusi terkait dengan permasalahan tersebut, dengan menereapkan model algoritma Support Vector Machine (SVM). Tujuan dari penelitian ini adalah mengoptimalisasi model yang dihasilkan dengan diterapkannya feature selection menggunakan algoritma Informatioan Gain (IG) dan Chi Square pada hasil model terbaik yang dihasilkan oleh SVM pada klasifikasi tingkat kepuasan pelanggan terhadap warung dan restoran kuliner di Kota Tegal sehingga terjadi peningkatan akurasi dari model yang dihasilkan. Hasil penelitian menunjukan bahwa tingkat akurasi terbaik dihasilkan oleh model SVM-IG dengan tingkat akurasi terbaik sebesar 72,45% mengalami peningkatan sekitar 3,08% yang awalnya 69.36%. Selisih rata-rata yang dihasilkan setelah dilakukannya optimasi SVM dengan feature selection adalah 2,51% kenaikan tingkat akurasinya. Berdasarkan hasil penelitian bahwa feature selection dengan menggunakan Information Gain (IG) (SVM-IG) memiliki tingkat akurasi lebih baik apabila dibandingkan SVM dan Chi Squared (SVM-CS) sehingga dengan demikian model yang diusulkan dapat meningkatkan tingkat akurasi yang dihasilkan oleh SVM menjadi lebih baik. Abstract The Customer needs to get a decision support in determining a choice when they’re visit a culinary restaurant accordance to their wishes especially at Tegal City. Sentiment analysis is used to provide a solution related to this problem by applying the Support Vector Machine (SVM) algorithm model. The purpose of this research is to optimize the generated model by applying feature selection using Informatioan Gain (IG) and Chi Square algorithm on the best model produced by SVM on the classification of customer satisfaction level based on culinary restaurants at Tegal City so that there is an increasing accuracy from the model. The results showed that the best accuracy level produced by the SVM-IG model with the best accuracy of 72.45% experienced an increase of about 3.08% which was initially 69.36%. The difference average produced after SVM optimization with feature selection is 2.51% increase in accuracy. Based on the results of the research, the feature selection using Information Gain (SVM-IG) has a better accuracy rate than SVM and Chi Squared (SVM-CS) so that the proposed model can improve the accuracy of SVM better.

Download Full-text

A New SVM Method for Recognizing Polarity of Sentiments in Twitter

Handbook of Research on Soft Computing and Nature-Inspired Algorithms - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-2128-0.ch009 ◽

2017 ◽

pp. 281-291 ◽

Cited By ~ 1

Author(s):

Sanjiban Sekhar Roy ◽

Marenglen Biba ◽

Rohan Kumar ◽

Rahul Kumar ◽

Pijush Samui

Keyword(s):

Social Networks ◽

Support Vector Machine ◽

Public Relations ◽

Social Networking ◽

Sentiment Analysis ◽

High Accuracy ◽

Support Vector ◽

Online Social Networking ◽

Web Page ◽

Twitter Data

Online social networking platforms, such as Weblogs, micro blogs, and social networks are intensively being utilized daily to express individual's thinking. This permits scientists to collect huge amounts of data and extract significant knowledge regarding the sentiments of a large number of people at a scale that was essentially impractical a couple of years back. Therefore, these days, sentiment analysis has the potential to learn sentiments towards persons, object and occasions. Twitter has increasingly become a significant social networking platform where people post messages of up to 140 characters known as ‘Tweets'. Tweets have become the preferred medium for the marketing sector as users can instantly indicate customer success or indicate public relations disaster far more quickly than a web page or traditional media does. In this paper, we have analyzed twitter data and have predicted positive and negative tweets with high accuracy rate using support vector machine (SVM).

Download Full-text

Sentiment Analysis of Product Reviews using Support Vector Machine Learning Algorithm

Indian Journal of Science and Technology ◽

10.17485/ijst/2017/v10i35/118965 ◽

2017 ◽

Vol 10 (35) ◽

pp. 1-9 ◽

Cited By ~ 6

Author(s):

Esha Tyagi ◽

Arvind Kumar Sharma ◽

◽

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Learning Algorithm ◽

Support Vector ◽

Machine Learning Algorithm ◽

Product Reviews

Download Full-text

Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, non-linear regression and a radiative transfer-based look-up table

Atmospheric Chemistry and Physics ◽

10.5194/acp-16-8181-2016 ◽

2016 ◽

Vol 16 (13) ◽

pp. 8181-8191 ◽

Cited By ~ 10

Author(s):

Jani Huttunen ◽

Harri Kokkola ◽

Tero Mielonen ◽

Mika Esa Juhani Mononen ◽

Antti Lipponen ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Linear Regression ◽

Support Vector ◽

Learning Methods ◽

Surface Solar Radiation ◽

Machine Learning Methods ◽

Look Up Table ◽

Non Linear

Abstract. In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during the observation period.

Download Full-text