Sentiment Analysis for Customer Review: Case Study of GO-JEK Expansion

Author(s):  
Alifia Revan Prananda ◽  
Irfandy Thalib

Background: Market prediction requires careful analysis, and business intelligence has become an important procedure for analyzing market demand and customer satisfaction. Because business intelligence depends on in-depth analysis, sentiment analysis is a powerful technique for analyzing customer reviews in support of it. Objective: In this study, we perform sentiment analysis to support business intelligence analysis of GO-JEK. Methods: We use 3111 Twitter posts collected with the Twint library. Since the dataset did not provide a ground truth, we use Microsoft Text Analytics to label each tweet as positive, neutral, or negative. Before applying Microsoft Text Analytics, we conduct a pre-processing step to remove unwanted data such as duplicate tweets, images, and website addresses. Results: According to Microsoft Text Analytics, there are 666 positive, 2055 neutral, and 127 negative tweets. Conclusion: According to these results, we conclude that most GO-JEK customers are satisfied with the GO-JEK services. In this research, we also develop a classification model to predict the sentiment of new data. We use several classifier algorithms such as Decision Tree, Naïve Bayes, Support Vector Machine, and Neural Network. The results show that the decision tree provides the best performance.
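The pre-processing step described in the abstract above (dropping duplicate tweets, image links, and website addresses) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the regular expression, the function names `clean_tweet` and `preprocess`, and the sample tweets are all assumptions made for the example.

```python
import re

# Assumed link patterns: http(s) URLs, bare www addresses, and
# pic.twitter.com image links. The paper does not list its exact rules.
URL_RE = re.compile(r"(https?://\S+|www\.\S+|pic\.twitter\.com/\S+)")


def clean_tweet(text: str) -> str:
    """Strip links and collapse runs of whitespace."""
    without_links = URL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", without_links).strip()


def preprocess(tweets):
    """Clean each tweet, then drop empty results and case-insensitive
    duplicates while keeping first-seen order."""
    seen = set()
    cleaned = []
    for raw in tweets:
        text = clean_tweet(raw)
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


# Made-up sample tweets, for illustration only.
sample = [
    "Driver was friendly https://t.co/abc123",
    "driver was friendly",
    "Late again... pic.twitter.com/xyz",
    "www.example.com",
]
print(preprocess(sample))
```

Cleaning before de-duplicating means two tweets that differ only in a trailing link are treated as the same text, which is one plausible reading of "duplicate tweets".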

Author(s):  
Noviah Dwi Putranti ◽  
Edi Winarko

Sentiment analysis in this research is the process of classifying textual documents into two classes, positive and negative sentiment. Opinion data were obtained from the social network Twitter using queries in Indonesian. This study aims to determine public sentiment toward a particular object as expressed on Twitter in Indonesian, thereby helping businesses conduct market research on public opinion. The collected data were preprocessed and POS tagged to produce a classification model through a training process. Sentiment-bearing words were collected with a dictionary-based approach, which yielded 18,069 words in this study. The Maximum Entropy algorithm is used for the POS tagger, and the Support Vector Machine algorithm is used to build the classification model from the training data. The features are unigrams with TF-IDF weighting. The classification implementation achieved an accuracy of 86.81% in 7-fold cross-validation with the sigmoid kernel. Manual class labeling with the POS tagger yielded an accuracy of 81.67%. Keywords: sentiment analysis, classification, maximum entropy POS tagger, support vector machine, Twitter.
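To make the "unigram features with TF-IDF weighting" mentioned above concrete, here is a small standard-library sketch. The abstract does not say which TF-IDF variant was used, so the scheme below (term frequency normalized by document length, idf = ln(N / df)) is an assumption, as are the function name `tfidf` and the toy documents.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using unigram tokens.

    Assumed scheme: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, for illustration only.
docs = ["service is good", "service is bad", "driver is good"]
for doc, vec in zip(docs, tfidf(docs)):
    print(doc, "->", {t: round(w, 3) for t, w in vec.items()})
```

A term that occurs in every document ("is" in the toy corpus) gets weight zero, while a term unique to one document ("bad") gets the largest weight. These per-document vectors are what a classifier such as an SVM would take as input.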


2021 ◽  
Author(s):  
Mostafa Sa'eed Yakoot ◽  
Adel Mohamed Salem Ragab ◽  
Omar Mahmoud

Well integrity has become a crucial field, receiving increased focus and intensive coverage in industry research. It is important to maintain the integrity of each individual well to ensure that wells operate as expected for their designated life (or longer) with all risks kept as low as reasonably practicable, or as specified. Machine learning (ML) and artificial intelligence (AI) models are now used intensively in the oil and gas industry. ML relies on powerful algorithms and a robust database. Developing an efficient classification model for well integrity (WI) anomalies is now feasible because the database holds an enormous number of well failures, well barrier integrity tests, and analyses. Circa 9000 data points were collected from WI tests performed on 800 wells in the Gulf of Suez, Egypt, over almost 10 years. Moreover, those data have been quality-controlled and quality-assured by experienced engineers. The data contain different forms of WI failures. The contributing parameter set includes a total of 23 barrier elements. Data were structured and fed into 11 different ML algorithms to build an automated, systematic tool for calculating the imposed risk category of any well. A comparison of the deployed models was performed to identify the most reliable predictive model. The 11 models include both supervised and ensemble learning algorithms such as random forest, support vector machine (SVM), decision tree, and scalable boosting techniques. Out of the 11 models, the results showed that extreme gradient boosting (XGB), categorical boosting (CatBoost), and decision tree are the most reliable algorithms. Moreover, novel evaluation metrics for the confusion matrix of each model have been introduced to overcome the problem that existing metrics do not consider domain knowledge during model evaluation. The resulting model will help to utilize company resources efficiently and dedicate personnel efforts to high-risk wells. As a result, progressive improvements are expected in business, safety, environment, and performance. This paper would be a milestone in the design and creation of the Well Integrity Database Management Program through the combination of integrity and ML.


2020 ◽  
Vol 24 (5) ◽  
pp. 1141-1160
Author(s):  
Tomás Alegre Sepúlveda ◽  
Brian Keith Norambuena

In this paper, we apply sentiment analysis methods in the context of the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate in order to contrast this with the results from classical methods (e.g., polls and surveys). The data are collected from Twitter, because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG) and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it to the traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. The classification model had an accuracy of 76.45% using support vector machines, which yielded the best model for our case. Finally, we use a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared to a range of 34–44% according to traditional polls and 36% in the actual elections. For AG we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% for traditional polls and 20.27% in the elections. For BS we obtained an estimate of 27.77%, compared with the range of 8.50% to 11.00% given by traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained due to the fact that some candidates have been omitted, even though they held a significant number of votes.


2021 ◽  
Vol 11 (1) ◽  
pp. 15-24
Author(s):  
Dequan Guo ◽  
Gexiang Zhang ◽  
Hui Peng ◽  
Jianying Yuan ◽  
Prithwineel Paul ◽  
...  

In recent years, cardiovascular and cerebrovascular diseases have attracted much attention because they are among the main causes of death in humans. To reduce mortality, many efforts focus on early diagnosis and prevention. The endovascular membrane of the carotid artery, as seen in medical ultrasound images, is an important reference index for cardiovascular diseases. The paper proposes a method that finds the region of interest (ROI) with a convolutional neural network, then segments and measures the intima-media membrane mainly using a support vector machine (SVM). Essentially, detecting the membrane is a target detection problem. This paper adopts You Only Look Once (YOLO), a new detection algorithm based on a convolutional neural network with end-to-end training. First, sufficient samples are extracted according to certain characteristics in the special region and used to train the SVM classification model. Second, the ROI is processed and all its pixels are classified into boundary points and non-boundary points by the classification model. Third, the boundary points are selected to obtain the accurate boundary and calculate the intima-media thickness (IMT). In experiments, two hundred ultrasound images are tested, and the results verify that our algorithm is consistent with the ground truth (GT). The algorithm runs in real time and generalizes well. It computes the intima-media thickness in ultrasound images accurately and quickly, with 95% consistency with the ground truth.


TEM Journal ◽  
2020 ◽  
pp. 1663-1668
Author(s):  
Shorouq Fathi Eletter

The exponential growth of unstructured data and the ability of businesses to utilize such data in decision-making have led to competitive advantages. The knowledge provided by analyzing unstructured data is crucial for product developers or service providers because it might affect the sustainability of the business. Sentiment analysis is used to gain an understanding of the attitudes, opinions, and emotions expressed within an online review. Naïve Bayes (NB), logistic regression (LR), decision trees (DT), deep learning (DL), and support vector machines (SVM) were used to build a classification model. In the data mining settings, the classification accuracy is the best metric to highlight the best classifier. The DL classifier outperformed other models in terms of accuracy rate. Classifying customers' feelings toward a product or service is critical for providing actionable insights. Utilizing such models will help to analyze huge volumes of reviews, saving both time and costs.


2020 ◽  
Vol 9 (3) ◽  
pp. 376-390
Author(s):  
Nur Fitriyah ◽  
Budi Warsito ◽  
Di Asih I Maruddani

PT Aplikasi Karya Anak Bangsa, better known as Gojek, has since 2015 provided people in Indonesia with a convenient facility, especially for daily activities. Sentiment analysis of Twitter posts is one way to see how Gojek users respond to the services provided. Responses were classified into positive and negative sentiment using the Support Vector Machine method, with the model evaluated by 10-fold cross-validation. The kernels used are the linear kernel and the RBF kernel. Data were labeled both manually and by sentiment scoring. The test results showed that the RBF kernel achieved the highest overall accuracy and kappa accuracy under both manual labeling and sentiment scoring. With manual labeling, the overall accuracy is 79.19% and the kappa accuracy is 16.52%, while labeling by sentiment scoring gave an overall accuracy of 79.19% and a kappa accuracy of 21%. The greater the overall accuracy and kappa accuracy, the better the performance of the classification model. Keywords: Gojek, Twitter, Support Vector Machine, overall accuracy, kappa accuracy
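The gap between an overall accuracy near 79% and a kappa value near 20% is typical of imbalanced classes, and a small calculation shows why. The sketch below computes overall accuracy and Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), from a confusion matrix. The matrix is made up for illustration and is not the paper's data; the function names are likewise assumptions.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement corrected for chance agreement.

    p_o is the observed agreement (overall accuracy); p_e is the agreement
    expected by chance from the row and column totals. Undefined (division
    by zero) when p_e equals 1.
    """
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Made-up, imbalanced two-class matrix (rows = actual, columns = predicted).
toy = [[75, 5],
       [15, 5]]
print(f"overall accuracy = {overall_accuracy(toy):.2%}")
print(f"kappa            = {cohen_kappa(toy):.2%}")
```

In the toy matrix, a classifier that mostly predicts the majority class reaches 80% accuracy, yet chance agreement alone would already give 74%, so kappa is only about 23%.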


2021 ◽  
pp. 1063293X2199180
Author(s):  
Babymol Kurian ◽  
VL Jyothi

Cancer prediction and detection using Next Generation Sequencing (NGS) with the application of artificial intelligence is highly valued in the current medical field. Next generation sequences were extracted from the NCBI (National Center for Biotechnology Information) gene repository. Sequences of normal Homo sapiens (Class 1), BRCA1 (Class 2) and BRCA2 (Class 3) were extracted for Machine Learning (ML) purposes. A total of 1580 datasets were extracted for the process, under four categories of 50, 100, 150 and 200 sequences. The breast cancer prediction process was carried out in three major steps: feature extraction, machine learning classification, and performance evaluation. The features were extracted with sequences as input. Ten features of DNA sequences, such as ORF (Open Reading Frame) count, individual nucleobase average count of A, T, C, G, AT and GC-content, AT/GC composition, G-quadruplex occurrence, and MR (Mutation Rate), were extracted from the three types of sequences for the classification process. The sequence type was also included in the feature set as a target variable, with values 0, 1 and 2 for classes 1, 2 and 3 respectively. Nine supervised machine learning techniques, namely LR (Logistic Regression statistical model), LDA (Linear Discriminant Analysis model), k-NN (k-nearest neighbours algorithm), DT (Decision Tree technique), NB (Naive Bayes classifier), SVM (Support-Vector Machine algorithm), RF (Random Forest learning algorithm), AdaBoost (AB) and Gradient Boosting (GB), were employed on the four categories of datasets. Of all the supervised models, the decision tree performed best, with a maximum classification accuracy of 94.03%. Classification model performance was evaluated using precision, recall, F1-score and support values, wherein the F1-score was most similar to the classification accuracy.
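A few of the sequence features listed above (per-base frequencies, AT and GC content, and the AT/GC ratio) are simple to compute. The sketch below is only an illustration under assumed definitions, with each value taken as a fraction of sequence length. The abstract does not give the exact formulas, and the ORF, G-quadruplex, and mutation-rate features are not attempted here. The function name and the example sequence are hypothetical.

```python
from collections import Counter


def base_composition_features(sequence):
    """Compute simple composition features for a DNA sequence.

    Assumed definitions: each base frequency is count / length,
    AT content = (A + T) / length, GC content = (G + C) / length.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    length = len(seq)
    a, t, c, g = (counts[base] for base in "ATCG")
    at, gc = a + t, g + c
    return {
        "A": a / length,
        "T": t / length,
        "C": c / length,
        "G": g / length,
        "AT_content": at / length,
        "GC_content": gc / length,
        # Ratio is infinite when the sequence has no G or C at all.
        "AT_GC_ratio": at / gc if gc else float("inf"),
    }


# Short made-up sequence, for illustration only.
print(base_composition_features("ATGCGC"))
```

Each sequence would yield one such feature row, with the class label (0, 1, or 2) appended as the target variable before training.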


Author(s):  
Ganesh K. Shinde

Sentiment analysis has been applied in online shopping platforms, scientific surveys, political polls, business intelligence, and other areas. In this work, we analyse Twitter posts about hashtags such as #MakeinIndia using a machine learning approach. By doing opinion mining in a specific area, it is possible to identify the effect of area information in sentiment analysis. We put forth a feature vector for classifying tweets as positive, negative, or neutral. We then apply two machine learning algorithms, MaxEnt and SVM. We use unigram, bigram, and trigram features to generate the feature set used to train linear MaxEnt and SVM classifiers. Finally, we measure classifier performance in terms of overall accuracy. Keywords: Sentiment analysis, support vector machine, maximum entropy, N-gram, Machine Learning
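The unigram, bigram, and trigram features mentioned above can be generated with a few lines of standard-library Python. This is a minimal sketch with whitespace tokenization; the tokenizer, the function names, and the sample text are assumptions, since the abstract does not describe the actual feature pipeline.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=3):
    """Count every unigram, bigram, ... up to max_n-gram in the text."""
    tokens = text.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


# Made-up example text, for illustration only.
for feature, count in ngram_features("Make in India works").items():
    print(f"{feature!r}: {count}")
```

A four-token text yields four unigrams, three bigrams, and two trigrams. The resulting counts form the sparse feature vector that a MaxEnt or SVM classifier is trained on.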


2020 ◽  
Vol 3 (1) ◽  
pp. 64-74
Author(s):  
Ahmed A. Elsherif ◽  
Arwa A. Aldaej

One of the major challenges facing the acceptance and growth of business and governmental sites is the botnet-based DDoS attack. A flooding DDoS attack strikes a victim machine by sending a vast amount of malicious traffic, causing a significant drop in quality of service (QoS) in IoT devices. Flooding DDoS attacks are not easy to detect and tackle, owing to the large number of attacking machines, the use of source-address spoofing, and the similarities between legitimate and malicious traffic. New kinds of attacks are identified daily, and some remain undiscovered. Accordingly, this paper aims to improve the classification of network traffic that attackers try to make ambiguous or misleading. Recorded simulated traffic was used for both samples, normal traffic and DDoS attack traffic, with approximately 104,000 cases of each. Both datasets, which were created for this study, serve as the input data for building a classification model to be used as a tool to mitigate the risk of being attacked. The next step is putting the datasets in a format suitable for classification. This is done through preprocessing techniques that convert categorical data into numerical data. A classification process is then applied to the captured datasets to create a classification model using five classification algorithms: Decision Tree, Support Vector Machine, Naive Bayes, K-Neighbours, and Random Forest. The core classification code is written in Python and controlled through a user interface. The highest prediction, precision, and accuracy are obtained using the Decision Tree and Random Forest classification algorithms, which also have the lowest processing time.
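The preprocessing step that converts categorical data into numerical data can be illustrated with one-hot encoding. The abstract does not say which encoding the authors used, so this is one common option shown as an assumption. The field names (`protocol`, `packets`) and the sample records are hypothetical.

```python
def one_hot_encode(records, column):
    """Replace a categorical column with one 0/1 column per category.

    Categories are taken from the values observed in `records` and
    sorted so the output column order is deterministic.
    """
    categories = sorted({rec[column] for rec in records})
    encoded = []
    for rec in records:
        # Copy every field except the categorical one being encoded.
        row = {key: value for key, value in rec.items() if key != column}
        for category in categories:
            row[f"{column}={category}"] = 1 if rec[column] == category else 0
        encoded.append(row)
    return encoded


# Hypothetical traffic records, for illustration only.
flows = [
    {"protocol": "tcp", "packets": 12},
    {"protocol": "udp", "packets": 900},
    {"protocol": "tcp", "packets": 7},
]
for row in one_hot_encode(flows, "protocol"):
    print(row)
```

One-hot encoding avoids implying an order between categories, which matters for distance-based models such as SVM and K-Neighbours. Tree-based models can often work with simple integer codes instead.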


2020 ◽  
Vol 4 (3) ◽  
pp. 650
Author(s):  
Rian Tineges ◽  
Agung Triayudi ◽  
Ira Diana Sholihati

In 2018, 18.9% of the population in Indonesia said that their main reason for using the Internet was social media. Twitter is one such platform, with 6.43 million active users. Given the surge of information published via Twitter, that information may contain users' opinions on an object, such as an event in the community or a product or service. This leads companies to use Twitter as a medium for disseminating information. One example is an Internet Service Provider (ISP) such as Indihome. Through Twitter, users can discuss their complaints about or satisfaction with Indihome's services. A sentiment analysis method is needed to determine whether textual data express negative or positive opinions. Thus, the authors use the Support Vector Machine (SVM) method for sentiment analysis of Indihome service users' opinions on Twitter, with the aims of obtaining an SVM-based sentiment classification model, measuring the accuracy that SVM achieves when applied to sentiment analysis, and seeing how satisfied Indihome service users are based on Twitter. Testing with the SVM method gave an accuracy of 87%, precision of 86%, recall of 95%, error rate of 13%, and F1-score of 90%.
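The reported metrics are related by standard formulas, sketched below for a binary confusion matrix. The counts passed to `classification_metrics` are made up for illustration and are not the paper's data; only the last line uses figures from the abstract, to check that the reported F1-score matches the harmonic mean of the reported precision and recall.

```python
def f1_from(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Made-up counts, for illustration only.
for name, value in classification_metrics(tp=95, fp=15, fn=5, tn=35).items():
    print(f"{name}: {value:.3f}")

# Consistency check on the figures reported in the abstract.
print(f"F1 from precision 0.86 and recall 0.95: {f1_from(0.86, 0.95):.3f}")
```

With precision 0.86 and recall 0.95, the harmonic mean is about 0.903, which agrees with the reported F1-score of 90%. The error rate of 13% is likewise just one minus the 87% accuracy.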

