An automated learning model for sentiment analysis and data classification of Twitter data using balanced CA-SVM

Modern society spends much of each day on social media. Web users share many details with their friends there, and information obtained from these conversations has been used in several applications. Sentiment analysis has been applied to Twitter data sets to identify a user's emotion, on the basis of which different problems can be solved. First, the data from the Twitter database is preprocessed: tokenization, stemming, stop-word removal, and number removal are performed. The proposed automated learning model with CA-SVM-based sentiment analysis reads the Twitter data set, which is then processed to extract features, yielding a set of terms. Using these terms, the tweets are clustered with TGS-K means clustering, which measures Euclidean distance over features such as semantic sentiment score (SSS), gazetteer and symbolic sentiment support (GSSS), and topical sentiment score (TSS). The method then classifies the tweets with a support vector machine (CA-SVM) according to a support value computed from the above measures. The results are validated using k-fold cross-validation. Classification is then performed using the Balanced CA-SVM (Deep Learning Modified Neural Network). The results are evaluated and compared with existing works. The proposed model achieved 92.48% accuracy and a 92.05% sentiment score, compared with the existing works.
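The four preprocessing steps named in the abstract (tokenization, stemming, stop-word removal, number removal) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the suffix list, and the function names `crude_stem` and `preprocess` are assumptions made for illustration, and the stemmer is a crude suffix stripper rather than a standard algorithm such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "on", "it", "this", "i"}

# Hypothetical suffix list for a crude stemmer (not the Porter algorithm).
SUFFIXES = ("ing", "ed", "ly", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet):
    """Lowercase, remove numbers, tokenize, drop stop words, then stem."""
    text = tweet.lower()
    text = re.sub(r"\d+", " ", text)       # number removal
    tokens = re.findall(r"[a-z]+", text)   # tokenization on alphabetic runs
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [crude_stem(t) for t in tokens]  # stemming


print(preprocess("I loved the 2 new phones"))  # ['lov', 'new', 'phone']
```

The resulting term lists are the kind of input from which features could be extracted before clustering and classification; those later stages are not sketched here because the abstract does not specify them in enough detail.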

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations in the text when performing sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi into three classes: positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert the string data into numerical matrices and applies three machine learning techniques: Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets; SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results are very promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system can be used for other cultural and social benefits, such as predicting and countering riots.

PUBLIC’S SENTIMENT ANALYSIS ON SHOPEE-FOOD SERVICE USING LEXICON-BASED AND SUPPORT VECTOR MACHINE

Jurnal Riset Informatika ◽

10.34288/jri.v4i1.287 ◽

2021 ◽

Vol 4 (1) ◽

pp. 1-8

Author(s):

Shafira Shalehanny ◽

Agung Triayudi ◽

Endah Tri Esti Handayani

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Food Service ◽

Support Vector ◽

Accuracy Score ◽

Online Business ◽

Data Source ◽

Sentiment Score ◽

Processing Language

The technology field keeps evolving with the era. Social media is already part of everyone's daily life and is a place where people write their opinions, whether reviews of or responses to products and services they have used. Twitter is one of the popular social media platforms in Indonesia; according to Statista data, it has reached 17.55 million users. For the online business sector, knowing the sentiment score is very important for improving the business. The use of machine learning, NLP (Natural Language Processing), and text mining to find the real meaning of the opinion words given by customers is called sentiment analysis. Two methods are used for data testing: the first is lexicon-based and the second is Support Vector Machine (SVM). The data for sentiment analysis were collected using the keywords 'ShopeeFood' and 'syopifud'. The analysis yields an accuracy of 87%, a precision of 81%, a recall of 75%, and an F1-score of 78%.
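The reported F1-score can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The short Python sketch below does only that arithmetic; the function name `f1_score` is an illustrative choice, and the inputs are the rounded percentages quoted in the abstract, not the authors' raw counts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 81% and recall 75%, as quoted in the abstract.
print(round(f1_score(0.81, 0.75), 2))  # 0.78
```

The result agrees with the 78% F1-score stated in the abstract.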

Sentiment on Twitter Data Set using Recurrent Neural Network - Long Short Term Memory

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1244.09811s19 ◽

2019 ◽

Vol 8 (11S) ◽

pp. 1206-1211

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Language Processing ◽

Short Term Memory ◽

Short Term ◽

Data Set ◽

Term Memory ◽

Twitter Data ◽

The People ◽

Long Short Term Memory

Social media is a combination of different platforms where a huge amount of user-generated data is collected. People from various parts of the country express their opinions, reviews, feedback and marketing strategies through social media such as Twitter, Facebook, Instagram, and YouTube. It is vital to explore and gather this data, analyze it, and consolidate people's views for better decision making. Sentiment analysis is a natural language processing technique for information extraction that identifies users' views. It is used to extract reviews and opinions about satisfaction with products, events, and people in order to understand current trends in products or user behavior. The paper reviews and analyzes the existing general approaches and algorithms for sentiment analysis. The system proposed for sentiment analysis on a Twitter data set uses Long Short Term Memory (LSTM) and is evaluated with the Naive Bayes approach.

Using Twitter data to predict the performance of Bollywood movies

Industrial Management & Data Systems ◽

10.1108/imds-04-2015-0145 ◽

2015 ◽

Vol 115 (9) ◽

pp. 1604-1621 ◽

Cited By ~ 26

Author(s):

Dipak Damodar Gaikar ◽

Bijith Marakarkandy ◽

Chandan Dasgupta

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Classification Model ◽

Data Mining Algorithm ◽

Content Type ◽

Box Office ◽

Twitter Data ◽

Interface System ◽

Sentiment Score ◽

Movie Performance

Purpose – The purpose of this paper is to address the shortcomings of limited research in forecasting the power of social media in India. Design/methodology/approach – This paper uses sentiment analysis and prediction algorithms to analyze the performance of Indian movies based on data obtained from social media sites. The authors used the Twitter4j Java API to extract tweets through an authenticated connection with Twitter web sites, stored the extracted data in a MySQL database, and used the data for sentiment analysis. To perform sentiment analysis of Twitter data, the Probabilistic Latent Semantic Analysis classification model is used to find the sentiment score in the form of positive, negative and neutral. The data mining algorithm Fuzzy Inference System is used to implement sentiment analysis and predict movie performance, which is classified into three categories: hit, flop and average. Findings – In this study the authors found results of movie performance at the box office, based on the fuzzy inference system algorithm for prediction. The fuzzy inference system takes two factors, namely sentiment score and actor rating, to get an accurate result. By calculating the opening weekend collection, the authors found that the predicted values were approximately the same as the actual values. For the movie Singham Returns, the prediction method gave a box office collection of 84 crores, and the actual collection turned out to be 88 crores. Research limitations/implications – The current study suffers from the limitation of not having enough computing resources to crawl the data. For predicting box office collection, accurate information on ticket prices, the total number of seats per screen and the total number of shows per day on all screens is not available. In future work the authors can add several other inputs, such as the budget of the movie, Central Board of Film Certification rating, movie genre and target audience, that will improve the accuracy and quality of the prediction. Originality/value – The authors used different factors for predicting box office movie performance which had not been used in previous literature. This work is valuable for promoting the products and services of firms.
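To put the Singham Returns figures in the abstract above into perspective, the gap between the predicted and actual collection can be expressed as a relative error. The Python sketch below is only that arithmetic; the function name `relative_error` is an illustrative choice, and the abstract itself does not report this measure.

```python
def relative_error(predicted, actual):
    """Absolute difference between prediction and actual value, as a fraction of the actual value."""
    return abs(predicted - actual) / abs(actual)


# Predicted 84 crores versus an actual collection of 88 crores, as quoted in the abstract.
print(f"{relative_error(84, 88):.1%}")  # 4.5%
```

A gap of about 4.5% is consistent with the abstract's statement that predicted values were approximately the same as actual values for this movie.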

Automatic Sentiment Analysis Model Creation using Multi Kernel Improved Extreme Learning Machine

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6842.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 2727-2735

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Extreme Learning Machine ◽

Opinion Mining ◽

Support Vector ◽

Analysis Model ◽

Feature Sets ◽

Research Activities ◽

Kernel Extreme Learning Machine ◽

Learning Machine

Recent research activities related to opinion mining, sentiment analysis and emotion detection from natural language texts all fall under the umbrella of affective computing. There is now a huge amount of textual information on social media (for example, forums and blogs) about consumers' ideas about buying products and their service experiences. Sentiment analysis, or opinion mining, is a field of investigation that analyzes people's thoughts and feelings from written text available online. In this paper, we present a comprehensive experiment to evaluate the effectiveness of psychological and linguistic features in emotion classification. In this scheme, we used five broad categories of LIWC (namely, psychological processes, linguistic processes, punctuation, spoken categories and personal concerns) as feature sets. The five LIWC categories and their group combinations were considered in the experimental analysis. To understand the predictive performance of various aspects of the engineering scheme, five supervised learning algorithms (namely, Naïve Bayes, support vector machines, Extreme Learning Machine, Kernel Extreme Learning Machine, Multi Kernel Extreme Learning Machine) and the proposed Multi Kernel Improved Extreme Learning Machine (MKIELM) are used. Experimental results show that the ensemble feature sets provide higher predictive performance than the individual sets.

Sentiment analysis on newspaper article reviews: contribution towards improved rider optimization-based hybrid classifier

Kybernetes ◽

10.1108/k-08-2020-0512 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Pandiaraj A. ◽

Sundar C. ◽

Pavalarajan S.

Keyword(s):

Feature Extraction ◽

Sentiment Analysis ◽

Optimization Algorithm ◽

Support Vector ◽

Analysis Model ◽

Data Set ◽

Hybrid Classifier ◽

Content Type ◽

Whale Optimization ◽

Newspaper Articles

Purpose – Recent development in sentiment analysis has resulted in a symbolic growth in the volume of study, especially on more subjective text types, namely, product or movie reviews. The key difference between these texts and news articles is that their target is defined and unique across the text. Hence, the reviews on newspaper articles can deal with three subtasks: correctly spotting the target, splitting the good and bad content from the reviews on the concerned target and evaluating different opinions provided in a detailed manner. On defining these tasks, this paper aims to implement a new sentiment analysis model for article reviews from the newspaper. Design/methodology/approach – Here, tweets from various newspaper articles are taken and the sentiment analysis process is done with pre-processing, semantic word extraction, feature extraction and classification. Initially, the pre-processing phase is performed, in which different steps such as stop word removal, stemming and blank space removal are carried out, producing the keywords that express positive, negative or neutral sentiment. Further, semantic (similar) words are extracted from the available dictionary by matching the keywords. Next, feature extraction is done for the extracted keywords and semantic words using holoentropy to attain information statistics, which results in the attainment of maximum related information. Here, two categories of holoentropy features are extracted: joint holoentropy and cross holoentropy. These extracted features of all keywords are finally subjected to a hybrid classifier, which merges the beneficial concepts of neural network (NN) and deep belief network (DBN). To improve the performance of sentiment classification, a modified rider optimization algorithm (ROA), the so-called new steering updated ROA (NSU-ROA), is introduced into NN and DBN for weight update. Hence, the average of both improved classifiers will provide the classified sentiment as positive, negative or neutral from the reviews of newspaper articles effectively. Findings – Three data sets were considered for experimentation. The results have shown that the developed NSU-ROA + DBN + NN attained high accuracy, which was 2.6% superior to particle swarm optimization, 3% superior to FireFly, 3.8% superior to grey wolf optimization, 5.5% superior to whale optimization algorithm and 3.2% superior to ROA-based DBN + NN on data set 1. The classification analysis has shown that the accuracy of the proposed NSU-DBN + NN was 3.4% higher than DBN + NN, 25% higher than DBN, 28.5% higher than NN and 32.3% higher than support vector machine on data set 2. Thus, the effective performance of the proposed NSU-ROA + DBN + NN on sentiment analysis of newspaper articles has been demonstrated. Originality/value – This paper adopts the latest optimization algorithm, called the NSU-ROA, to effectively recognize the sentiments of the newspapers with NN and DBN. This is the first work that uses NSU-ROA-based optimization for accurate identification of sentiments from newspaper articles.

COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis

IEEE Transactions on Computational Social Systems ◽

10.1109/tcss.2021.3051189 ◽

2021 ◽

pp. 1-13

Author(s):

Usman Naseem ◽

Imran Razzak ◽

Matloob Khushi ◽

Peter W. Eklund ◽

Jinman Kim

Keyword(s):

Sentiment Analysis ◽

Large Scale ◽

Data Set ◽

Twitter Data

Sentiment analysis on Twitter Data-set using Naive Bayes algorithm

2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) ◽

10.1109/icatcct.2016.7912034 ◽

2016 ◽

Cited By ~ 20

Author(s):

Huma Parveen ◽

Shikha Pandey

Keyword(s):

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Set ◽

Twitter Data ◽

Bayes Algorithm

Crowdsourcing Incident Information for Emergency Response using Open Data Sources in Smart Cities

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198118798736 ◽

2018 ◽

Vol 2672 (1) ◽

pp. 198-208 ◽

Cited By ~ 3

Author(s):

Fan Zuo ◽

Abdullah Kurkcu ◽

Kaan Ozbay ◽

Jingqin Gao

Keyword(s):

Social Media ◽

Prior Knowledge ◽

Emergency Response ◽

Large Scale ◽

Latent Dirichlet Allocation ◽

Smart Cities ◽

Hurricane Sandy ◽

Support Vector ◽

Twitter Data ◽

Emergency Events

Emergency events affect human security and safety as well as the integrity of the local infrastructure. Emergency response officials are required to make decisions using limited information and time. During emergency events, people post updates to social media networks, such as tweets, containing information about their status, help requests, incident reports, and other useful information. In this research project, the Latent Dirichlet Allocation (LDA) model is used to automatically classify incident-related tweets and incident types using Twitter data. Unlike the previous social media information models proposed in the related literature, the LDA is an unsupervised learning model which can be utilized directly without prior knowledge and preparation for data in order to save time during emergencies. Twitter data including messages and geolocation information during two recent events in New York City, the Chelsea explosion and Hurricane Sandy, are used as two case studies to test the accuracy of the LDA model for extracting incident-related tweets and labeling them by incident type. Results showed that the model could extract emergency events and classify them for both small and large-scale events, and the model’s hyper-parameters can be shared in a similar language environment to save model training time. Furthermore, the list of keywords generated by the model can be used as prior knowledge for emergency event classification and training of supervised classification models such as support vector machine and recurrent neural network.

Study of Automotive Brands Popularity in Indonesia Using Twitter Data

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v3i1.91 ◽

2016 ◽

Vol 3 (1) ◽

pp. 23-33

Author(s):

Stevent Efendi ◽

Alva Erwin ◽

Kho I Eng

Keyword(s):

Social Media ◽

Social Network ◽

Sentiment Analysis ◽

Automotive Industry ◽

Real World ◽

The Internet ◽

Brand Preference ◽

Twitter Data ◽

Wide Range ◽

Widespread Phenomenon

Social media has been a widespread phenomenon in recent years. People share many thoughts on social media, and the data posted on the internet can be used for study and research. As one of the fastest-growing social networks, Twitter is a particularly popular social medium to study because it allows researchers to access its data. This research looks at the correlation between Twitter chatter about a brand and the sales of brands in Indonesia. Factors such as sentiment and tweet rate are expected to be able to predict the popularity of a brand. The automotive industry, being one of the biggest industries in Indonesia, is an interesting subject to study. A wide range of people buy vehicles, and even gather in communities based on their car or motorcycle brand preference. The Twitter results for sentiment analysis and tweet rate will be compared with real-world sales results published by GAIKINDO and AISI.
