An automated learning model for sentiment analysis and data classification of Twitter data using balanced CA-SVM

2021 ◽  
pp. 1063293X2110314
Author(s):  
C Pretty Diana Cyril ◽  
J Rene Beulah ◽  
Neelakandan Subramani ◽  
Prakash Mohan ◽  
A Harshavardhan ◽  
...  

Modern society spends much of each day on social media. Web users share many details with their friends there, and the information obtained from these exchanges has been used in several applications. Sentiment analysis has been applied to Twitter data sets to identify the emotion of a user, and different problems can be solved on that basis. First, the data from the Twitter database is preprocessed: tokenization, stemming, stop word removal, and number removal are performed. The proposed automated learning model with CA-SVM based sentiment analysis then reads the Twitter data set and extracts features, which yields a set of terms. Using these terms, the tweets are clustered with TGS-K means clustering, which measures Euclidean distance over features such as semantic sentiment score (SSS), gazetteer and symbolic sentiment support (GSSS), and topical sentiment score (TSS). The method then classifies each tweet with a support vector machine (CA-SVM) according to a support value computed from the above measures. The results are validated using k-fold cross-validation. The classification is then performed using the Balanced CA-SVM (Deep Learning Modified Neural Network). The results are evaluated and compared with existing works. The proposed model achieved 92.48% accuracy and a 92.05% sentiment score in comparison with existing works.
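The preprocessing steps named in this abstract (tokenization, number removal, stop word removal, stemming) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stop word list, the suffix list, and the function names are all assumptions, and the suffix stripper is a crude stand-in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "this", "that",
              "to", "of", "and", "in", "on", "it", "i"}

# Assumed suffix list for crude stemming; not a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop pure-number tokens and stop words, then stem."""
    tokens = tokenize(text)
    tokens = [t for t in tokens if not t.isdigit()]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]
```

For a made-up tweet such as "The 2 movies are amazing and I loved watching them", `preprocess` returns the terms `movi`, `amaz`, `lov`, `watch`, and `them`, which would then feed the feature extraction step.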

2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations in the text when performing sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert the string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets. SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, it can be used for other cultural and social benefits, such as predicting or fighting human riots.
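The conversion of string data into a numerical matrix can be pictured with a plain word-count (bag-of-words) sketch. This is only an illustration of the general idea in Python; it does not reproduce WEKA's own filters, and the function names and the two Romanized example strings are made up for the example.

```python
def build_vocabulary(documents):
    """Map each distinct lowercase word to a column index, in sorted order."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: i for i, word in enumerate(words)}


def to_count_matrix(documents, vocabulary):
    """Return one row of word counts per document."""
    matrix = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for word in doc.lower().split():
            if word in vocabulary:
                row[vocabulary[word]] += 1
        matrix.append(row)
    return matrix


# Made-up Romanized examples, not taken from the paper's data set.
docs = ["movie acchi hai", "movie bakwas hai"]
vocab = build_vocabulary(docs)
matrix = to_count_matrix(docs, vocab)
```

Each row of `matrix` is the numerical form of one review or tweet, which a classifier such as NB, J48, or SVM can then consume.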


2021 ◽  
Vol 4 (1) ◽  
pp. 1-8
Author(s):  
Shafira Shalehanny ◽  
Agung Triayudi ◽  
Endah Tri Esti Handayani

Technology keeps evolving with the times. Social media is already part of everyone's daily life and has become a place to write opinions, whether reviews or responses to products and services that have been used. Twitter is one of the popular social media platforms in Indonesia; according to Statista data it has reached 17.55 million users. For the online business sector, knowing the sentiment score is really important for growing the business. The use of machine learning, NLP (Natural Language Processing), and text mining to find the real meaning of the opinion words given by customers is called sentiment analysis. Two methods are used for data testing: the first is lexicon-based and the second is Support Vector Machine (SVM). The data for sentiment analysis come from the keywords ‘ShopeeFood’ and ‘syopifud’. The analysis gives an accuracy of 87%, precision of 81%, recall of 75%, and an F1-score of 78%.
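Two ideas in this abstract lend themselves to a small sketch: lexicon-based labeling, and the relation between precision, recall, and F1. The lexicon below is a hypothetical English mini-lexicon chosen for readability (the abstract does not give the lexicon actually used), and the function names are assumptions. The F1 function is the standard harmonic mean, and it reproduces the reported 78% from the reported precision and recall.

```python
# Hypothetical mini-lexicon; the lexicon used in the paper is not given in the abstract.
LEXICON = {"good": 1, "fast": 1, "love": 1, "bad": -1, "slow": -1, "late": -1}


def lexicon_label(text):
    """Sum the polarity of known words and map the total to a label."""
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 81% and recall 75% give an F1 of about 0.78.
reported_f1 = f1_score(0.81, 0.75)
```

The check on `reported_f1` shows that the four reported metrics are at least internally consistent.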


Social media is a combination of different platforms where a huge amount of user-generated data is collected. People from various parts of the country express their opinions, reviews, feedback and marketing strategies through social media such as Twitter, Facebook, Instagram, and YouTube. It is vital to explore and gather this data, analyze it, and consolidate people's views for better decision making. Sentiment analysis is a natural language processing technique for information extraction that identifies the user's views. It is used to extract reviews and opinions about satisfaction with products, events, and people in order to understand current trends in products or user behavior. The paper reviews and analyses the existing general approaches and algorithms for sentiment analysis. The system proposed for sentiment analysis on the Twitter data set uses Long Short Term Memory (LSTM) and is evaluated with the Naive Bayes approach.


2015 ◽  
Vol 115 (9) ◽  
pp. 1604-1621 ◽  
Author(s):  
Dipak Damodar Gaikar ◽  
Bijith Marakarkandy ◽  
Chandan Dasgupta

Purpose – The purpose of this paper is to address the shortage of research on the forecasting power of social media in India. Design/methodology/approach – This paper uses sentiment analysis and prediction algorithms to analyze the performance of Indian movies based on data obtained from social media sites. The authors used the Twitter4j Java API to extract tweets through an authenticated connection to Twitter, stored the extracted data in a MySQL database, and used the data for sentiment analysis. To perform sentiment analysis of the Twitter data, the Probabilistic Latent Semantic Analysis classification model is used to find the sentiment score as positive, negative or neutral. The Fuzzy Inference System data mining algorithm is used to implement sentiment analysis and to predict movie performance, which is classified into three categories: hit, flop and average. Findings – The authors obtained predictions of movie performance at the box office based on the fuzzy inference system algorithm. The fuzzy inference system takes two factors, namely sentiment score and actor rating, to produce an accurate result. By calculating the opening weekend collection, the authors found that the predicted values were approximately the same as the actual values. For the movie Singham Returns, the method predicted a box office collection of 84 crores, and the actual collection turned out to be 88 crores. Research limitations/implications – The current study is limited by insufficient computing resources to crawl the data. For predicting box office collection, accurate information on ticket prices, the total number of seats per screen and the total number of shows per day on all screens is not available. In future work the authors can add several other inputs, such as movie budget, Central Board of Film Certification rating, movie genre and target audience, that will improve the accuracy and quality of the prediction. Originality/value – The authors used different factors for predicting box office movie performance that had not been used in previous literature. This work is valuable for firms promoting their products and services.
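The closeness of the Singham Returns prediction to the actual figure can be quantified with a relative error. The helper below is an illustrative calculation only, with an assumed function name; the abstract itself reports just the two figures, 84 crores predicted and 88 crores actual.

```python
def relative_error(predicted, actual):
    """Absolute difference between prediction and actual, as a fraction of the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(predicted - actual) / abs(actual)


# Figures from the abstract, in crores: 84 predicted, 88 actual.
error = relative_error(84, 88)  # about 0.045, i.e. roughly 4.5%
```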


2020 ◽  
Vol 8 (6) ◽  
pp. 2727-2735

Recent research activities related to opinion mining, sentiment analysis and emotion detection from natural language texts all fall under the umbrella of affective computing. There is now a huge amount of textual information online (for example, on forums, blogs, and social media) about consumers' views on buying products and their service experiences. Sentiment analysis, or opinion mining, is the field of study that analyzes people's thoughts and feelings from written text available online. This paper presents a comprehensive experiment to evaluate the effectiveness of psychological and linguistic features in emotion classification. In this scheme, five broad categories of LIWC (namely, psychological processes, linguistic processes, punctuation, spoken categories and personal concerns) are used as feature sets. The five LIWC categories and their group combinations were considered in the experimental analysis. To assess the predictive performance of the feature engineering scheme, five supervised learning algorithms (namely, Naïve Bayes, support vector machines, Extreme Learning Machine, Kernel Extreme Learning Machine, Multi Kernel Extreme Learning Machine) and the proposed Multi Kernel Improved Extreme Learning Machine (MKIELM) are used. Experimental results show that the ensemble feature sets provide higher predictive performance than the individual sets.


Kybernetes ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Pandiaraj A. ◽  
Sundar C. ◽  
Pavalarajan S.

Purpose – Recent developments in sentiment analysis have led to significant growth in the volume of study, especially on more subjective text types, namely product or movie reviews. The key difference between these texts and news articles is that their target is defined and unique across the text. Hence, reviews of newspaper articles involve three subtasks: correctly spotting the target, separating the good and bad content in the reviews of the target concerned, and evaluating the different opinions provided in detail. Having defined these tasks, this paper aims to implement a new sentiment analysis model for reviews of newspaper articles. Design/methodology/approach – Tweets from various newspaper articles are taken, and the sentiment analysis process consists of pre-processing, semantic word extraction, feature extraction and classification. First, the pre-processing phase carries out steps such as stop word removal, stemming and blank space removal, producing keywords that express positive, negative or neutral sentiment. Next, semantically similar words are extracted from the available dictionary by matching the keywords. Feature extraction is then performed on the extracted keywords and semantic words using holoentropy to obtain information statistics, which yields the maximum related information. Two categories of holoentropy features are extracted: joint holoentropy and cross holoentropy. The extracted features of all keywords are finally passed to a hybrid classifier, which merges the beneficial concepts of the neural network (NN) and the deep belief network (DBN). To improve sentiment classification performance, a modified rider optimization algorithm (ROA), called the new steering updated ROA (NSU-ROA), is introduced into the NN and DBN for weight update. Hence, the average of both improved classifiers provides the classified sentiment as positive, negative or neutral for the reviews of newspaper articles. Findings – Three data sets were considered for experimentation. The results showed that the developed NSU-ROA + DBN + NN attained high accuracy, which on data set 1 was 2.6% superior to particle swarm optimization, 3% superior to FireFly, 3.8% superior to grey wolf optimization, 5.5% superior to the whale optimization algorithm and 3.2% superior to ROA-based DBN + NN. On data set 2, the classification analysis showed that the accuracy of the proposed NSU-DBN + NN was 3.4% higher than DBN + NN, 25% higher than DBN, 28.5% higher than NN and 32.3% higher than the support vector machine. Thus, the effective performance of the proposed NSU-ROA + DBN + NN on sentiment analysis of newspaper articles has been demonstrated. Originality/value – This paper adopts the latest optimization algorithm, called NSU-ROA, to effectively recognize the sentiments of newspapers with NN and DBN. This is the first work that uses NSU-ROA-based optimization for accurate identification of sentiments from newspaper articles.
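The final step described here, averaging the outputs of the two improved classifiers to obtain a positive, negative or neutral label, can be sketched as follows. The sketch assumes each classifier emits one score per class in a fixed order; that output format, the label order, and the function names are assumptions for illustration, and the NN, the DBN, and the NSU-ROA weight update themselves are not modeled.

```python
# Assumed class order for the score vectors produced by both classifiers.
LABELS = ("positive", "negative", "neutral")


def average_scores(nn_scores, dbn_scores):
    """Element-wise mean of the two classifiers' per-class scores."""
    if len(nn_scores) != len(dbn_scores):
        raise ValueError("score vectors must have the same length")
    return [(a + b) / 2 for a, b in zip(nn_scores, dbn_scores)]


def classify(nn_scores, dbn_scores):
    """Return the label whose averaged score is highest."""
    averaged = average_scores(nn_scores, dbn_scores)
    best = max(range(len(LABELS)), key=lambda i: averaged[i])
    return LABELS[best]


# Made-up scores: the NN leans positive, the DBN leans negative more strongly.
label = classify([0.5, 0.3, 0.2], [0.2, 0.6, 0.2])
```

With these made-up scores the averaged vector favors the second class, so the combined decision is "negative" even though the NN alone would have said "positive".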


Author(s):  
Usman Naseem ◽  
Imran Razzak ◽  
Matloob Khushi ◽  
Peter W. Eklund ◽  
Jinman Kim

Author(s):  
Fan Zuo ◽  
Abdullah Kurkcu ◽  
Kaan Ozbay ◽  
Jingqin Gao

Emergency events affect human security and safety as well as the integrity of the local infrastructure. Emergency response officials are required to make decisions using limited information and time. During emergency events, people post updates to social media networks, such as tweets, containing information about their status, help requests, incident reports, and other useful information. In this research project, the Latent Dirichlet Allocation (LDA) model is used to automatically classify incident-related tweets and incident types using Twitter data. Unlike the previous social media information models proposed in the related literature, the LDA is an unsupervised learning model which can be utilized directly without prior knowledge and preparation for data in order to save time during emergencies. Twitter data including messages and geolocation information during two recent events in New York City, the Chelsea explosion and Hurricane Sandy, are used as two case studies to test the accuracy of the LDA model for extracting incident-related tweets and labeling them by incident type. Results showed that the model could extract emergency events and classify them for both small and large-scale events, and the model’s hyper-parameters can be shared in a similar language environment to save model training time. Furthermore, the list of keywords generated by the model can be used as prior knowledge for emergency event classification and training of supervised classification models such as support vector machine and recurrent neural network.


2016 ◽  
Vol 3 (1) ◽  
pp. 23-33
Author(s):  
Stevent Efendi ◽  
Alva Erwin ◽  
Kho I Eng

Social media has become a widespread phenomenon in recent years. People share many of their thoughts on social media, and the data posted on the internet can be used for study and research. As one of the fastest growing social networks, Twitter is particularly popular to study because it allows researchers to access its data. This research looks at the correlation between Twitter chatter about a brand and the sales of brands in Indonesia. Factors such as sentiment and tweet rate are expected to be able to predict the popularity of a brand. As one of the biggest industries in Indonesia, the automotive industry is an interesting subject to study. A wide range of people buy vehicles, and even gather in communities based on their car or motorcycle brand preference. The Twitter sentiment analysis results and tweet rate will be compared with the real-world sales figures published by GAIKINDO and AISI.
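One common way to measure a correlation of this kind is the Pearson coefficient between a per-brand tweet metric and per-brand sales. The abstract does not say which correlation measure the study uses, so Pearson is an assumption here, as are the function name and the input layout (two equal-length lists of numbers).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 would indicate that brands with more (or more positive) tweets also tend to sell more, a value near -1 the opposite, and a value near 0 no linear relationship.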

