Hybrid Classification Technique for Sentiment Analysis of the Twitter Data

A sentiment is any attitude, thought, or judgment that arises from an emotion. Sentiment analysis, also known as opinion extraction, investigates the emotions that people express about particular subjects. Social media platforms are among the best sources of opinion data. Twitter is a social media platform that is publicly accessible to a large number of followers, and a message posted on Twitter is called a tweet. The sentiment of Twitter data can be analyzed with a feature extraction and classification approach. This work designs a hybrid classifier that combines k-nearest neighbors (KNN) and random forest: the KNN classifier extracts features from the dataset, and the random forest classifies the data. The performance of the proposed model is evaluated in terms of accuracy and execution time.

2020 ◽  
Author(s):  
Yankun Gao ◽  
Zidian Xie ◽  
Dongmei Li

BACKGROUND Previous studies have shown that electronic cigarette (e-cigarette) users might be more vulnerable to COVID-19 infection and could develop more severe symptoms if they contract the disease owing to their impaired immune responses to viral infections. Social media platforms such as Twitter have been widely used by individuals worldwide to express their responses to the current COVID-19 pandemic. OBJECTIVE In this study, we aimed to examine the longitudinal changes in the attitudes of Twitter users who used e-cigarettes toward the COVID-19 pandemic, as well as compare differences in attitudes between e-cigarette users and nonusers based on Twitter data. METHODS The study dataset containing COVID-19–related Twitter posts (tweets) posted between March 5 and April 3, 2020, was collected using a Twitter streaming application programming interface with COVID-19–related keywords. Twitter users were classified into two groups: Ecig group, including users who did not have commercial accounts but posted e-cigarette–related tweets between May 2019 and August 2019, and non-Ecig group, including users who did not post any e-cigarette–related tweets. Sentiment analysis was performed to compare sentiment scores towards the COVID-19 pandemic between both groups and determine whether the sentiment expressed was positive, negative, or neutral. Topic modeling was performed to compare the main topics discussed between the groups. RESULTS The US COVID-19 dataset consisted of 4,500,248 COVID-19–related tweets collected from 187,399 unique Twitter users in the Ecig group and 11,479,773 COVID-19–related tweets collected from 2,511,659 unique Twitter users in the non-Ecig group. Sentiment analysis showed that Ecig group users had more negative sentiment scores than non-Ecig group users. 
Results from topic modeling indicated that Ecig group users had more concerns about deaths due to COVID-19, whereas non-Ecig group users cared more about the government’s responses to the COVID-19 pandemic. CONCLUSIONS Our findings show that Twitter users who tweeted about e-cigarettes had more concerns about the COVID-19 pandemic. These findings can inform public health practitioners to use social media platforms such as Twitter for timely monitoring of public responses to the COVID-19 pandemic and educating and encouraging current e-cigarette users to quit vaping to minimize the risks associated with COVID-19.


Author(s):  
Subhadip Chandra ◽  
Randrita Sarkar ◽  
Sayon Islam ◽  
Soham Nandi ◽  
Avishto Banerjee ◽  
...  

Sentiment analysis is the methodical recognition, extraction, quantification, and study of affective states and subjective information using natural language processing, text analysis, computational linguistics, and biometrics. People frequently use Twitter, one of numerous popular social media platforms, to convey their thoughts and opinions about a business, a product, or a service. Analysis of tweet sentiments is particularly useful for detecting whether people hold a positive, negative, or neutral opinion. This study assesses public opinion about an individual, activity, commodity, or organization. The Twitter API is used to retrieve tweets directly from Twitter and to build a sentiment classification for them. The paper applies two separate approaches to the Twitter data: lexicon-based and machine learning-based. The lexicon-based approach is further divided into corpus-based and dictionary-based methods. Machine learning approaches such as Support Vector Machine (SVM), Naïve Bayes, and Maximum Entropy are used to analyse the Twitter data. Neural network (NN) and decision tree-based sentiment analysis are also covered, to determine which approach achieves better accuracy across various data ranges. Graphs and confusion matrices are used to visualise the results of the analysis for positive, negative, and neutral opinions.
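As an illustration of the dictionary-based lexicon approach named in this abstract, the following is a minimal sketch only. The word lists, the function names, and the rule of counting positive minus negative words are assumptions made for the example; the abstract does not specify the lexicon or the scoring rule the authors used.

```python
import re
from typing import List

# Hypothetical toy lexicon; a real study would use a curated sentiment dictionary.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def classify(text: str) -> str:
    """Map the score to one of the three classes discussed in the abstract."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A tweet with no lexicon words, or with equally many positive and negative words, falls into the neutral class under this rule. That is one simple design choice among several possible ones.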


2019 ◽  
Author(s):  
Xinyi Lu ◽  
Long Chen ◽  
Jianbo Yuan ◽  
Joyce Luo ◽  
Jiebo Luo ◽  
...  

BACKGROUND The number of electronic cigarette (e-cigarette) users has been increasing rapidly in recent years, especially among youth and young adults. More e-cigarette products have become available, including e-liquids with various brands and flavors. Various e-liquid flavors have been frequently discussed by e-cigarette users on social media. OBJECTIVE This study aimed to examine the longitudinal prevalence of mentions of electronic cigarette liquid (e-liquid) flavors and user perceptions on social media. METHODS We applied a data-driven approach to analyze the trends and macro-level user sentiments of different e-cigarette flavors on social media. With data collected from web-based stores, e-liquid flavors were classified into categories in a flavor hierarchy based on their ingredients. The e-cigarette–related posts were collected from social media platforms, including Reddit and Twitter, using e-cigarette–related keywords. The temporal trend of mentions of e-liquid flavor categories was compiled using Reddit data from January 2013 to April 2019. Twitter data were analyzed using a sentiment analysis from May to August 2019 to explore the opinions of e-cigarette users toward each flavor category. RESULTS More than 1000 e-liquid flavors were classified into 7 major flavor categories. The fruit and sweets categories were the 2 most frequently discussed e-liquid flavors on Reddit, contributing to approximately 58% and 15%, respectively, of all flavor-related posts. We showed that mentions of the fruit flavor category had a steady overall upward trend compared with other flavor categories that did not show much change over time. Results from the sentiment analysis demonstrated that most e-liquid flavor categories had significant positive sentiments, except for the beverage and tobacco categories. 
CONCLUSIONS The most updated information about the popular e-liquid flavors mentioned on social media was investigated, which showed that the prevalence of mentions of e-liquid flavors and user perceptions on social media were different. Fruit was the most frequently discussed flavor category on social media. Our study provides valuable information for future regulation of flavored e-cigarettes.


2021 ◽  
Vol 9 (1) ◽  
pp. 1315-1320
Author(s):  
Dr. Mohammed Ali Alhariri

This work detects duplicate fake accounts using data accessed from a social media platform. Twitter is the platform selected for the analysis. The Twitter data is accessed through the Twitter API, using a set of selected features that are most relevant to identifying duplicate fake accounts. The feature-based analysis is compared across three machine learning techniques: Random Forest, Decision Tree, and SVM. Performance is evaluated by accuracy: SVM achieved 93.3%, Decision Tree 89.0%, and Random Forest 85.5%. SVM therefore gave the best performance in the feature-based analysis.


2020 ◽  
Vol 25 (1) ◽  
pp. 184-192
Author(s):  
Keshav Patel ◽  
Himani Binjola ◽  
Taha Siddiqui

The role of social media during the 16th Lok Sabha elections has led to several insights into how today's youth consume political news. Several social media platforms have played a significant role in voting behaviour. Social media platforms acted as a game changer and a catalyst in wooing young voters and influencing their opinions. While Internet users grew 7% in urban India, reaching 315 million users in 2018, digital adoption is now being propelled by rural India, which registered 35% growth in Internet users over the past year. There is also a general perception that television will play a lesser role in the upcoming election and that digital media will have a never-before-seen influence on voters. This research examines the behaviour of youth in India and the level of influence that social media has on how they cast their votes in Lok Sabha elections, and it asks whether social media can be a game changer or an influencer.


Data ◽  
2020 ◽  
Vol 5 (1) ◽  
pp. 20
Author(s):  
Amir Haghighati ◽  
Kamran Sedig

Through social media platforms, massive amounts of data are being produced. As a microblogging social media platform, Twitter enables its users to post short updates as “tweets” on an unprecedented scale. Once analyzed using machine learning (ML) techniques and in aggregate, Twitter data can be an invaluable resource for gaining insight into different domains of discussion and public opinion. However, when applied to real-time data streams, due to covariate shifts in the data (i.e., changes in the distributions of the inputs of ML algorithms), existing ML approaches result in different types of biases and provide uncertain outputs. In this paper, we describe VARTTA (Visual Analytics for Real-Time Twitter datA), a visual analytics system that combines data visualizations, human-data interaction, and ML algorithms to help users monitor, analyze, and make sense of the streams of tweets in a real-time manner. As a case study, we demonstrate the use of VARTTA in political discussions. VARTTA not only provides users with powerful analytical tools, but also enables them to diagnose and to heuristically suggest fixes for the errors in the outcome, resulting in a more detailed understanding of the tweets. Finally, we outline several issues to be considered while designing other similar visual analytics systems.
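The covariate shift mentioned in this abstract, a change in the distribution of an ML model's inputs, can be made concrete with a small sketch. This is not VARTTA's method, which the abstract does not detail. It assumes, purely for illustration, that the inputs are whitespace-separated tokens and that shift is measured as the Jensen-Shannon divergence between the token distributions of a reference window and a newer window of tweets. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def token_distribution(texts: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each lower-cased, whitespace-separated token."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: count / total for tok, count in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint inputs."""
    vocab = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the joint vocabulary.
    mix = {tok: 0.5 * (p.get(tok, 0.0) + q.get(tok, 0.0)) for tok in vocab}

    def kl(a: Dict[str, float]) -> float:
        # Kullback-Leibler divergence from the mixture; zero-mass terms contribute 0.
        return sum(a[t] * math.log2(a[t] / mix[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def shift_detected(reference: Iterable[str], recent: Iterable[str],
                   threshold: float = 0.2) -> bool:
    """Flag a shift when the divergence exceeds a hypothetical threshold."""
    divergence = js_divergence(token_distribution(reference),
                               token_distribution(recent))
    return divergence > threshold
```

In a streaming setting, a check like this could be run on successive windows of tweets to signal when a model trained on older data may no longer match the incoming stream. A suitable threshold would have to be tuned for the data at hand.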


Author(s):  
Albert Park ◽  
Mike Conway

Objective: We aim to explore how to effectively leverage social media for vaping electronic cigarette (e-cigarette) surveillance. This study examines how members of a social media platform called Reddit utilize topically-oriented sub-communities for e-cigarette discussions. Introduction: In recent years, individuals have been using social network sites like Facebook, Twitter, and Reddit to discuss health-related topics. These social media platforms consequently became new avenues for research and applications for researchers, for instance disease surveillance. Reddit, in particular, can potentially provide more in-depth contextual insights compared to Twitter, and Reddit members discuss potentially more diverse topics than Facebook members. However, identifying relevant discussions remains a challenge in large datasets like Reddit. Thus, much previous research using Reddit data focused on a select few topically-oriented sub-communities. Although such an approach allows for topically focused datasets, a large portion of related data can be missed. In this research, we examine all sub-communities in which members are discussing e-cigarettes in order to determine if investigating these other sub-communities could result in a better smoking surveillance system. Methods: In this study, we use an archived Reddit dataset [1] that had been used in previous studies [2,3]. We first preprocessed the dataset, which included converting text to lower case and removing punctuation. Due to the size of the dataset (114,320,798 posts and 1,659,361,605 associated comments from 239,772 sub-communities), we identified 4 terms to extract posts or comments about e-cigarettes via a lexicon-based approach. The terms are 'e cig', 'elec cig', and 'electronic cig'. We included any partial matches in this process to cover a variation of e-cigarette terms. For example, a partial match of ‘cig’ can cover ‘cig’, ‘cigs’, ‘cigarette’, and ‘cigarettes’. 
We presented a word cloud of the names and frequencies of the sub-communities in which members discussed e-cigarettes. Results: We extracted 354,587 posts/comments that were made by 176,252 unique member IDs from 6,039 unique sub-communities. There were 6 sub-communities with more than 8,000 e-cigarette posts: ‘AskReddit’ (59,939), ‘Cigars’ (51,684), ‘electronic_cigarette’ (24,393), ‘trees’ (17,752), ‘pics’ (8,792), and ‘stopsmoking’ (8,589). Other notable sub-communities are ‘news’ (5,010), ‘politics’ (4,662), ‘worldnews’ (3,785), ‘science’ (3,279), ‘Drugs’ (2,967), ‘PipeTobacco’ (2,099), ‘Cigarettes’ (1,401), ‘teenagers’ (1,016), ‘AskMen’ (918), ‘Marijuana’ (826), ‘Fitness’ (818), ‘AskWomen’ (698), ‘cubancigars’ (695), and ‘vaporents’ (608). Members were participating not only in sub-communities related to smoking and smoking cessation, but also in science, news, health, teenager, and Q&A sub-communities. The sub-communities in which members discussed e-cigarettes are summarized in Figure 1. Conclusions: We present preliminary findings concerning the various sub-communities in which members discussed e-cigarettes on the popular social media platform Reddit. Our initial results suggest that Reddit members openly discuss electronic cigarette-related issues in many sub-communities that are unrelated to smoking. For the purpose of e-cigarette surveillance, understanding the discussions in unrelated sub-communities, for example the subreddit ‘teenagers’, can provide opportunities to gain an in-depth perspective on the increased use of e-cigarettes by youth or non-smokers [4]. Moreover, high levels of activity in Q&A sub-communities like ‘AskReddit’, ‘AskMen’, and ‘AskWomen’ could indicate ineffective information dissemination regarding e-cigarettes [5], warranting further investigation. 
For the purpose of disease surveillance, we conclude that understanding the discussion in unrelated sub-communities has the potential to improve the practice of public health surveillance.
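The lexicon-based extraction described in the Methods of this abstract (lower-casing, punctuation removal, and partial matching on the listed terms) can be sketched roughly as follows. The three terms come from the abstract. Everything else is an assumption for illustration: punctuation is replaced by a space so that "e-cig" becomes "e cig", a match must begin at a word boundary, and the function names and sample posts are hypothetical.

```python
import re
import string
from collections import Counter
from typing import Iterable, Tuple

# Terms as listed in the abstract.
TERMS = ("e cig", "elec cig", "electronic cig")

# Assumption: punctuation becomes a space, so "e-cig" is normalised to "e cig".
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Assumption: a term must start at a word boundary. A plain substring search
# for "e cig" would also match unrelated text such as "the cigar".
# The trailing \w* allows partial matches: "cig", "cigs", "cigarette", ...
_PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in TERMS) + r")\w*")


def preprocess(text: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    lowered = text.lower().translate(_PUNCT_TO_SPACE)
    return " ".join(lowered.split())


def mentions_ecig(text: str) -> bool:
    """True if the preprocessed text contains any lexicon term."""
    return _PATTERN.search(preprocess(text)) is not None


def count_by_subcommunity(posts: Iterable[Tuple[str, str]]) -> Counter:
    """Count matching posts per sub-community from (sub-community, text) pairs."""
    counts = Counter()
    for subcommunity, text in posts:
        if mentions_ecig(text):
            counts[subcommunity] += 1
    return counts
```

How partial matches are bounded matters in practice, because the choice changes which posts are counted. The word-boundary rule used here is only one option and is not necessarily the rule the authors applied.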


In recent years, researchers have been strongly drawn to sentiment analysis, especially to hate speech detection systems. The spread of hate speech in different languages has received considerable attention on social media. Hate speech has a great impact on society, and the use of hate words harms the dignity of others. Hate speech detection systems are important for stopping hate words from turning into crimes. In this research, a framework is developed for a hate speech detection system in the Pashto language. Because no related data is available, a dataset is created from data collected from Twitter. Most research in this domain has been done for other languages, where hate speech detection is quite mature, but little work has been done for morphologically rich languages, especially Pashto. This research collected tweets related to ethnicity and religion from Twitter. The collected data was annotated manually and categorized as hate or not hate by comparing it with offensive content. To assess the impact of different features/attributes on hate speech detection, this study ran experiments with existing classifiers, i.e., SVM, Naïve Bayes, Decision Tree, and KNN. On the dataset of 500, SVM produced the highest result among all the classifiers, 74%. On the dataset of 1500, KNN and Decision Tree produced the same result, 65.0%. On the dataset of 2800, Decision Tree produced the highest result, 72%, and SVM produced 71.9%.


10.2196/17280 ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. e17280 ◽  
Author(s):  
Xinyi Lu ◽  
Long Chen ◽  
Jianbo Yuan ◽  
Joyce Luo ◽  
Jiebo Luo ◽  
...  

Background The number of electronic cigarette (e-cigarette) users has been increasing rapidly in recent years, especially among youth and young adults. More e-cigarette products have become available, including e-liquids with various brands and flavors. Various e-liquid flavors have been frequently discussed by e-cigarette users on social media. Objective This study aimed to examine the longitudinal prevalence of mentions of electronic cigarette liquid (e-liquid) flavors and user perceptions on social media. Methods We applied a data-driven approach to analyze the trends and macro-level user sentiments of different e-cigarette flavors on social media. With data collected from web-based stores, e-liquid flavors were classified into categories in a flavor hierarchy based on their ingredients. The e-cigarette–related posts were collected from social media platforms, including Reddit and Twitter, using e-cigarette–related keywords. The temporal trend of mentions of e-liquid flavor categories was compiled using Reddit data from January 2013 to April 2019. Twitter data were analyzed using a sentiment analysis from May to August 2019 to explore the opinions of e-cigarette users toward each flavor category. Results More than 1000 e-liquid flavors were classified into 7 major flavor categories. The fruit and sweets categories were the 2 most frequently discussed e-liquid flavors on Reddit, contributing to approximately 58% and 15%, respectively, of all flavor-related posts. We showed that mentions of the fruit flavor category had a steady overall upward trend compared with other flavor categories that did not show much change over time. Results from the sentiment analysis demonstrated that most e-liquid flavor categories had significant positive sentiments, except for the beverage and tobacco categories. 
Conclusions The most updated information about the popular e-liquid flavors mentioned on social media was investigated, which showed that the prevalence of mentions of e-liquid flavors and user perceptions on social media were different. Fruit was the most frequently discussed flavor category on social media. Our study provides valuable information for future regulation of flavored e-cigarettes.


2019 ◽  
Vol 8 (4) ◽  
pp. 9727-9732

With the growth of technology, a large amount of data is available on the internet. Social media platforms such as Twitter, Facebook, Google+, WhatsApp, and Instagram allow people across the world to post messages and to share their views, ideas, thoughts, and experiences on any topic. Two main types of textual information are available on social media platforms: facts and sentiments, the latter more formally called opinions. People regularly give their opinions on social media, and these opinions may contain some factual information. Analysing sentiments requires tools, and text mining is the technique most often used for opinion mining. Text mining in turn requires many different tools and much research work. This paper provides a machine learning technique for opinion calculation on Twitter.

