COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis

Social media comprises many platforms on which a huge amount of user-generated data is collected. People from various parts of the country express their opinions, reviews, feedback, and marketing strategies through social media such as Twitter, Facebook, Instagram, and YouTube. It is vital to gather and analyze these data and to consolidate people's views for better decision making. Sentiment analysis is a natural language processing technique for information extraction that identifies users' views. It is used to extract reviews and opinions about products, events, and people in order to understand current trends in products or user behavior. The paper reviews and analyzes the existing general approaches and algorithms for sentiment analysis. The proposed system performs sentiment analysis on a Twitter data set using Long Short-Term Memory (LSTM) and is evaluated with a Naive Bayes approach.

An automated learning model for sentiment analysis and data classification of Twitter data using balanced CA-SVM

Concurrent Engineering ◽

10.1177/1063293x211031485 ◽

2021 ◽

pp. 1063293X2110314

Author(s):

C Pretty Diana Cyril ◽

J Rene Beulah ◽

Neelakandan Subramani ◽

Prakash Mohan ◽

A Harshavardhan ◽

...

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Modern Society ◽

Support Vector ◽

Analysis Model ◽

Data Set ◽

Twitter Data ◽

Automated Learning ◽

Sentiment Score ◽

The One

Modern society spends much of each day on social media. Web users spend most of their time on social media and share many details with their friends. Information obtained from these chats has been used in several applications. Sentiment analysis is one such application: it has been applied to Twitter data sets to identify the emotion of a user, and different problems can be solved on that basis. First, the data from the Twitter database are preprocessed. In this step, tokenization, stemming, stop word removal, and number removal are performed. The proposed automated learning with CA-SVM based sentiment analysis model reads the Twitter data set. The tweets are then processed to extract features, which yields a set of terms. Using these terms, the tweets are clustered with TGS-K means clustering, which measures Euclidean distance over features such as the semantic sentiment score (SSS), gazetteer and symbolic sentiment support (GSSS), and topical sentiment score (TSS). The method then classifies the tweets with a support vector machine (CA-SVM), which classifies each tweet according to a support value measured based on the above two measures. The results are validated using k-fold cross-validation. The classification is then performed using the Balanced CA-SVM (Deep Learning Modified Neural Network). The results are evaluated and compared with the existing works. The proposed model achieved 92.48% accuracy and a 92.05% sentiment score compared with the existing works.
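
The preprocessing step named in the abstract (tokenization, stemming, stop word removal, and number removal) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the suffix-stripping stemmer, and the function names are all assumptions, and the steps run in a convenient order rather than the order listed in the abstract.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "on", "it", "this"}

# Crude suffix stripping stands in for a real stemmer (an assumption for illustration).
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet: str) -> List[str]:
    """Tokenize a tweet, drop numbers and stop words, then stem what is left."""
    tokens = re.findall(r"[a-z0-9']+", tweet.lower())    # tokenization
    tokens = [t for t in tokens if not t.isdigit()]      # number removal
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [crude_stem(t) for t in tokens]               # stemming


print(preprocess("The 3 cats are playing in the garden"))
```

The resulting term lists are the kind of input from which per-tweet features could then be computed for clustering and classification.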

Heterogeneous Feature Analysis on Twitter Data Set for Identification of Spam Messages

The International Arab Journal of Information Technology ◽

10.34028/iajit/19/1/5 ◽

2022 ◽

Author(s):

Valliyammai Chinnaiah ◽

Cinu C Kiliroor

Keyword(s):

Social Networks ◽

Feature Extraction ◽

Social Networking ◽

Sentiment Analysis ◽

Social Networking Sites ◽

Machine Learning Algorithms ◽

Feature Analysis ◽

Data Set ◽

Twitter Data ◽

Heterogeneous Feature

Spam is undesirable content present on online social networking sites, and spammers are the users who post it. Unwanted messages posted on Twitter may have several goals, and spam tweets can interfere with statistics presented by Twitter mining tools and squander users' attention. Since Twitter has gained great popularity throughout the world, the interest of spammers and malevolent users in it is also increasing. To overcome the spam problem, many researchers have proposed machine learning approaches for identifying spam messages. Not only the selection of classifiers but also a varied feature analysis is essential for identifying irrelevant messages in social networks. The proposed model performs a heterogeneous feature analysis on Twitter data streams to classify unsolicited messages, using binary and continuous feature extraction with sentiment analysis on social network datasets. The features created are assessed using significant strategies, and the best features are selected. A classifier model is built from these feature vectors to predict and identify spam messages on Twitter. The experimental results show that the proposed Sentiment Analysis based Binary and Continuous Feature Extraction model with Random Forest (SA-BC-RF) approach classifies spam messages from social networks with an accuracy of 90.72% when compared with other state-of-the-art methods.
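
To make the idea of mixing binary and continuous features with a sentiment signal concrete, here is a small sketch. The specific features, the word lists, and the function name are hypothetical choices for illustration; the abstract does not list the features the authors actually used.

```python
import re
from typing import Dict

# Hypothetical word lists for a toy sentiment score; not taken from the paper.
POSITIVE = {"good", "great", "love", "win", "free"}
NEGATIVE = {"bad", "hate", "scam", "worst", "fake"}

URL_PATTERN = re.compile(r"https?://\S+")


def extract_features(tweet: str) -> Dict[str, float]:
    """Build one feature vector with binary, continuous, and sentiment features."""
    # Remove URLs before counting words so link fragments are not counted as words.
    text = URL_PATTERN.sub(" ", tweet.lower())
    words = re.findall(r"[a-z']+", text)
    n_words = len(words)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {
        # Binary features (0.0 or 1.0)
        "has_url": float(bool(URL_PATTERN.search(tweet))),
        "has_mention": float("@" in tweet),
        # Continuous features
        "hashtag_count": float(tweet.count("#")),
        "word_count": float(n_words),
        # Toy sentiment score in [-1, 1]
        "sentiment": (pos - neg) / n_words if n_words else 0.0,
    }


features = extract_features("Win a FREE phone now! http://spam.example #deal @you")
print(features)
```

Vectors of this shape could then be passed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`) after a feature selection step; that part is omitted here to keep the sketch dependency-free.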

ProGen: Provenance database generator for large-scale data set

Journal of Computer Applications ◽

10.3724/sp.j.1087.2008.02737 ◽

2009 ◽

Vol 28 (11) ◽

pp. 2737-2740

Author(s):

Xiao ZHANG ◽

Shan WANG ◽

Na LIAN

Keyword(s):

Large Scale ◽

Data Set ◽

Large Scale Data ◽

Scale Data

A Review on Sentiment Classification: Natural Language Understanding

Recent Patents on Engineering ◽

10.2174/1872212112666180731113353 ◽

2019 ◽

Vol 13 (1) ◽

pp. 20-27 ◽

Cited By ~ 1

Author(s):

Srishty Jindal ◽

Kamlesh Sharma

Keyword(s):

Natural Language ◽

Sentiment Analysis ◽

Social Networking Sites ◽

Natural Language Understanding ◽

Business Analytics ◽

Language Understanding ◽

Text Data ◽

Data Set ◽

Market Positioning ◽

Illegal Activities

Background: With the tremendous increase in the use of social networking sites for sharing emotions, views, preferences, etc., a huge volume of data and text is available on the internet. This creates the need to understand the text and analyse the data to determine the exact intent behind it for the greater good. This process involves many analytical methods, several phases, and multiple techniques. Efficient use of these techniques is important for an effective and relevant understanding of the text/data. This analysis can in turn be very helpful in e-commerce for targeting audiences; in social media monitoring for anticipating foul elements in society and taking proactive action to avoid unethical and illegal activities; and in business analytics, market positioning, etc. Method: The goal is to understand the basic steps involved in analysing text data that can help determine the sentiments behind them. This review provides a detailed description of the steps involved in sentiment analysis, along with recent research. Patents related to sentiment analysis and classification are reviewed to shed light on the work done in the field. Results: Sentiment analysis determines the polarity behind the text data/review. This analysis helps in increasing business revenue, in e-health, and in determining the behaviour of a person. Conclusion: This study helps in understanding the basic steps involved in natural language understanding. At each step, multiple techniques can be applied to the data. Different classifiers provide variable accuracy depending on the data set and classification technique used.

Sentiment Analysis using Twitter Data

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2020.5368 ◽

2020 ◽

Vol 8 (5) ◽

pp. 2253-2257

Author(s):

Nikhil Srivastava

Keyword(s):

Sentiment Analysis ◽

Twitter Data

Integrative Data Analysis from a Unifying Research Synthesis Perspective

10.1093/oso/9780190676001.003.0020 ◽

2018 ◽

Author(s):

Eun-Young Mun ◽

Anne E. Ray

Keyword(s):

Data Analysis ◽

Large Scale ◽

Research Synthesis ◽

Alcohol Intervention ◽

Data Set ◽

Integrative Data Analysis ◽

Level Data ◽

Model Complex ◽

Wide Range ◽

Individual Participant

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples and systematic study-level missing data are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, it is also recognized that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve transparency and quality of evidence.

Financial distress determinants among SMEs: empirical evidence from Sweden

Journal of Economic Studies ◽

10.1108/jes-01-2019-0030 ◽

2020 ◽

Vol 47 (3) ◽

pp. 547-560 ◽

Cited By ~ 1

Author(s):

Darush Yazdanfar ◽

Peter Öhman

Keyword(s):

Financial Crisis ◽

Financial Distress ◽

Large Scale ◽

Global Financial Crisis ◽

Binary Logistic Regression ◽

Data Availability ◽

Cross Sectional ◽

Data Set ◽

Content Type ◽

The Global Financial Crisis

Purpose: The purpose of this study is to empirically investigate determinants of financial distress among small and medium-sized enterprises (SMEs) during the global financial crisis and post-crisis periods. Design/methodology/approach: Several statistical methods, including multiple binary logistic regression, were used to analyse a longitudinal cross-sectional panel data set of 3,865 Swedish SMEs operating in five industries over the 2008–2015 period. Findings: The results suggest that financial distress is influenced by macroeconomic conditions (i.e. the global financial crisis) and, in particular, by various firm-specific characteristics (i.e. performance, financial leverage and financial distress in previous year). However, firm size and industry affiliation have no significant relationship with financial distress. Research limitations: Due to data availability, this study is limited to a sample of Swedish SMEs in five industries covering eight years. Further research could examine the generalizability of these findings by investigating other firms operating in other industries and other countries. Originality/value: This study is the first to examine determinants of financial distress among SMEs operating in Sweden using data from a large-scale longitudinal cross-sectional database.
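
For readers unfamiliar with the method, a binary logistic regression of this kind can be written generically as below. The notation is an assumption for illustration only: $y_{it}$ indicates whether firm $i$ is in financial distress in year $t$, $x_{k,it}$ are the $K$ explanatory variables (for example a crisis-period indicator, performance, and leverage), and $\beta_0, \dots, \beta_K$ are coefficients to be estimated. The abstract does not give the study's exact specification.

```latex
\[
\Pr(y_{it} = 1 \mid \mathbf{x}_{it})
  = \frac{1}{1 + \exp\!\Bigl(-\bigl(\beta_0 + \sum_{k=1}^{K} \beta_k x_{k,it}\bigr)\Bigr)},
\qquad
\ln\frac{\Pr(y_{it} = 1 \mid \mathbf{x}_{it})}{1 - \Pr(y_{it} = 1 \mid \mathbf{x}_{it})}
  = \beta_0 + \sum_{k=1}^{K} \beta_k x_{k,it}.
\]
```

Under this form, a positive $\beta_k$ means that a higher value of $x_{k,it}$ is associated with higher odds of distress, and a coefficient that is not statistically different from zero corresponds to a finding of "no significant relationship".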

