Ensemble Classifiers for Arabic Sentiment Analysis of Social Network (Twitter Data) towards COVID-19-Related Conspiracy Theories

Sentiment analysis has become increasingly important with the massive growth of online content. It is concerned with analyzing textual data generated on social media, which can be easily accessed, collected, and analyzed. Since the emergence of COVID-19, most published studies on COVID-19 conspiracy theories have been surveys of people's sentiments and opinions and of the pandemic's impact on their lives. Only a few studies have applied machine learning to sentiment analysis of social media. These studies focused mainly on English-language tweets and paid little attention to other languages such as Arabic. This study proposes a machine learning model to analyze Arabic tweets from Twitter. In this model, we apply Word2Vec word embeddings, which form the main source of features. Two pretrained continuous bag-of-words (CBOW) models are investigated, and Naïve Bayes is used as a baseline classifier. Several single-based and ensemble-based machine learning classifiers are used with and without SMOTE (synthetic minority oversampling technique). The experimental results show that applying word embedding with an ensemble and SMOTE achieved a good improvement in average F1 score compared to the baseline classifier and to the other classifiers (single-based and ensemble-based) without SMOTE.
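
The oversampling step named in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: it is a simplified, pure-Python version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), and the function name `smote`, the parameter `k`, and the toy 2-d vectors standing in for Word2Vec features are all assumptions made for illustration.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    Simplified sketch: the base sample is drawn at random, whereas the
    reference algorithm iterates over every minority sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical 2-d feature vectors for the minority class (made-up values).
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_vectors, n_synthetic=6)
```

Each synthetic point lies on a line segment between two real minority points, which is why oversampling of this kind adds variety rather than exact duplicates.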

Fuzzy based feature engineering architecture for sentiment analysis of medical discussion over online social networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202874 ◽

2021 ◽

pp. 1-13

Author(s):

C S Pavan Kumar ◽

L D Dhinesh Babu

Keyword(s):

Machine Learning ◽

Social Networks ◽

Social Media ◽

Sentiment Analysis ◽

Membership Function ◽

Online Social Networks ◽

Learning Model ◽

Feature Engineering ◽

Machine Learning Model ◽

Social Media Platforms

Sentiment analysis is widely used to retrieve the hidden sentiments in medical discussions on online social networking platforms such as Twitter, Facebook, and Instagram. People often convey their feelings about their medical problems on social media platforms. Practitioners and health care workers have started to observe these discussions to assess the impact of health-related issues among the people. This helps in providing better care to improve the quality of life. Dementia is a serious disease in western countries like the United States of America and the United Kingdom, and the respective governments are providing facilities to the affected people. There is much chatter on social media platforms concerning patients’ care, healthy measures to follow to avoid the disease, and how to check for early indications. This chatter has to be carefully monitored to help officials take the necessary precautions for the betterment of those affected. A novel feature engineering architecture that involves feature-split for sentiment analysis of medical chatter over online social networks is proposed, with a pipeline that can be used with any machine learning model. The proposed model uses a fuzzy membership function to refine the outputs. The sentiment score obtained by the machine learning model is subjected to fuzzification and defuzzification using the trapezoid membership function and the center of sums method, respectively. Three datasets are considered for comparing the proposed and the regular model. The proposed approach delivered better results than the normal approach and proved to be an effective approach for sentiment analysis of medical discussions over online social networks.
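
A minimal sketch of the fuzzification and defuzzification steps may help make them concrete. The abstract gives no parameters, so everything here is an assumption: the score range [-1, 1], the three sets and their trapezoid corners in `SETS`, and the discretized form of the center of sums method. It shows the general technique, not the paper's architecture.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Assumed fuzzy sets over a sentiment score in [-1, 1]; the corners that lie
# outside that range make the outer sets behave like shoulders.
SETS = {
    "negative": (-1.5, -1.2, -0.6, -0.1),
    "neutral": (-0.6, -0.1, 0.1, 0.6),
    "positive": (0.1, 0.6, 1.2, 1.5),
}

def fuzzify(score):
    """Map a crisp score to a membership degree in each set."""
    return {name: trapezoid(score, *corners) for name, corners in SETS.items()}

def defuzzify_center_of_sums(strengths, steps=200):
    """Discretized center of sums: overlapping clipped sets are summed, not maxed."""
    numerator = denominator = 0.0
    for i in range(steps + 1):
        x = -1.0 + 2.0 * i / steps
        total = sum(min(strengths[name], trapezoid(x, *SETS[name])) for name in SETS)
        numerator += x * total
        denominator += total
    return numerator / denominator if denominator else 0.0

def refine(score):
    """Fuzzify a model's sentiment score, then defuzzify it back to a crisp value."""
    return defuzzify_center_of_sums(fuzzify(score))
```

With these assumed sets, a raw score of 0 stays at roughly 0, while strongly positive or negative scores are pulled toward the centroid of the corresponding set.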

The Sentiment Analysis Reviewing Indosat Services from Twitter Using the Naive Bayes Classifier

Journal of Applied Computer Science and Technology ◽

10.52158/jacost.v1i2.79 ◽

2020 ◽

Vol 1 (2) ◽

pp. 61-66

Author(s):

Febri Astiko ◽

Achmad Khodar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Naive Bayes ◽

Learning Model ◽

Naïve Bayes ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Machine Learning Model ◽

Bayes Algorithm

This study aims to design a machine learning model for sentiment analysis of Indosat Ooredoo service reviews on the social media platform Twitter, using the Naive Bayes algorithm to classify reviews as positive or negative. The sentiment analysis uses machine learning to learn patterns and a model that can be reused to predict new data.
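
As an illustration of the classifier named here, the following is a small multinomial Naive Bayes sketch with add-one smoothing. The training reviews are invented English placeholders, not the study's data, and the function names `train_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns class counts, word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up toy reviews for illustration only.
training = [
    (["signal", "fast", "good"], "positive"),
    (["network", "slow", "bad"], "negative"),
    (["good", "service"], "positive"),
    (["bad", "signal"], "negative"),
]
model = train_nb(training)
```

Once trained, the same `model` can be reused on unseen reviews, for example `predict_nb(model, ["good", "fast"])`.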

A Complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets during the Era of COVID-19

Big Data and Cognitive Computing ◽

10.3390/bdcc4040033 ◽

2020 ◽

Vol 4 (4) ◽

pp. 33

Author(s):

Toni Pano ◽

Rasha Kashef

Keyword(s):

Machine Learning ◽

Social Media ◽

Prediction Model ◽

Sentiment Analysis ◽

Significant Role ◽

Prediction Models ◽

Financial Sector ◽

Research Gap ◽

Text Preprocessing ◽

The Impact

During the COVID-19 pandemic, many research studies have been conducted to examine the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media, such as Twitter, plays a significant role as a meaningful indicator in forecasting the Bitcoin (BTC) prices. However, there is a research gap in determining the optimal preprocessing strategy in BTC tweets to develop an accurate machine learning prediction model for bitcoin prices. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and time lengths of data on the correlation results. Out of 13 strategies, we discover that splitting sentences, removing Twitter-specific tags, or their combination generally improve the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices only correlate well with sentiment scores over shorter timespans. Selecting the optimum preprocessing strategy would prompt machine learning prediction models to achieve better accuracy as compared to the actual prices.
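
Two of the preprocessing functions mentioned above, plus the correlation step, can be sketched as follows. This is a sketch under assumptions: what counts as a "Twitter-specific tag" (here URLs, @mentions, and the `#` sign), the sentence-splitting rule, and all function names are hypothetical. The sentiment scoring itself (VADER in the paper) is left out.

```python
import re

def strip_twitter_tags(text):
    """Remove URLs, @mentions and the '#' of hashtags, then collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = text.replace("#", " ")
    return re.sub(r"\s+", " ", text).strip()

def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

cleaned = strip_twitter_tags("@bob BTC to the moon! #bitcoin https://t.co/x")
sentences = split_sentences("Price is up. Buy now! Really?")
```

In a full pipeline, each cleaned sentence would be scored for sentiment, the scores aggregated per time window, and `pearson` applied to the aggregated scores and the prices for the same windows.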

Evolution of corporate reputation during an evolving controversy

Journal of Communication Management ◽

10.1108/jcom-08-2018-0072 ◽

2019 ◽

Vol 23 (1) ◽

pp. 52-71 ◽

Cited By ~ 3

Author(s):

Siyoung Chung ◽

Mark Chong ◽

Jie Sheng Chua ◽

Jin Cheon Na

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Corporate Reputation ◽

Supervised Machine Learning ◽

Future Research ◽

Content Type ◽

Twitter Users ◽

Corporate Crisis ◽

The Impact

Purpose: The purpose of this paper is to investigate the evolution of online sentiments toward a company (i.e. Chipotle) during a crisis, and the effects of corporate apology on those sentiments.

Design/methodology/approach: The study used a very large data set of tweets (i.e. over 2.6m) about Company A’s food poisoning case (2015–2016). This case was selected because it is widely known, drew attention from various stakeholders and had many dynamics (e.g. multiple outbreaks, and across different locations). This study employed a supervised machine learning approach. Its sentiment polarity classification and relevance classification consisted of five steps: sampling, labeling, tokenization, augmentation of semantic representation, and the training of supervised classifiers for relevance and sentiment prediction.

Findings: The findings show that: the overall sentiment of tweets specific to the crisis was neutral; promotions and marketing communication may not be effective in converting negative sentiments to positive sentiments; a corporate crisis drew public attention and sparked public discussion on social media; while corporate apologies had a positive effect on sentiments, the effect did not last long, as the apologies did not remove public concerns about food safety; and some Twitter users exerted a significant influence on online sentiments through their popular tweets, which were heavily retweeted among Twitter users.

Research limitations/implications: Even with multiple training sessions and the use of a voting procedure (i.e. when there was a discrepancy in the coding of a tweet), there were some tweets that could not be accurately coded for sentiment. Aspect-based sentiment analysis and deep learning algorithms can be used to address this limitation in future research. This analysis of the impact of Chipotle’s apologies on sentiment did not test for a direct relationship. Future research could use manual coding to include only specific responses to the corporate apology. There was a delay between the time social media users received the news and the time they responded to it. Time delay poses a challenge to the sentiment analysis of Twitter data, as it is difficult to interpret which peak corresponds with which incident/s. This study focused solely on Twitter, which is just one of several social media sites that had content about the crisis.

Practical implications: First, companies should use social media as official corporate news channels and frequently update them with any developments about the crisis, and use them proactively. Second, companies in crisis should refrain from marketing efforts. Instead, they should focus on resolving the issue at hand and not attempt to regain a favorable relationship with stakeholders right away. Third, companies can leverage video, images and humor, as well as individuals with large online social networks to increase the reach and diffusion of their messages.

Originality/value: This study is among the first to empirically investigate the dynamics of corporate reputation as it evolves during a crisis as well as the effects of corporate apology on online sentiments. It is also one of the few studies that employs sentiment analysis using a supervised machine learning method in the area of corporate reputation and communication management. In addition, it offers valuable insights to both researchers and practitioners who wish to utilize big data to understand the online perceptions and behaviors of stakeholders during a corporate crisis.

Negation Handling in Machine Learning-Based Sentiment Classification for Colloquial Arabic

International Journal of Operations Research and Information Systems ◽

10.4018/ijoris.2020100102 ◽

2020 ◽

Vol 11 (4) ◽

pp. 33-45

Author(s):

Omar Alharbi

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Positive Impact ◽

Machine Learning Algorithms ◽

Sentiment Classification ◽

Linguistic Knowledge ◽

Sentiment Lexicon ◽

Colloquial Arabic ◽

Arabic Sentiment Analysis ◽

The Impact

One crucial aspect of sentiment analysis is negation handling, where the occurrence of negation can flip the sentiment of a review and negatively affect machine learning-based sentiment classification. The role of negation in Arabic sentiment analysis has been explored only to a limited extent, especially for colloquial Arabic. In this paper, the authors address the negation problem in colloquial Arabic sentiment classification using the machine learning approach. To this end, they propose a simple rule-based algorithm for handling the problem that affects the performance of a machine learning classifier. The rules were crafted based on observing many cases of negation, simple linguistic knowledge, and a sentiment lexicon. They also examine the impact of the proposed algorithm on the performance of different machine learning algorithms. Furthermore, they compare the performance of the classifiers when their algorithm is used against three baselines. The experimental results show that there is a positive impact on the classifiers when the proposed algorithm is used compared to the baselines.
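
The abstract does not list the authors' rules, so the following is only a generic sketch of one common rule-based idea: marking a lexicon word that falls within a short window after a negation particle, so that a downstream classifier sees it as a distinct feature. The particle list, the toy lexicon, the window size, and the `NOT_` prefix are all assumptions for illustration.

```python
# Illustrative negation particles (e.g. "mish", "ma", "mu", "la"); not the paper's list.
NEGATORS = {"مش", "ما", "مو", "لا"}
# Toy sentiment lexicon: "nice" and "beautiful" are positive, "bad" is negative.
LEXICON = {"حلو": 1, "جميل": 1, "سيء": -1}

def mark_negation(tokens, window=2):
    """Prefix the first lexicon word within `window` tokens after a negator."""
    marked = []
    remaining = 0  # how many upcoming tokens are still inside a negation scope
    for tok in tokens:
        if tok in NEGATORS:
            remaining = window
            marked.append(tok)
        elif remaining > 0 and tok in LEXICON:
            marked.append("NOT_" + tok)
            remaining = 0  # scope is consumed by the first sentiment word
        else:
            marked.append(tok)
            remaining = max(0, remaining - 1)
    return marked

# "the film is not nice"
negated = mark_negation(["الفيلم", "مش", "حلو"])
# "the film is nice"
plain = mark_negation(["الفيلم", "حلو"])
```

A marking step like this runs before feature extraction, so the classifier can learn separate weights for a word and its negated form.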

Identification of HATE speech tweets in Pashto language using Machine Learning techniques

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/021032021 ◽

2021 ◽

Vol 10 (3) ◽

pp. 1501-1508

Keyword(s):

Machine Learning ◽

Social Media ◽

Decision Tree ◽

Sentiment Analysis ◽

Hate Speech ◽

Research Work ◽

Machine Learning Techniques ◽

Related Data ◽

Learning Techniques ◽

The Impact

In the last few years, researchers have been strongly attracted to sentiment analysis, especially to hate speech detection systems. The proliferation of hate speech in different languages has become a significant concern on social media. Hate speech has a great impact on society, and using hate words harms the dignity of others. Hate speech detection systems are important to stop hate words from turning into crimes. In this research, a framework is developed for a hate speech detection system in the Pashto language. Because no related data is available, a dataset is created from data collected from Twitter. Most of the research work in this domain has been done for other languages, where hate speech detection is very mature. But for morphologically rich languages, not much work has been done, especially for the Pashto language. This research collected tweets from Twitter related to ethnicity and religion. The collected data was annotated manually and categorized as hate or not by comparing it with offensive content. To view the impact of different features/attributes on hate speech detection systems, this study performed experiments with existing classifiers, i.e., SVM, Naïve Bayes, Decision Tree, and KNN. SVM produced the highest result among all the classifiers on the dataset of 500, i.e., 74%. KNN and Decision Tree produced the same result on the dataset of 1500, i.e., 65.0%. On the dataset of 2800, Decision Tree produced the highest result, i.e., 72%, and SVM produced 71.9%.

Tweets Classification on the Base of Sentiments for US Airline Companies

Entropy ◽

10.3390/e21111078 ◽

2019 ◽

Vol 21 (11) ◽

pp. 1078 ◽

Cited By ~ 7

Author(s):

Furqan Rustam ◽

Imran Ashraf ◽

Arif Mehmood ◽

Saleem Ullah ◽

Gyu Choi

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Sentiment Analysis ◽

Stochastic Gradient Descent ◽

Ensemble Classifiers ◽

Term Frequency ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Lower Accuracy ◽

The Impact

The use of data from social networks such as Twitter has increased during the last few years to improve political campaigns, quality of products and services, sentiment analysis, etc. Tweet classification based on user sentiments is a collaborative and important task for many organizations. This paper proposes a voting classifier (VC) to help sentiment analysis for such organizations. The VC is based on logistic regression (LR) and stochastic gradient descent classifier (SGDC) and uses a soft voting mechanism to make the final prediction. Tweets were classified into positive, negative and neutral classes based on the sentiments they contain. In addition, a variety of machine learning classifiers were evaluated using accuracy, precision, recall and F1 score as the performance metrics. The impact of feature extraction techniques, including term frequency (TF), term frequency-inverse document frequency (TF-IDF), and word2vec, on classification accuracy was investigated as well. Moreover, the performance of a deep long short-term memory (LSTM) network was analyzed on the selected dataset. The results show that the proposed VC performs better than the other classifiers. The VC achieves an accuracy of 0.789 and 0.791 with TF and TF-IDF feature extraction, respectively. The results demonstrate that ensemble classifiers achieve higher accuracy than non-ensemble classifiers. Experiments further showed that the performance of machine learning classifiers is better when TF-IDF is used as the feature extraction method. Word2vec feature extraction performs worse than TF and TF-IDF feature extraction. The LSTM achieves a lower accuracy than machine learning classifiers.
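
The soft voting mechanism can be illustrated with a short sketch: each model outputs class probabilities, the probabilities are averaged, and the class with the highest average wins. The probability values and the names `soft_vote`, `lr_probs`, and `sgd_probs` are made up for illustration; they are not outputs from the paper's models.

```python
def soft_vote(probabilities, labels):
    """Average per-class probabilities across models and pick the top class."""
    n_models = len(probabilities)
    averaged = [sum(column) / n_models for column in zip(*probabilities)]
    best = max(range(len(labels)), key=averaged.__getitem__)
    return labels[best], averaged

LABELS = ["negative", "neutral", "positive"]
# Hypothetical probability outputs for one tweet from the two base models.
lr_probs = [0.50, 0.30, 0.20]   # logistic regression leans negative
sgd_probs = [0.10, 0.30, 0.60]  # SGD classifier leans positive, more confidently
label, averaged = soft_vote([lr_probs, sgd_probs], LABELS)
```

In this example the two models disagree, and the more confident one decides the outcome. A hard (majority) vote would have produced a tie, which is one reason soft voting is often preferred when the base models output calibrated probabilities.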

A Framework for Sentiment Analysis of Telugu Tweets

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1602.089620 ◽

2020 ◽

Vol 9 (6) ◽

pp. 523-525

Keyword(s):

Neural Network ◽

Machine Learning ◽

Social Media ◽

Deep Learning ◽

Sentiment Analysis ◽

Recurrent Neural Network ◽

English Language ◽

Research Work ◽

Learning Approaches ◽

Text Data

Nowadays, social media platforms like Facebook, Twitter, and Instagram are major channels for people to share their emotions about current situations in society. By discovering interesting patterns in this data, a government or other appropriate person can make good and useful decisions for a given situation. Sentiment analysis is a method for extracting useful information from text, such as people's emotions (happy, sad, and neutral). Much research work is under way in the area of sentiment analysis, and machine learning and deep learning approaches play a major role in it. Existing work on sentiment analysis mostly targets the English language. This paper proposes a novel framework specifically designed for sentiment analysis of text data in the Telugu language. The proposed framework integrates the word embedding model Word2Vec, a language translator, and learning approaches such as Recurrent Neural Network and Naive Bayes algorithms to collect and analyse the sentiment in Twitter data written in Telugu. The results show that the framework is effective in terms of accuracy, precision, and specificity.

Linguistic drivers of misinformation diffusion on social media during the COVID-19 pandemic

Italian Journal of Marketing ◽

10.1007/s43039-021-00026-9 ◽

2021 ◽

Author(s):

Giandomenico Di Domenico ◽

Annamaria Tuan ◽

Marco Visentin

Keyword(s):

Machine Learning ◽

Social Media ◽

New Technologies ◽

Fake News ◽

Conspiracy Theories ◽

Social Media Platform ◽

The Social ◽

Media Platform ◽

Textual Cues ◽

Twitter Users

In the wake of the COVID-19 pandemic, unprecedented amounts of fake news and hoaxes spread on social media. In particular, conspiracy theories argued about the effect of specific new technologies like 5G, and misinformation tarnished the reputation of brands like Huawei. Language plays a crucial role in understanding the motivational determinants of social media users in sharing misinformation, as people extract meaning from information based on their discursive resources and their skillset. In this paper, we analyze textual and non-textual cues from a panel of 4923 tweets containing the hashtags #5G and #Huawei during the first week of May 2020, when several countries were still adopting lockdown measures, to determine whether or not a tweet is retweeted and, if so, how much it is retweeted. Overall, through traditional logistic regression and machine learning, we found different effects of the textual and non-textual cues on the retweeting of a tweet and on its ability to accumulate retweets. In particular, the presence of misinformation plays an interesting role in spreading the tweet on the network. More importantly, the relative influence of the cues suggests that Twitter users actually read a tweet but do not necessarily understand or critically evaluate it before deciding to share it on the social media platform.

Exploring Impact of Age and Gender on Sentiment Analysis Using Machine Learning

Electronics ◽

10.3390/electronics9020374 ◽

2020 ◽

Vol 9 (2) ◽

pp. 374 ◽

Cited By ~ 2

Author(s):

Sudhanshu Kumar ◽

Monika Gahalawat ◽

Partha Pratim Roy ◽

Debi Prosad Dogra ◽

Byung-Gyu Kim

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Age Groups ◽

Modern World ◽

Support Vector ◽

Digital Information ◽

Age And Gender ◽

And Gender ◽

The Impact

Sentiment analysis is a rapidly growing field of research due to the explosive growth in digital information. In the modern world of artificial intelligence, sentiment analysis is one of the essential tools for extracting emotion information from massive data. Sentiment analysis is applied to a variety of user data, from customer reviews to social network posts. To the best of our knowledge, there is little work on sentiment analysis based on the categorization of users by demographics. Demographics play an important role in deciding the marketing strategies for different products. In this study, we explore the impact of age and gender on sentiment analysis, as this can help e-commerce retailers to market their products based on specific demographics. The dataset is created by collecting reviews of books from Facebook users, who were asked to answer a questionnaire containing questions about their preferences in books, along with their age groups and gender information. Next, the paper analyzes the segmented data for sentiments based on each age group and gender. Finally, sentiment analysis is done using different Machine Learning (ML) approaches, including maximum entropy, support vector machine, convolutional neural network, and long short term memory, to study the impact of age and gender on user reviews. Experiments have been conducted to identify new insights into the effect of age and gender on sentiment analysis.
