Bag-of-Phrases (BoPh) and sentiment analysis of Arabic text in Twitter

Background/Objectives: Sentiment analysis plays a main role in various text mining problems. Although Arabic text mining is important, especially in the field of sentiment analysis, there is a paucity of research on it, even though it plays an important role in different issues in Arab countries. The Arabic language has many dialects that people use to express their feelings on social media. The objective of this study is to perform an experiment that follows the subjective opinion expressed in the text. Subjective analysis is one way to improve the accuracy of sentiment results for texts in dialects, such as the Saudi dialect, that hide various meanings behind the words. Methods/Statistical analysis: In this study, we manually annotated more than 8,000 tweets to build training and testing data sets with positive or negative words and phrases. Because the bag-of-words method is not enough in many cases, we proposed a “Bag of Phrases” methodology to analyze the sentiments in the texts, which helped to improve the performance of sentiment analysis. We applied a Naive Bayes algorithm to test our method. Findings: The results show an accuracy (the share of true positives and true negatives) of about 84% when compared against the manual annotation. The accuracy is calculated after taking into consideration the margin of error due to the manual annotation step and the annotators' subjective interpretation of the texts. Novelty/Applications: The novelty of the study lies in a more accurate training data set than those of other works on the Saudi dialect of Arabic, and in proposing the BoPh concept.
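
The abstract does not spell out how phrases are extracted or how the classifier is configured, so the following is only a minimal sketch of the general idea: word n-grams stand in for "phrases", and a multinomial Naive Bayes with add-one smoothing is trained on them. The English toy tweets, the `max_n` parameter, and the names `phrase_features` and `NaiveBayes` are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def phrase_features(text, max_n=2):
    """Return single words plus contiguous word sequences of up to max_n words."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over phrase features, with add-one smoothing."""

    def __init__(self, max_n=2):
        self.max_n = max_n
        self.class_counts = Counter()               # documents per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for feat in phrase_features(text, self.max_n):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total = sum(self.feature_counts[label].values())
            for feat in phrase_features(text, self.max_n):
                if feat not in self.vocab:
                    continue  # ignore features never seen in training
                count = self.feature_counts[label][feat]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; the study itself used manually annotated Arabic tweets.
train_texts = [
    "the service was good",
    "really good match",
    "not good at all",
    "the service was bad",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes(max_n=2).fit(train_texts, train_labels)
print(model.predict("not good"))
print(model.predict("really good"))
```

With `max_n=2`, a phrase such as "not good" becomes a feature of its own and can carry a polarity that differs from the polarity of "good" alone, which is the motivation for going beyond a bag of words.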

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽ 10.1093/llc/fqy037 ◽ 2018 ◽ Vol 34 (3) ◽ pp. 569-581 ◽ Cited By ~ 1

Author(s): Sujata Rani ◽ Parteek Kumar

Keyword(s): Machine Learning ◽ Social Media ◽ Sentiment Analysis ◽ Media Analysis ◽ Training Data ◽ Machine Learning Techniques ◽ Support Vector ◽ Analysis Tool ◽ Data Set ◽ Learning Techniques

In this article, an innovative approach to performing sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations in order to perform the sentiment analysis. A training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi into three classes: positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques: Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets. SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising and can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system can serve other cultural and social purposes, such as predicting or countering riots.

Subjective Text Mining for Arabic Social Media

International Journal on Semantic Web and Information Systems ◽ 10.4018/ijswis.2017040101 ◽ 2017 ◽ Vol 13 (2) ◽ pp. 1-13 ◽ Cited By ~ 5

Author(s): Nourah F. Bin Hathlian ◽ Alaaeldin M. Hafez

Keyword(s): Social Media ◽ Text Mining ◽ Research Area ◽ Arabic Text ◽ Classification Methods ◽ Learning Mechanisms ◽ Subjective Analysis ◽ Related Information ◽ Data Source ◽ Rich Content

Designing Arabic text mining systems for use on social media posts is becoming an increasingly significant and attractive research area. It serves and enhances the knowledge needed in various domains. The main focus of this paper is to propose a novel framework combining sentiment analysis with subjective analysis of Arabic social media posts to determine whether or not people are interested in a defined subject. For that purpose, text classification methods, including preprocessing and machine learning mechanisms, are applied. The performance of the framework is tested using Twitter as a data source, where possible volunteers on a certain subject are identified based on their posted tweets along with their subject-related information. Twitter was chosen among online microblogging services because of its popularity and rich content. The results obtained are very promising, with an accuracy of 89%, thereby encouraging further research.

Texts of “internet confessions” as a source for training data set for the research on the sentiment-analysis field

Vestnik NSU Series Linguistics and Intercultural Communication ◽ 10.25205/1818-7935-2019-17-3-71-82 ◽ 2019 ◽ Vol 17 (3) ◽ pp. 71-82

Author(s): Anastasia V. Kolmogorova

Keyword(s): Sentiment Analysis ◽ Narrative Structure ◽ Training Data ◽ Data Set ◽ Financial Reports ◽ Technological Basis ◽ Self Image ◽ Textual Data ◽ Primary Advantage ◽ Multiclass Classifier

The article aims to analyze the validity of Internet confession texts as a source of training data for designing a computer classifier of Internet texts in Russian according to their emotional tonality. The classifier, backed by Lövheim’s emotional cube model, is expected to detect eight classes of emotions represented in a text or to assign the text to the emotionally neutral class. The first and one of the most important stages of creating the classifier is the selection of the training data set. The training data set in machine learning is the actual data set used to train the model to perform various actions. The Internet text genres traditionally used in sentiment analysis to train two- or three-class tonality classifiers are tweets, film and market reviews, blogs, and financial reports. The novelty of our project consists in designing a multiclass classifier that requires new, non-trivial training data. For this purpose, we have chosen texts from the public group Overheard on the Russian social network VKontakte. As all the texts show similarities, we united them under the genre name “Internet confession”. To characterize the genre, we applied the method of narrative semiotics, describing six positions that form the deep narrative structure of the “Internet confession”: Addresser, a person aware of her/his separateness from society; Addressee, society / public opinion; Subject, a narrator describing his / her emotional state; Object, the person’s self-image; Helper, the person’s frankness; Adversary, the person’s shame. These genre features determine its primary, qualitative advantage: it is especially focused on emotionality, whereas more traditional sources of textual data are based on categories such as expressivity (tweets) or axiological estimations (all sorts of reviews).
The structural analysis of the texts under discussion has also demonstrated several advantages due to the technological basis of the Overheard project: the hashtagging of texts means the researcher does not have to submit the whole collection to crowdsourced assessment; the collection's size is optimal for assessment by experts; and, despite their hyperbolized emotionality, the texts of the Internet confession genre share the stylistic features typical of different types of personal Internet discourse. However, the narrative character of all Internet confession texts implies some restrictions on their use in a sentiment analysis project.

Reviewing Sentiment Analysis at the Shallow End

Transactions on Machine Learning and Artificial Intelligence ◽ 10.14738/tmlai.84.8274 ◽ 2020 ◽ Vol 8 (4) ◽ pp. 47-62

Author(s): Francisca Oladipo ◽ Ogunsanya, F. B ◽ Musa, A. E. ◽ Ogbuju, E. E ◽ Ariwa, E.

Keyword(s): Machine Learning ◽ Social Media ◽ Sentiment Analysis ◽ Information Exchange ◽ Training Data ◽ Data Set ◽ The Social ◽ Machine Learning Approach ◽ Media Space ◽ Social Media Platforms

The social media space has evolved into a large labyrinth of information exchange platforms, and the growing adoption of different social media platforms has brought an increasing wave of interest in sentiment analysis as a paradigm for mining and analyzing users’ opinions and sentiments based on their posts. In this paper, we present a review of contextual sentiment analysis of social media entries, with a specific focus on Twitter. Sentiment analysis follows two broad approaches: the machine learning approach, which uses classification techniques to classify text and is further categorized into supervised and unsupervised learning; and the lexicon-based approach, which uses a dictionary and, unlike the machine learning approach, needs no test or training data set.
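
To make the lexicon-based approach concrete, here is a minimal sketch under stated assumptions: the tiny `LEXICON` dictionary, its weights, and the function name `lexicon_sentiment` are hypothetical and are not taken from the reviewed paper. The point is only that the label comes from dictionary lookups, with no training step.

```python
# Hypothetical polarity dictionary: word -> signed weight.
LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "hate": -2,
}


def lexicon_sentiment(text, lexicon=LEXICON):
    """Sum the polarity of known words; the sign of the total gives the label."""
    tokens = text.lower().split()
    # Strip simple trailing/leading punctuation before the lookup.
    score = sum(lexicon.get(tok.strip(".,!?"), 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("I love this great phone!"))
print(lexicon_sentiment("Terrible service, bad food."))
print(lexicon_sentiment("The bus arrived."))
```

A machine learning classifier would instead estimate such weights from labeled examples, which is why it needs the training and test data that the lexicon-based approach does without.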

Towards Semantic Aspect-Based Sentiment Analysis for Arabic Reviews

International Journal of Information Systems in the Service Sector ◽ 10.4018/ijisss.2020100101 ◽ 2020 ◽ Vol 12 (4) ◽ pp. 1-13

Author(s): Salima Behdenna ◽ Fatiha Barigou ◽ Ghalem Belalem

Keyword(s): Text Mining ◽ Sentiment Analysis ◽ Description Logics ◽ Arabic Language ◽ Analysis Approach ◽ Subjective Information ◽ Linguistic Rules ◽ Semantic Aspect

Sentiment analysis is a text mining discipline that aims to identify and extract subjective information. This growing field has given rise to three levels of granularity (document, sentence, and aspect). However, neither the document level nor the sentence level identifies exactly what the opinion holder likes and dislikes. Furthermore, most research in this field deals with English texts, and very little research has been undertaken on the Arabic language. In this paper, the authors propose a semantic aspect-based sentiment analysis approach for Arabic reviews. This approach uses the semantics of description logics and linguistic rules to identify opinion targets and their polarity.

A framework for sentiment analysis in Arabic text

Indonesian Journal of Electrical Engineering and Computer Science ◽ 10.11591/ijeecs.v16.i3.pp1482-1489 ◽ 2019 ◽ Vol 16 (3) ◽ pp. 1482

Author(s): Alaa Abdalqahar Jihad ◽ Ahmed Subhi Abdalkafor

Keyword(s): Social Media ◽ Sentiment Analysis ◽ Arabic Language ◽ Arabic Text ◽ A Company

Over the last decade, the number of e-mails and comments sent to companies via social media sites has increased. To satisfy its customers, a company must take these messages and comments into consideration and know whether or not customers are satisfied with what it offers. Several techniques have been proposed to analyze the sentiment of the comment writer. Dealing with the Arabic language poses many challenges, such as its rich morphology and the difficulty of returning a word to its original root. In this paper, the challenges of dealing with the Arabic language are reviewed, and a framework is established to analyze comments in Arabic and classify them as having positive, negative, or neutral sentiment. The framework was trained and tested, and conclusions were then drawn from its performance.

Credit Card Fraud Detection Performance Improvement using Advanced Super Gradient Boosting Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.f3457.049620 ◽ 2020 ◽ Vol 9 (6) ◽ pp. 179-184

Keyword(s): Text Mining ◽ Markov Model ◽ Hidden Markov Model ◽ Credit Card ◽ Hidden Markov ◽ Training Data ◽ Gradient Boosting ◽ Data Set ◽ Credit Card Fraud ◽ Mining Algorithm

Credit card fraud refers to the physical loss of a credit card or the loss of sensitive credit card data. Several text mining procedures can be used to detect it. This investigation examines several algorithms that can be used to classify transactions as fraudulent or genuine. The paper considers the possibility of fraudulent transactions given the prevalence and significance of credit card usage, and a credit card fraud data set was used in the investigation. Since the data set was highly unbalanced, SMOTE (Synthetic Minority Oversampling Technique) was applied for oversampling. In addition, jobs were selected, and the data set was divided into two parts: training data and test data. In this paper, the Advanced Super Gradient Boosting-based text mining algorithm (ASGB) is proposed to detect fraudulent credit card transactions. ASGB is a decision-tree-based ensemble text mining algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform all other algorithms or frameworks. The algorithms used in the experiment were the Hidden Markov Model, Random Forest, Gradient Boosting, and the Enhanced Hidden Markov Model. The experimental results show that a well-tuned ASGB classifier outperforms all of them, with a precision of 99.1%, a recall of 99.8%, and an F-measure of 99.5%.
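
The abstract does not say how the F-measure was computed. Assuming the usual definition, the harmonic mean of precision and recall, the reported figures can be sanity-checked with a short sketch; the function name `f_measure` is an illustrative choice.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
f1 = f_measure(0.991, 0.998)
print(f"{f1:.4f}")  # prints 0.9945
```

Computed from the rounded precision and recall, the value comes out at roughly 99.4 to 99.5%, which is in line with the reported 99.5%.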

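ANALISIS SENTIMEN PEMINDAHAN IBU KOTA NEGARA DENGAN KLASIFIKASI NAÏVE BAYES UNTUK MODEL BERNOULLI DAN MULTINOMIAL [Sentiment Analysis of the Relocation of the National Capital Using Naïve Bayes Classification for the Bernoulli and Multinomial Models]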

Jurnal Gaussian ◽ 10.14710/j.gauss.v9i3.27963 ◽ 2020 ◽ Vol 9 (3) ◽ pp. 237-246

Author(s): Nabila Surya Wardani ◽ Alan Prahutama ◽ Puspita Kartikasari

Keyword(s): Text Mining ◽ Sentiment Analysis ◽ Naive Bayes ◽ Processing Technique ◽ Naïve Bayes ◽ Training Data ◽ Multinomial Model ◽ East Kalimantan ◽ Negative Comments ◽ Using Data

Text mining is a variation on a field called data mining that tries to find interesting patterns in large databases. On August 26, 2019, the Indonesian President affirmed that the capital would be moved to East Kalimantan. The plan drew both support and opposition from the public. Sentiment analysis is the part of text mining that typically involves taking data from opinions, comments, or responses, and it was chosen for this topic to capture public opinion. YouTube, the most used social media platform in Indonesia, served as the data source: comments were crawled from a video uploaded by the Kompas TV channel. The comments were crawled on October 15, 2019, and the 1,500 most recent ones (August 26 to October 12, 2019) were selected. The selected comments were transformed using a data pre-processing technique that involves case folding, removing mentions, unescaping HTML, removing numbers, removing punctuation, text normalization, stripping whitespace, stopword removal, tokenizing, and stemming. Sentiment classes were labeled using a sentiment scoring technique. The number of negative comments is 849, while the number of positive comments is 651. The ratio of training data to testing data is 80%:20%. The classification method used for sentiment analysis is the Naive Bayes classifier with the Bernoulli and multinomial models. The Bernoulli model uses only occurrence information, whereas the multinomial model keeps track of multiple occurrences. The results show that Bernoulli Naïve Bayes has a sensitivity (recall) of 93.45% and Multinomial Naïve Bayes has a sensitivity (recall) of 90.19%. This means that both the Bernoulli and multinomial models give good results for this research. Keywords: Text Mining, Relocation of Indonesia’s Capital, YouTube, Bernoulli Naïve Bayes, Multinomial Naïve Bayes, Sensitivity (Recall).
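
The difference between the two event models can be illustrated at the feature level. This is a minimal sketch only: the vocabulary, the toy comment, and the function names are hypothetical and do not come from the paper.

```python
from collections import Counter


def multinomial_features(tokens, vocabulary):
    """Term counts: repeated occurrences of a word are kept."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def bernoulli_features(tokens, vocabulary):
    """Binary indicators: only whether a word occurs at all is kept."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical vocabulary and an already pre-processed toy comment.
vocabulary = ["capital", "move", "good", "bad"]
tokens = "bad bad move capital".split()

print(multinomial_features(tokens, vocabulary))  # [1, 1, 0, 2]
print(bernoulli_features(tokens, vocabulary))    # [1, 1, 0, 1]
```

The repeated word "bad" counts twice in the multinomial vector but only once in the Bernoulli vector. A Bernoulli Naive Bayes classifier additionally models the absence of each vocabulary word, which the feature vectors alone do not show.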

Analyzing the Effect of Negation in Sentiment Polarity of Facebook Dialectal Arabic Text

Applied Sciences ◽ 10.3390/app11114768 ◽ 2021 ◽ Vol 11 (11) ◽ pp. 4768

Author(s): Sanaa Kaddoura ◽ Maher Itani ◽ Chris Roast

Keyword(s): Social Networks ◽ Social Network ◽ Sentiment Analysis ◽ Arabic Language ◽ Network Data ◽ Arabic Text ◽ Social Network Data ◽ Dialectal Arabic ◽ The Impact ◽ Modern Standard

With the increase in the number of users on social networks, sentiment analysis has been gaining attention. Sentiment analysis aggregates users' opinions to inform researchers about attitudes towards products or topics. Social network data commonly contain authors’ opinions about specific subjects, such as people’s opinions on the steps taken to manage the COVID-19 pandemic. People usually use dialectal language in their posts on social networks. Dialectal language presents obstacles that make opinion analysis more challenging than working with standard language. For the Arabic language, Modern Standard Arabic (MSA) tools cannot be employed on social network data that contain dialectal language. Another challenge of dialectal Arabic is that the polarity of opinionated words is affected by inverters, such as negation, which tend to change a word’s polarity from positive to negative and vice versa. This work analyzes the effect of inverters on sentiment analysis of dialectal Arabic posts on social networks. It discusses the different reasons that hinder the trivial resolution of inverters. An experiment is conducted on a corpus of data collected from Facebook, although the same work can be applied to posts from other social networks. The results show the impact that resolving negation may have on classification accuracy: the F1 score increases by 20% if negation is treated in the text.
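
As a rough illustration of how an inverter can change polarity, the sketch below flips the sign of opinion words that follow a negation word within a fixed window. The English word lists, the `window` parameter, and the function name are assumptions made for readability; the paper's own treatment of dialectal Arabic negation is not described in the abstract and is likely more involved.

```python
# Hypothetical word lists, in English for readability.
NEGATORS = {"not", "never", "no"}
POLARITY_LEXICON = {"good": 1, "happy": 1, "bad": -1, "sad": -1}


def polarity_with_negation(text, window=2):
    """Sum word polarities, inverting those within `window` tokens after a negator."""
    score = 0
    flips_left = 0  # how many upcoming tokens are still in the negation scope
    for token in text.lower().split():
        if token in NEGATORS:
            flips_left = window
            continue
        value = POLARITY_LEXICON.get(token, 0)
        if flips_left > 0:
            value = -value
            flips_left -= 1
        score += value
    return score


print(polarity_with_negation("good"))           # 1
print(polarity_with_negation("not good"))       # -1
print(polarity_with_negation("not very good"))  # -1
print(polarity_with_negation("not bad"))        # 1
```

A fixed window is a crude notion of scope: when the opinion word sits further from the negator than the window allows, the negation is missed, which hints at why the abstract says inverters cannot be resolved trivially.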

Anti-Islamic Arabic Text Categorization using Text Mining and Sentiment Analysis Techniques

International Journal of Advanced Computer Science and Applications ◽ 10.14569/ijacsa.2021.0120889 ◽ 2021 ◽ Vol 12 (8)

Author(s): Rawan Abdullah Alraddadi ◽ Moulay Ibrahim El-Khalil Ghembaza

Keyword(s): Text Mining ◽ Sentiment Analysis ◽ Text Categorization ◽ Arabic Text ◽ Analysis Techniques
