scholarly journals Bag-of-Phrases (BoPh) and sentiment analysis of Arabic text in Twitter

2020 ◽  
Vol 13 (40) ◽  
pp. 4202-4215
Author(s):  
Hamoud H Alshammari

Background/Objectives: Sentiment analysis plays main role in various text mining problems. Although, the Arabic text mining is important especially in the field of sentiment analysis, there is a paucity of research in it, especially, when it plays an important role in different issues in Arabic countries. Arabic language has many dialects that people use to express their feelings in social media. The objective of this study is to perform an experiment that follow the subjective opinion from the text. Subjective Analysis is one way that we can implement to improve the accuracy of the sentiment results in such texts in some dialects, that hide various meanings behind the words such as Saudi dialect. Methods/Statistical analysis: In this study, we manually annotated more than 8,000 tweets to have training and testing data sets with positive or negative words and phrases. Then we proposed a “Bag of Phrases” methodology to analyze the sentiments in the texts, which helped to improve the performance of sentiment analysis. Since using bag of words method is not enough in many cases, we applied a Naive Bayes algorithm to test our method. Findings: The results show that the accuracy of having True positive or True negative is about 84% comparing by using manual annotation process. The accuracy is calculated after taking into consideration the margin of error due to the manual annotation step and subjective interpretation of the texts by the annotators. Novelty/Applications: The novelty of the study is having more accurate training data set comparing with the other works in Saudi dialect for Arabic text, and proposing the BoPh concept.

2018 ◽  
Vol 34 (3) ◽  
pp. 569-581 ◽  
Author(s):  
Sujata Rani ◽  
Parteek Kumar

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.


Author(s):  
Nourah F. Bin Hathlian ◽  
Alaaeldin M. Hafez

The need for designing Arabic text mining systems for the use on social media posts is increasingly becoming a significant and attractive research area. It serves and enhances the knowledge needed in various domains. The main focus of this paper is to propose a novel framework combining sentiment analysis with subjective analysis on Arabic social media posts to determine whether people are interested or not interested in a defined subject. For those purposes, text classification methods—including preprocessing and machine learning mechanisms—are applied. Essentially, the performance of the framework is tested using Twitter as a data source, where possible volunteers on a certain subject are identified based on their posted tweets along with their subject-related information. Twitter is considered because of its popularity and its rich content from online microblogging services. The results obtained are very promising with an accuracy of 89%, thereby encouraging further research.


Author(s):  
Anastasia V. Kolmogorova

The article aims to analyze the validity of Internet confession texts used as a source of training data set for designing computer classifier of Internet texts in Russian according to their emotional tonality. Thus, the classifier, backed by Lövheim’s emotional cube model, is expected to detect eight classes of emotions represented in the text or to assign the text to the emotionally neutral class. The first and one of the most important stages of the classifier creation is the training data set selection. The training data set in Machine Learning is the actual dataset used to train the model for performing various actions. The internet text genres that are traditionally used in sentiment analysis to train two or three tonalities classifiers are twits, films and market reviews, blogs and financial reports. The novelty of our project consists in designing multiclass classifier that requires a new non-trivial training data. As such, we have chosen the texts from public group Overheard in Russian social network VKontakte. As all texts show similarities, we united them under the genre name “Internet confession”. To feature the genre, we applied the method of narrative semiotics describing six positions forming the deep narrative structure of “Internet confession”: Addresser – a person aware of her/his separateness from the society; Addressee – society / public opinion; Subject – a narrator describing his / her emotional state; Object – the person’s self-image; Helper – the person’s frankness; Adversary – the person’s shame. The above mentioned genre features determine its primary advantage – a qualitative one – to be especially focused on the emotionality while more traditional sources of textual data are based on such categories as expressivity (twits) or axiological estimations (all sorts of reviews). The structural analysis of texts under discussion has also demonstrated several advantages due to the technological basis of the Overheard project: the text hashtagging prevents the researcher from submitting the whole collection to the crowdsourcing assessment; its size is optimal for assessment by experts; despite their hyperbolized emotionality, the texts of Internet confession genre share the stylistic features typical of different types of personal internet discourse. However, the narrative character of all Internet confession texts implies some restrictions in their use within sentiment analysis project.


2020 ◽  
Vol 8 (4) ◽  
pp. 47-62
Author(s):  
Francisca Oladipo ◽  
Ogunsanya, F. B ◽  
Musa, A. E. ◽  
Ogbuju, E. E ◽  
Ariwa, E.

The social media space has evolved into a large labyrinth of information exchange platform and due to the growth in the adoption of different social media platforms, there has been an increasing wave of interests in sentiment analysis as a paradigm for the mining and analysis of users’ opinions and sentiments based on their posts. In this paper, we present a review of contextual sentiment analysis on social media entries with a specific focus on Twitter. The sentimental analysis consists of two broad approaches which are machine learning which uses classification techniques to classify text and is further categorized into supervised learning and unsupervised learning; and the lexicon-based approach which uses a dictionary without using any test or training data set, unlike the machine learning approach.  


Author(s):  
Salima Behdenna ◽  
Fatiha Barigou ◽  
Ghalem Belalem

Sentiment analysis is a text mining discipline that aims to identify and extract subjective information. This growing field results in the emergence of three levels of granularity (document, sentence, and aspect). However, both the document and sentence levels do not find what exactly the opinion holder likes and dislikes. Furthermore, most research in this field deals with English texts, and very limited researches are undertaken on Arabic language. In this paper, the authors propose a semantic aspect-based sentiment analysis approach for Arabic reviews. This approach utilizes the semantic of description logics and linguistic rules in the identification of opinion targets and their polarity.


Author(s):  
Alaa Abdalqahar Jihad ◽  
Ahmed Subhi Abdalkafor

<p>Over the last decade there has been an increase in number of E-mails or comments to a company via social media sites, to satisfy their customers, the company must take in to consideration these messages and comments and know whether the customers are satisfied with what the company offers or not. Several techniques have been proposed to analyze the sentiment of the comment writer. Dealing with the Arabic language is faced with many challenges, such as it is a morphologically rich language and how to return the word to its original root. In this paper the challenges of dealing with the Arabic language were reviewed and a framework was also established to analyze the comments in Arabic and classify it into positive, negative or neutral sentiment. The framework was trained and tested and then the con-clusions were drawn based on its work.</p>


Credit card fraud introduces to the physical loss of a credit card or the destruction of sensitive credit card data. Several text mining procedures can be used for disclosure. This investigation reveals several algorithms that can be used to analyze transactions as a fraud or as a real background. This paper represents the possibility of fraudulent transactions in the prevalence and meaning of credit card usage also, Credit card fraud data collection was used in the investigation. Since the dataset was largely unbalanced, SMOTE (Synthetic Minority oversampling Technique) is applying for an overdose. In addition, jobs selected, and the data set divided into two parts, training data and test data. In this paper, The Advanced Super Gradient Boostingbased Text mining Algorithm (ASGB) suggested to detect the fraud transaction in Credit card transactions. ASGB is a Decision-Tree-Based Ensemble Text mining algorithm that utilizes a gradient boosting framework. In forecast difficulties, including unstructured data (Images, Text, etc.), artificial neural networks tend to exceed all other algorithms or structures. The proposed algorithms used in the experiment were the Hidden Markov Model, Random Forest, Gradient Boosting, and Enhanced Hidden Markov Model. The Experimental Results show that proposed algorithms, a welltuned ASGB classifier outperforms all of them. And it presents better Precision is 99.1%, and Recall is 99.8%, F-measure is 99.5%.


2020 ◽  
Vol 9 (3) ◽  
pp. 237-246
Author(s):  
Nabila Surya Wardani ◽  
Alan Prahutama ◽  
Puspita Kartikasari

Text mining is a variation on a field called data mining that tries to find interesting patterns from large databases. Indonesian President affirmed that the capital would be moved to East Kalimantan on August 26, 2019. That planning would receive pros and cons from public. Sentiment analysis is part of text mining that typically involves taking data from opinion, comment, or response. Sentiment analysis is the choice to do on this topic to get results about the public’s opinion. As the most used social media in Indonesia, Youtube is able to be data source by crawling the comments on a video uploaded by Kompas TV channel. Those comments were crawled on October 15, 2019, and selected 1500 latest comments (August 26 – October 12, 2019). The selected comments get transformed by using data pre-processing technique that involves case folding, removing mention, unescaping HTML, removing numbers, removing punctuation, text normalization, stripping whitespace, stopwords removal, tokenizing, and stemming. Labeling of sentiment class uses the sentiment scoring technique. The number of negative comments is 849, while the number of positive comments is 651. The ratio between training data and testing data is 80%: 20%. The classification method used to do sentiment analysis is the Naive Bayes Classifier for Bernoulli and Multinomial model. Bernoulli model only uses occurrence information, whereas the multinomial model keeps track of multiple occurrences. The results show that Bernoulli Naïve Bayes has a 93,45% level of sensitivity (recall) and Multinomial Naïve Bayes has a 90,19% level of sensitivity (recall). It means that both Bernoulli and Multinomial have a good result for this research. Keywords: Text Mining, Relocation of Indonesia’s Capital, Youtube, Bernoulli Naïve Bayes, Multinomial Naïve Bayes, Sensitivity (Recall). 


2021 ◽  
Vol 11 (11) ◽  
pp. 4768
Author(s):  
Sanaa Kaddoura ◽  
Maher Itani ◽  
Chris Roast

With the increase in the number of users on social networks, sentiment analysis has been gaining attention. Sentiment analysis establishes the aggregation of these opinions to inform researchers about attitudes towards products or topics. Social network data commonly contain authors’ opinions about specific subjects, such as people’s opinions towards steps taken to manage the COVID-19 pandemic. Usually, people use dialectal language in their posts on social networks. Dialectal language has obstacles that make opinion analysis a challenging process compared to working with standard language. For the Arabic language, Modern Standard Arabic tools (MSA) cannot be employed with social network data that contain dialectal language. Another challenge of the dialectal Arabic language is the polarity of opinionated words affected by inverters, such as negation, that tend to change the word’s polarity from positive to negative and vice versa. This work analyzes the effect of inverters on sentiment analysis of social network dialectal Arabic posts. It discusses the different reasons that hinder the trivial resolution of inverters. An experiment is conducted on a corpus of data collected from Facebook. However, the same work can be applied to other social network posts. The results show the impact that resolution of negation may have on the classification accuracy. The results show that the F1 score increases by 20% if negation is treated in the text.


Sign in / Sign up

Export Citation Format

Share Document