SSAAR: An enhanced System for Sentiment Analysis of Arabic Reviews

2020 ◽  
Vol 20 ◽  
pp. 81-95
Author(s):  
Manal Nejjari ◽  
Abdelouafi Meziane

Sentiment Analysis, or Opinion Mining, has recently captured the interest of researchers worldwide. With the increasing use of the internet, the web is becoming overloaded with data that contains useful information for many fields. Many studies have examined Sentiment Analysis of online data in different languages. However, research dealing with the Arabic language is still limited. This paper presents an empirical study of Sentiment Analysis of online reviews written in Modern Standard Arabic. A new system called SSAAR (System for Sentiment Analysis of Arabic Reviews) is proposed, which classifies reviews into three classes (positive, negative, neutral). The input data of this system is built using a proposed framework called SPPARF (Scraping and double Preprocessing Arabic Reviews Framework), which generates a structured and clean dataset. The system experiments with two improved approaches to sentiment classification based on supervised learning: a double preprocessing method and a feature selection method. Both approaches are trained with five algorithms (Naïve Bayes, Stochastic Gradient Descent (SGD) classifier, Logistic Regression, K-Nearest Neighbors, and Random Forest) and then compared under the same conditions. The experimental results show that the feature selection method with the SGD classifier achieves the best accuracy (77.1%). The SSAAR system therefore proved efficient and gives better results with the feature selection method; nevertheless, the double preprocessing approach also produced satisfactory results and is therefore considered suitable for the proposed system.

2018 ◽  
Vol 29 (1) ◽  
pp. 1122-1134
Author(s):  
H. M. Keerthi Kumar ◽  
B. S. Harish

In the recent internet era, micro-blogging sites produce an enormous amount of short textual information, which appears in the form of opinions or sentiments of users. Sentiment analysis of short text is a challenging task because of informal language, misspellings, and shortened forms of words, which lead to high dimensionality and sparsity. To deal with these challenges, this paper proposes a novel, simple, yet effective feature selection method that selects frequently distributed features related to each class. The method is based on class-wise information, which it uses to identify the relevant features for each class. We evaluate the proposed feature selection method by comparing it with existing feature selection methods such as chi-square (χ2), entropy, information gain, and mutual information. Performance is evaluated using the classification accuracy obtained from support vector machine, K-nearest neighbors, and random forest classifiers on two publicly available datasets, namely the Stanford Twitter dataset and the Ravikiran Janardhana dataset. To demonstrate the effectiveness of the proposed feature selection method, we conducted extensive experiments with different feature sets. The proposed method outperforms the existing feature selection methods in terms of classification accuracy on the Stanford Twitter dataset. On the Ravikiran Janardhana dataset, it performs comparably to the other feature selection methods in terms of classification accuracy for most of the feature subsets.
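Chi-square is one of the baseline feature selection methods named in this abstract. The following is a minimal sketch of chi-square term scoring for one class against the rest, not the method proposed in the paper. The function names `chi_square_scores` and `top_k_terms` and the four-document toy corpus are assumptions made purely for illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each term by the chi-square statistic for `target` vs. the rest.

    `docs` is a list of token lists and `labels` the matching class labels.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    # Document frequency of each term inside and outside the target class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        bucket = df_pos if y == target else df_neg
        bucket.update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # target-class documents containing the term
        b = df_neg[term]   # other documents containing the term
        c = n_pos - a      # target-class documents without the term
        d = n_neg - b      # other documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_k_terms(docs, labels, target, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = chi_square_scores(docs, labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (illustrative only, not taken from either dataset in the paper).
docs = [
    ["great", "phone", "love"],
    ["love", "this", "great"],
    ["bad", "phone", "hate"],
    ["hate", "this", "bad"],
]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(docs, labels, "pos")
```

On this toy corpus, terms that occur in only one class ("great", "bad") receive the maximum score, while terms spread evenly across classes ("phone", "this") score zero. Because the statistic is symmetric, a term strongly associated with the other class also scores high for the target class.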


Author(s):  
*Fadare Oluwaseun Gbenga ◽  
Adetunmbi Adebayo Olusola ◽  
(Mrs) Oyinloye Oghenerukevwe Eloho ◽  
Mogaji Stephen Alaba

The multiplication of malware variations is probably the greatest problem in PC security, and the protection of information in the form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection, and ensemble techniques have been established as highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square as the feature selection method with eight ensemble learning classifiers built on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. K-Nearest Neighbors returns the highest accuracy, 95.37% with Chi-square feature selection and 87.89% without feature selection. The Extreme Gradient Boosting classifier has the highest ensemble accuracy, 97.407% with Chi-square feature selection and 91.72% without feature selection. Across the seven evaluation measures, the Extreme Gradient Boosting classifier leads with Chi-square feature selection and Random Forest leads among the ensemble methods without feature selection. The study results show that the tree-based ensemble model is compelling for malware classification.
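The abstract does not list the eight ensemble classifiers it evaluates, so the sketch below uses plain majority voting as a stand-in to show how predictions from several base learners can be combined. The prediction lists and the names `majority_vote` and `accuracy` are hypothetical and do not reproduce the paper's data or results.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by plurality."""
    combined = []
    for sample_votes in zip(*predictions_per_model):
        counts = Counter(sample_votes)
        best = max(counts.values())
        # Break ties deterministically by picking the smallest label.
        combined.append(min(label for label, c in counts.items() if c == best))
    return combined


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical outputs of three base learners on five samples (1 = malware).
knn = [1, 0, 1, 0, 0]
nb = [1, 1, 0, 1, 0]
svm = [0, 0, 1, 1, 1]
truth = [1, 0, 1, 1, 0]

ensemble = majority_vote([knn, nb, svm])
```

In this toy case each base learner misclassifies at least one sample, yet their errors fall on different samples, so the vote recovers every true label. That is the intuition behind combining learners, though real gains depend on how correlated the base learners' errors are.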


2014 ◽  
Vol 631-632 ◽  
pp. 1219-1223
Author(s):  
Jia Hao Chen ◽  
Jian Hua Wu

With the rapid development of the Internet and the emergence of social media services, many users are becoming creators of social information. However, manual processing cannot cope with such a large number of subjective messages. As a new kind of social media service, the micro blog has been widely adopted and can be used for sentiment analysis. This paper compares the performance of three machine learning methods for sentiment analysis of Chinese micro blogs. We also propose an improved feature selection method that increases classification accuracy. Experimental results show that SVM performs close to Naïve Bayes and that both are better than logistic regression in most cases.

