SSAAR: An enhanced System for Sentiment Analysis of Arabic Reviews

Sentiment Analysis, or Opinion Mining, has recently captured the interest of researchers worldwide. With the increasing use of the internet, the web is becoming overloaded with data that contains useful information, which can be used in different fields. Many studies have examined Sentiment Analysis of online data in different languages. However, the amount of research dealing with the Arabic language is still limited. In this paper, an empirical study of Sentiment Analysis of online reviews written in Modern Standard Arabic is conducted. A new system called SSAAR (System for Sentiment Analysis of Arabic Reviews) is proposed, allowing computational classification of reviews into three classes (positive, negative, neutral). The input data of this system is built using a proposed framework called SPPARF (Scraping and double Preprocessing Arabic Reviews Framework), which generates a structured and clean dataset. Moreover, the system experiments with two improved approaches for sentiment classification based on supervised learning: a double preprocessing method and a feature selection method. Both approaches are trained using five algorithms (Naïve Bayes, Stochastic Gradient Descent (SGD) Classifier, Logistic Regression, K-Nearest Neighbors, and Random Forest) and then compared under the same conditions. The experimental results show that the feature selection method using the SGD Classifier achieves the best accuracy (77.1%). The SSAAR system therefore gives its best results with the feature selection method; nevertheless, the double preprocessing approach also produced satisfactory results and is therefore considered suitable for the proposed system.

A New Feature Selection Method for Sentiment Analysis in Short Text

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0171 ◽

2018 ◽

Vol 29 (1) ◽

pp. 1122-1134

Author(s):

H. M. Keerthi Kumar ◽

B. S. Harish

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Classification Accuracy ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Selection Methods ◽

K Nearest Neighbors ◽

Short Text

In the current internet era, micro-blogging sites produce an enormous amount of short textual information, which appears in the form of opinions or sentiments of users. Sentiment analysis of short text is a challenging task because of informal language, misspellings, and shortened forms of words, which lead to high dimensionality and sparsity. To deal with these challenges, this paper proposes a novel, simple, yet effective feature selection method that selects frequently distributed features related to each class. The method is based on class-wise information, which it uses to identify the relevant features for each class. We evaluate the proposed feature selection method by comparing it with existing feature selection methods such as chi-square (χ²), entropy, information gain, and mutual information. Performance is evaluated using the classification accuracy obtained from support vector machine, K nearest neighbors, and random forest classifiers on two publicly available datasets, namely the Stanford Twitter dataset and the Ravikiran Janardhana dataset. To demonstrate the effectiveness of the proposed feature selection method, we conducted extensive experiments with different feature sets. The proposed feature selection method outperforms the existing feature selection methods in terms of classification accuracy on the Stanford Twitter dataset. On the Ravikiran Janardhana dataset, the proposed method achieves classification accuracy comparable to the other feature selection methods for most of the feature subsets.
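
Information gain, one of the baseline methods named in the abstract above, can be illustrated with a minimal sketch. This is not the authors' proposed class-wise method, whose details the abstract does not give. The toy documents, the labels, and the function names `entropy`, `information_gain`, and `top_k_terms` are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Reduction in label entropy after splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Made-up toy data: each short text is represented as a set of tokens.
docs = [{"good", "phone"}, {"great", "phone"}, {"bad", "phone"}, {"bad", "battery"}]
labels = ["pos", "pos", "neg", "neg"]

print(top_k_terms(docs, labels, 1))  # prints ['bad']
```

In this toy set, "bad" separates the two classes perfectly, so it receives the maximum gain of 1 bit, while "phone" occurs in both classes and scores much lower. Keeping only the top-ranked terms is one way to reduce the dimensionality and sparsity that the abstract describes.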

A Novel Feature Selection Method Based on Genetic Algorithm for Opinion Mining of Social Media Reviews

Communications in Computer and Information Science - Information, Communication and Computing Technology ◽

10.1007/978-981-13-5992-7_15 ◽

2019 ◽

pp. 167-175 ◽

Cited By ~ 1

Author(s):

Savita Sangam ◽

Subhash Shinde

Keyword(s):

Genetic Algorithm ◽

Social Media ◽

Feature Selection ◽

Opinion Mining ◽

Feature Selection Method ◽

Selection Method

NICFS: A novel feature selection method applied to lexicon based sentiment analysis

Intelligent Decision Technologies ◽

10.3233/idt-190361 ◽

2019 ◽

Vol 13 (1) ◽

pp. 41-48

Author(s):

Poornima Mehta ◽

Satish Chandra

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Feature Selection Method ◽

Selection Method

Towards Optimization of Malware Detection using Chi-square Feature Selection on Ensemble Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d2359.0410421 ◽

2021 ◽

Vol 10 (4) ◽

pp. 254-262

Author(s):

Fadare Oluwaseun Gbenga ◽

Adetunmbi Adebayo Olusola ◽

(Mrs) Oyinloye Oghenerukevwe Eloho ◽

Mogaji Stephen Alaba

Keyword(s):

Feature Selection ◽

Malware Detection ◽

Feature Selection Method ◽

Ensemble Methods ◽

Nearest Neighbors ◽

Selection Method ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Chi Square ◽

Extreme Gradient Boosting

The proliferation of malware variants is probably the greatest problem in PC security, and protecting information in the form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection, and ensemble techniques have been shown to be highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square as the feature selection method with eight ensemble learning classifiers built on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. Among the base learners, K-Nearest Neighbors returns the highest accuracy: 95.37% with Chi-square feature selection and 87.89% without feature selection. Among the ensembles, the Extreme Gradient Boosting Classifier achieves the highest accuracy: 97.407% with Chi-square feature selection and 91.72% without feature selection. Across the seven evaluation measures, the Extreme Gradient Boosting Classifier leads when Chi-square feature selection is used, and Random Forest leads among ensemble methods without feature selection. The study results show that the tree-based ensemble model is compelling for malware classification.
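
The Chi-square feature selection step named in the abstract above can be sketched with the standard 2x2 contingency statistic for a binary feature and a target class. This is only an illustration of the general technique, not the authors' pipeline. The sample data, the feature names, and the function names `chi_square` and `select_top_k` are hypothetical.

```python
def chi_square(samples, labels, feature, target):
    """Chi-square statistic for the 2x2 table of feature presence vs. target class."""
    a = b = c = d = 0
    for features, label in zip(samples, labels):
        has_feature = feature in features
        in_class = label == target
        if has_feature and in_class:
            a += 1  # feature present, target class
        elif has_feature:
            b += 1  # feature present, other class
        elif in_class:
            c += 1  # feature absent, target class
        else:
            d += 1  # feature absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # feature or class is constant, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(samples, labels, target, k):
    """Rank all features by chi-square score and keep the k highest."""
    vocab = sorted(set().union(*samples))
    ranked = sorted(vocab, key=lambda f: (-chi_square(samples, labels, f, target), f))
    return ranked[:k]


# Made-up toy data: each sample is a set of hypothetical behavioural features.
samples = [
    {"write_file", "open_socket"},
    {"write_file", "inject_code"},
    {"inject_code", "open_socket"},
    {"read_file"},
    {"read_file", "open_socket"},
    {"write_file", "read_file"},
]
labels = ["malware", "malware", "malware", "benign", "benign", "benign"]

print(select_top_k(samples, labels, "malware", 2))  # prints ['read_file', 'inject_code']
```

The statistic is two-sided: in this toy set, `read_file` ranks first because it appears only in benign samples, which makes it as informative for the classifier as a feature that appears only in malware.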

QER: a new feature selection method for sentiment analysis

Human-centric Computing and Information Sciences ◽

10.1186/s13673-018-0135-8 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 6

Author(s):

Tuba Parlar ◽

Selma Ayşe Özel ◽

Fei Song

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Feature Selection Method ◽

Selection Method ◽

New Feature

Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier

World Wide Web ◽

10.1007/s11280-015-0381-x ◽

2016 ◽

Vol 20 (2) ◽

pp. 135-154 ◽

Cited By ~ 92

Author(s):

Asha S Manek ◽

P Deepa Shenoy ◽

M Chandra Mohan ◽

Venugopal K R

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Gini Index ◽

Feature Selection Method ◽

Selection Method ◽

Svm Classifier ◽

Term Extraction

Development of an entropy-based feature selection method and analysis of online reviews on real estate

2017 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) ◽

10.1109/ieem.2017.8290312 ◽

2017 ◽

Cited By ~ 1

Author(s):

Hiroki Horino ◽

Hirofumi Nonaka ◽

Elisa Claire Aleman Carreon ◽

Toru Hiraoka

Keyword(s):

Feature Selection ◽

Real Estate ◽

Feature Selection Method ◽

Online Reviews ◽

Selection Method

Aspect-Based Sentiment Analysis of Arabic Tweets in the Education Sector Using a Hybrid Feature Selection Method

2020 14th International Conference on Innovations in Information Technology (IIT) ◽

10.1109/iit50501.2020.9299026 ◽

2020 ◽

Author(s):

Manar Alassaf ◽

Ali Mustafa Qamar

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Feature Selection Method ◽

Selection Method ◽

Education Sector

Sentiment Analysis of Chinese Micro Blog Using Machine Learning and an Improved Feature Selection Method

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.631-632.1219 ◽

2014 ◽

Vol 631-632 ◽

pp. 1219-1223

Author(s):

Jia Hao Chen ◽

Jian Hua Wu

Keyword(s):

Machine Learning ◽

Social Media ◽

Feature Selection ◽

Sentiment Analysis ◽

Rapid Development ◽

Feature Selection Method ◽

Selection Method ◽

Media Services ◽

Social Media Service ◽

Better Than

With the rapid development of the Internet and the emergence of social media services, many users are becoming creators of social information. However, manual processing cannot handle such a large number of subjective messages. As a new kind of social media service, the micro blog has been widely adopted and can be used for sentiment analysis. This paper compares the performance of three machine learning methods on sentiment analysis of Chinese micro blogs. We also propose an improved feature selection method that increases classification accuracy. Experimental results show that SVM performs close to Naïve Bayes and that both are better than logistic regression in most cases.

Enhancement of Performance of K-Nearest Neighbors Classifiers for the Prediction of Diabetes Using Feature Selection Method

2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA) ◽

10.1109/iccca49541.2020.9250887 ◽

2020 ◽

Author(s):

Subhash Chandra Gupta ◽

Noopur Goel

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Nearest Neighbors ◽

Selection Method ◽

K Nearest Neighbors ◽

Prediction Of Diabetes
