A machine learning-based sentiment analysis of online product reviews with a novel term weighting and feature selection approach

2021 ◽  
Vol 58 (5) ◽  
pp. 102656
Author(s):  
Huiliang Zhao ◽  
Zhenghong Liu ◽  
Xuemei Yao ◽  
Qin Yang
2021 ◽  
Vol 22 (1) ◽  
pp. 53-66
Author(s):  
D. Anand Joseph Daniel ◽  
M. Janaki Meena

Sentiment analysis of online product reviews has become a mainstream way for businesses on e-commerce platforms to promote their products and improve user satisfaction. Hence, it is necessary to construct an automatic sentiment analyser for automatic identification of sentiment polarity of the online product reviews. Traditional lexicon-based approaches used for sentiment analysis suffered from several accuracy issues while machine learning techniques require labelled training data. This paper introduces a hybrid sentiment analysis framework to bond the gap between both machine learning and lexicon-based approaches. A novel tunicate swarm algorithm (TSA) based feature reduction is integrated with the proposed hybrid method to solve the scalability issue that arises due to a large feature set. It reduces the feature set size to 43% without changing the accuracy (93%). Besides, it improves the scalability, reduces the computation time and enhances the overall performance of the proposed framework. From experimental analysis, it can be observed that TSA outperforms existing feature selection techniques such as particle swarm optimization and genetic algorithm. Moreover, the proposed approach is analysed with performance metrics such as recall, precision, F1-score, feature size and computation time.


2020 ◽  
Vol 10 (22) ◽  
pp. 8093
Author(s):  
Jun Wang ◽  
Yuanyuan Xu ◽  
Hengpeng Xu ◽  
Zhe Sun ◽  
Zhenglu Yang ◽  
...  

Feature selection has devoted a consistently great amount of effort to dimension reduction for various machine learning tasks. Existing feature selection models focus on selecting the most discriminative features for learning targets. However, this strategy is weak in handling two kinds of features, that is, the irrelevant and redundant ones, which are collectively referred to as noisy features. These features may hamper the construction of optimal low-dimensional subspaces and compromise the learning performance of downstream tasks. In this study, we propose a novel multi-label feature selection approach by embedding label correlations (dubbed ELC) to address these issues. Particularly, we extract label correlations for reliable label space structures and employ them to steer feature selection. In this way, label and feature spaces can be expected to be consistent and noisy features can be effectively eliminated. An extensive experimental evaluation on public benchmarks validated the superiority of ELC.


2020 ◽  
Vol 8 (6) ◽  
pp. 2862-2867

E-commerce is a website or mobile application platform that help people to buy products. Before purchasing the product, customer will decide to buy it or not by reading the review from previous buyer. There is a problem that there are a lot of review so it will take a long time for customer to read it all. This research will be using sentiment analysis method to classify the review data. Sentiment analysis or opinion mining is a machine learning approach to classify and analyse texts or documents about human’s sentiments, emotions, and opinions. In this research, sentiment analysis was used to classify product reviews from e-commerce websites into positive or negative classes. The results could be processed further and be used to summarize customers' opinions about a certain product without reading every single review. The goal of this research is to optimize classification performance by using feature selection technique. Terms Frequency-Inverse Document Frequency (TF-IDF) feature extraction, Backward Elimination feature selection, and five different classifiers (Naïve Bayes, Support Vector Machine, K-Nearest Neighbour, Decision Tree, Random Forest) were used in analysing the sentiment of the reviews. In this research, the dataset used are Indonesian language and classified into two classes(positive and negative). The best accuracy is achieved by using TF-IDF, Backward Elimination and Support Vector Machine (SVM) with a score of 85.97%, which increases by 7.91% if compared to the process without feature selection. Based on the results, Backward Elimination feature selection succeeded in improving all performance for all classifiers used in this research.


Sign in / Sign up

Export Citation Format

Share Document