Text Generation with Content and Structure-Based Preprocessing in Imbalanced Data of Product Review

2021 ◽ Vol 14 (1) ◽ pp. 516-527
Author(s): Ana Zaqiyah ◽ Diana Purwitasari ◽ Chastine Fatichah ◽ ...

Spam detection typically categorizes product reviews as spam or non-spam. Spam reviews may contain fake reviews and non-review statements, which describe things unrelated to the product. Most publicly available spam reviews are labelled as fake reviews, while non-spam texts that are not fake reviews may still contain non-review statements. Identifying those non-review statements is crucial because they can mislead consumers. Non-review statements are rare, and these often large and long texts must be labelled manually, which is time-consuming. Because non-review statements are so rare, the data are imbalanced between non-spam as the majority class and spam consisting of non-review statements as the minority class. Augmenting fake reviews to add spam texts is ineffective because their content resembles non-spam, for example opinion words about product features. Generating non-review statements is therefore preferable for adding spam texts. Text generation raises two issues: commonly used neural network-based methods require large amounts of training data, and existing pre-trained models produce texts whose context differs from that of non-review statements. The augmented texts should have similar content and context, represented by the structure of the non-review statement. Therefore, we propose a text generation model with content and structure-based preprocessing to produce non-review statements, which is expected to overcome imbalanced data and give better spam detection results in product reviews. Structure-based preprocessing identifies the feature structures of non-opinion words from part-of-speech tags. Those features represent the context of spam reviews in unlabeled texts. Then, content-based preprocessing applies selected topic modeling results of non-review statements from fake reviews.
Our experiments yielded an improvement of ± 0.04 in the BLEU (Bilingual Evaluation Understudy) score, which evaluates the correspondence between generated and training texts. This value indicates that the generated texts are not quite identical to the training texts of non-review statements. However, those additional texts, combined with the original spam texts, gave better spam detection results, with an increase of more than 40% in average recall score.
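The BLEU score mentioned above compares n-gram overlap between a generated text and a reference. The following is a minimal sketch of sentence-level BLEU against a single reference, assuming whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing. The function names and the example sentences are illustrative and are not taken from the paper, whose exact BLEU configuration is not stated in the abstract.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, with uniform n-gram weights.

    Returns 0.0 when any n-gram order has no match (no smoothing is applied).
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty punishes candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Illustrative (made-up) non-review statements.
reference = "free shipping on all orders this week".split()
generated = "free shipping on all orders today".split()
score = bleu(generated, reference)
```

A score of 1.0 means the generated text reproduces the reference exactly, so a generator meant to augment data would be expected to score below that while still sharing much of the reference's wording.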

2014 ◽ Vol 631-632 ◽ pp. 1190-1193
Author(s): Sheng Xiu Yang ◽ Lu Jie Fan

Online shopping reviews provide valuable information for customers to compare the quality of products and many other aspects of future purchases. People increasingly rely on information from e-commerce reviews. Product reviews are an important determinant of potential customers’ buying choices. However, spammers are joining this community and trying to mislead consumers by writing fake or unfair reviews. Fake product review detection attempts to detect fake reviews and remove them so that readers see only the truthful ones. To the best of our knowledge, there are still few published studies on this problem. In this paper, we survey the area and give a brief overview of review spam. Related work on fake product review detection is presented, including web spam and spam email. Then some methods to detect review spam are introduced and summarized. Finally, trends in review spam detection are outlined.


Author(s): Dedy Suryadi ◽ Harrison Kim

Online product reviews have become an efficient source for gathering consumer needs, in place of labor-intensive surveys. The contribution of the paper is to relate the content of online reviews to a product’s sales rank, which implicitly reflects the needs and motivations that drive customers to purchase the product. In particular, the review content includes product features stated in the review, together with the sentiment expressed towards each feature. Part-of-speech tagging is used to extract the features and sentiment from the reviews. The data extracted from reviews, together with price, then become independent variables in the regression model, while sales rank is the dependent variable. An experiment is run on wearable technology products to illustrate the methodology and interpret the results. In general, the features in reviews that are significantly related to sales rank are button, calorie tracker, design, time functions, and waterproof abilities. Moreover, the products are further stratified based on average price. In the cluster of the most expensive items, sales rank is found not to be significantly related to price.
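The regression step described above can be sketched as ordinary least squares with sales rank as the dependent variable. This is a minimal illustration only: the feature names, the sentiment encoding (-1, 0, 1), and all numbers below are made-up toy values, and the sketch does not compute the significance tests the abstract refers to.

```python
def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of feature lists; an intercept column is added automatically.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Toy data (hypothetical): sentiment toward "waterproof", sentiment toward
# "design", and price, for six products.
rows = [
    [1, 0, 100],
    [0, 1, 150],
    [1, 1, 200],
    [-1, 0, 120],
    [0, -1, 90],
    [1, -1, 130],
]
sales_rank = [430, 495, 400, 680, 625, 525]

coefficients = fit_ols(rows, sales_rank)
intercept, waterproof_coef, design_coef, price_coef = coefficients
```

Each fitted coefficient estimates how sales rank changes with a one-unit change in that variable, holding the others fixed. Deciding whether a coefficient is significant would additionally require standard errors, which this sketch omits.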


Author(s): Min Chen ◽ Anusha Prabakaran

With the prevalence of e-commerce, online product reviews are increasingly considered crowd-sourced consumer opinions that significantly influence customer purchasing decisions and product rankings. It is therefore important to ensure the truthfulness of reviews by detecting and filtering out fake/spam reviews. This article presents an effective framework to analyze review credibility for spam detection and opinion mining. It incorporates three methods: duplicated review detection, anomaly detection, and incentivized review detection, that complement each other to produce statistical credibility scores indicating review credibility. A practical end-to-end system is designed and developed accordingly, and is equipped with high-level data visualization for easy interpretation and summarization of the analysis results. Experiments on an Amazon review dataset demonstrate its efficiency, scalability and accuracy. This system could help e-commerce and consumers identify fake reviews, refine product rankings, and constrain vendors and spammers from engaging in dishonest practices.
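Of the three methods named above, duplicated review detection is the easiest to illustrate. The sketch below flags near-duplicate reviews using Jaccard similarity over word shingles. This is one common approach and is an assumption here, since the abstract does not say how the framework detects duplicates. The shingle size, the threshold, and the sample reviews are all illustrative.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased review."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(reviews, threshold=0.7):
    """Return index pairs of reviews whose shingle sets overlap at or above threshold."""
    sets = [shingles(r) for r in reviews]
    return [(i, j) for i, j in combinations(range(len(reviews)), 2)
            if jaccard(sets[i], sets[j]) >= threshold]


# Made-up sample reviews.
reviews = [
    "Great phone with amazing battery life and camera",
    "great phone with amazing battery life and camera",
    "Terrible product, broke after two days",
]
duplicate_pairs = near_duplicates(reviews)
```

The pairwise comparison is quadratic in the number of reviews, so a dataset of realistic size would need an approximate technique such as MinHash rather than this direct loop.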


Author(s): Vinod Kumar Mishra ◽ Himanshu Tiruwa

Sentiment analysis is a part of computational linguistics concerned with extracting sentiment and emotion from text. It is also considered a task of natural language processing and data mining. Sentiment analysis mainly concentrates on identifying whether a given text is subjective or objective and, if it is subjective, whether it is negative, positive or neutral. This chapter provides an overview of aspect-based sentiment analysis, with current and future research trends. It also provides an aspect-based sentiment analysis of online customer reviews of the Nokia 6600. To perform aspect-based classification, we use a lexical approach on the Eclipse platform, which classifies each review as positive, negative or neutral on the basis of product features. SentiWordNet is used as a lexical resource to calculate the overall sentiment score of each sentence, a POS tagger is used for part-of-speech tagging, a frequency-based method is used to extract the aspects/features, and negation handling is used to improve the accuracy of the system.
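The lexical approach with negation handling can be sketched as follows. This is a simplified illustration under several assumptions: the tiny lexicon stands in for SentiWordNet and its scores are made up, the aspects are passed in by hand instead of being extracted by frequency, and opinion words are matched to an aspect by a fixed token window. None of these details come from the chapter.

```python
# Toy polarity lexicon; a stand-in for SentiWordNet (the values are made up).
LEXICON = {"good": 1.0, "great": 1.0, "clear": 0.5,
           "poor": -1.0, "bad": -1.0, "slow": -0.5}
NEGATIONS = {"not", "no", "never"}


def aspect_sentiment(sentence, aspects, window=3):
    """Label each aspect found in a sentence using nearby opinion words.

    An opinion word's polarity is flipped when the token before it is a negation.
    Returns {aspect: "positive" | "negative" | "neutral"}.
    """
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    result = {}
    for idx, token in enumerate(tokens):
        if token not in aspects:
            continue
        score = 0.0
        lo, hi = max(0, idx - window), min(len(tokens), idx + window + 1)
        for j in range(lo, hi):
            polarity = LEXICON.get(tokens[j], 0.0)
            if polarity and j > 0 and tokens[j - 1] in NEGATIONS:
                polarity = -polarity
            score += polarity
        if score > 0:
            result[token] = "positive"
        elif score < 0:
            result[token] = "negative"
        else:
            result[token] = "neutral"
    return result


# Made-up review sentence about a phone.
labels = aspect_sentiment(
    "The camera is not good. Overall, I think the battery is great",
    {"camera", "battery"},
)
```

Without the negation check, "not good" would count as positive for the camera, which is the kind of error negation handling is meant to prevent.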


2019 ◽ Vol 11 (3) ◽ pp. 81-97
Author(s): Chao Li ◽ Jun Xiang ◽ Shiqiang Chen

Reviews can reflect consumers' degree of satisfaction and their views on product quality, and consumers tend to read product reviews to get helpful information about product quality before placing an order on e-commerce platforms. However, existing research mainly focuses on the assessment of review quality, fake review detection, and opinion mining, and there is little research that assesses product quality objectively and quantitatively from the perspective of product features based on reviews. Therefore, the authors propose a method to assess product quality based on reviews at the granularity of product features. The authors define the related quality dimensions and develop the corresponding assessment models, assess the quality of reviews crawled from an e-commerce platform, then extract product features and opinion words from the high-quality reviews, and finally assess product quality on the extracted features that concern consumers. Experimental results demonstrate that the methodology can assess product quality on any feature objectively and quantitatively.


2019 ◽ Vol 8 (5) ◽ pp. 668
Author(s): Yang Cao ◽ Xin Fang ◽ Johan Ottosson ◽ Erik Näslund ◽ Erik Stenberg

Background: Severe obesity is a global public health threat of growing proportions. Accurate models to predict severe postoperative complications could be of value in the preoperative assessment of potential candidates for bariatric surgery. So far, traditional statistical methods have failed to produce high accuracy. We aimed to find a useful machine learning (ML) algorithm to predict the risk of severe complications after bariatric surgery. Methods: We trained and compared 29 supervised ML algorithms using information from 37,811 patients who underwent a bariatric surgical procedure between 2010 and 2014 in Sweden. The algorithms were then tested on 6250 patients operated on in 2015. We applied the synthetic minority oversampling technique to address the issue that only 3% of patients experienced severe complications. Results: Most of the ML algorithms showed high accuracy (>90%) and specificity (>90%) in both the training and test data. However, none of the algorithms achieved an acceptable sensitivity in the test data. We also tried to tune the hyperparameters of the algorithms to maximize sensitivity, but did not identify one with a sensitivity high enough for use in clinical practice in bariatric surgery. However, a minor but perceptible improvement was found with deep neural network (NN) ML. Conclusion: In predicting severe postoperative complications among bariatric surgery patients, ensemble algorithms outperform base algorithms. Compared to other ML algorithms, deep NN has the potential to improve accuracy and deserves further investigation. The oversampling technique should be considered in the context of imbalanced data where the number of outcomes of interest is relatively small.
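The gap between high accuracy and poor sensitivity reported above is a direct consequence of class imbalance, and a small worked example makes it concrete. The sketch below uses a hypothetical test set of 1000 patients with a 3% positive rate (matching the proportion in the abstract, but not its actual data) and a trivial classifier that always predicts "no complication".

```python
def confusion_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall on positives), and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical test set: 30 of 1000 patients (3%) have a severe complication.
y_true = [1] * 30 + [0] * 970
# A useless classifier that never predicts a complication.
always_negative = [0] * 1000

metrics = confusion_metrics(y_true, always_negative)
# accuracy is 0.97 and specificity is 1.0, yet sensitivity is 0.0
```

Because a classifier with no clinical value already exceeds 90% accuracy and specificity here, sensitivity is the metric that matters for a rare outcome of this kind.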


2016 ◽ Vol 2016 ◽ pp. 1-7
Author(s): Chengai Sun ◽ Qiaolin Du ◽ Gang Tian

Product reviews are now widely used by individuals in making decisions. However, driven by profit, some reviewers game the system by posting fake reviews to promote or demote target products. In the past few years, fake review detection has attracted significant attention from both industry and academia. However, the problem remains challenging because of the lack of labelled data for supervised learning and evaluation. Existing work has made many attempts to address this problem from the angles of the reviewer and the review. However, there has been little discussion of product-related review features, which are the main focus of our method. This paper proposes a novel convolutional neural network model that integrates product-related review features through a product word composition model. To reduce overfitting and high variance, a bagging model is introduced to combine the neural network model with two efficient classifiers. Experiments on a real-life Amazon review dataset demonstrate the effectiveness of the proposed approach.
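One simple way to combine a neural network with two other classifiers is majority voting, sketched below. This is an assumption about how the three models might be combined: the abstract does not describe the bagging model in detail, and the three rule-based functions here are hypothetical stand-ins for trained models.

```python
from collections import Counter


def majority_vote(classifiers, review):
    """Return the label predicted by most of the classifiers for one review."""
    votes = Counter(clf(review) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained models; real ones would be learned from data.
def cnn_stub(review):
    return "fake" if "best product ever" in review.lower() else "genuine"


def length_stub(review):
    return "fake" if len(review.split()) < 5 else "genuine"


def exclaim_stub(review):
    return "fake" if review.count("!") >= 3 else "genuine"


ensemble = [cnn_stub, length_stub, exclaim_stub]
label = majority_vote(ensemble, "Best product ever!!!")
```

With three voters and two labels a tie cannot occur, and a single model's mistake is outvoted whenever the other two agree, which is how an ensemble of this kind reduces variance.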

