Using XGBoost and Skip-Gram Model to Predict Online Review Popularity

SAGE Open ◽  
2020 ◽  
Vol 10 (4) ◽  
pp. 215824402098331
Author(s):  
Lien Thi Kim Nguyen ◽  
Hao-Hsuan Chung ◽  
Kristine Velasquez Tuliao ◽  
Tom M. Y. Lin

Review popularity is similar to awareness and information accessibility components: Both have a profound effect on customer purchase decisions. Therefore, this study proposes a new method for predicting online review popularity that combines the extreme gradient boosting tree algorithm (XGBoost), which extracts key features on the basis of ranking scores, with the skip-gram model, which subsequently identifies semantically related words for key textual terms. Findings revealed that written reviews had higher review popularity than non-textual reviews (reviewer and product factors). Moreover, the proposed method achieved higher prediction accuracy than the traditional ridge regression technique, as measured by Root Mean Squared Logarithmic Error (RMSLE). The main factors affecting review popularity and key reviewers for specific textual terms were also identified. Findings could help vendors identify key influencers for their product promotion and support the design of word-suggestion systems for online reviews.
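For readers unfamiliar with the RMSLE metric named in this abstract, the following is a minimal sketch of how it is commonly computed, using the log(1 + x) form. The function name and the sample vote counts are illustrative assumptions, not code or data from the paper.

```python
import math

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for non-negative targets."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x), so zero counts are handled without error.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Hypothetical popularity counts and predictions (not data from the paper).
votes_true = [0, 3, 10, 120]
votes_pred = [1, 2, 12, 100]
print(round(rmsle(votes_true, votes_pred), 4))
```

Because the error is taken on a log scale, a miss of 20 votes on a review with 120 votes is penalized less than the same absolute miss on a review with 3 votes, which suits heavily skewed popularity counts.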

2021 ◽  
Vol 29 (6) ◽  
pp. 0-0

Online reviews are a key form of displayed content on many online shopping platforms and an essential source of product information for consumers. Low-quality reviews often inconvenience the platform and review readers. This article aims to help Steam, one of the largest digital distribution platforms, predict review helpfulness and funniness. Using Python, data related to 480,000 game reviews for 20 games were collected for analysis. This article analyzed the impact of three categories of influencing factors on the usefulness and funniness of game reviews: characteristics of the review, the reviewer, and the game. Additionally, a Random Forest-based classifier predicted the usefulness of reviews accurately, while for funniness, Gradient Boosting Decision Tree was the better choice. This article extends research on review usefulness to game products and proposes research on the funniness of reviews.


2020 ◽  
Vol 16 (3) ◽  
pp. 466-490
Author(s):  
Yuan Meng ◽  
Nianhua Yang ◽  
Zhilin Qian ◽  
Gaoyu Zhang

Online product reviews play important roles in the word-of-mouth marketing of e-commerce enterprises, but only helpful reviews actually influence customers’ purchase decisions. Current research focuses on how to predict the helpfulness of a review but lacks a thorough analysis of why it is helpful. In this paper, feature sets covering review text and context cues are first proposed to represent review helpfulness. Then, a set of gradient boosted trees (GBT) models is introduced, and the optimal one, which is implemented in eXtreme Gradient Boosting (XGBoost), is chosen to predict and explain review helpfulness. Specifically, by including the SHAP (Shapley) values method to quantify feature contribution, this paper presents an integrated framework to better interpret why a review is helpful at both the macro and micro levels. Based on real data from Amazon.cn, this paper reveals that the number of words contributes the most to the helpfulness of reviews on headsets and is interactively influenced by features like the number of sentences or feature frequency, while feature frequency contributes the most to the helpfulness of facial cleanser reviews and is interactively influenced by the number of adjectives used in the review or the review’s entropy. Both datasets show that individual feature contributions vary from review to review, and individual joint contributions gradually decrease as feature values increase.
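To illustrate the Shapley-value idea that SHAP builds on, here is a minimal sketch that computes exact Shapley values by enumerating every feature coalition. It is not the SHAP library and not the paper's pipeline: the toy value function, its numbers, and the helper name `shapley_values` are assumptions made for illustration, with feature names borrowed from the abstract.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

# Hypothetical toy "model": a score built from whichever features are present.
def toy_value(present):
    score = 0.0
    if "num_words" in present:
        score += 2.0
    if "num_sentences" in present:
        score += 1.0
    if "num_words" in present and "num_sentences" in present:
        score += 1.0  # interaction term, shared equally between the two
    return score

phi = shapley_values(["num_words", "num_sentences", "entropy"], toy_value)
print(phi)
```

The contributions sum to the difference between the score with all features and the score with none, which is the additivity property that makes per-review explanations possible. Exact enumeration is only feasible for a handful of features; tree-specific approximations exist for real models.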


2021 ◽  
Vol 29 (6) ◽  
pp. 0-0

Online reviews have emerged as influential sources of information that greatly affect customers’ pre-purchase decisions. Some studies have found that culture impacts online reviews, but many aspects of online review usage are still not well understood. This study seeks to understand what factors influence the usage of online reviews and whether consumers’ intention to use online reviews is influenced by culture. This study collects data from U.S. and Thai consumers to examine what factors affect user attitudes and intentions. Structural Equation Modeling is used to analyze the data, and the findings reveal that most of the proposed factors influence online review adoption for these two nationalities. One significant difference was found between the respondents of the two countries. The results should help online businesses gain a better understanding of these factors, and thus direct their efforts to develop features which positively influence online review usage.


Author(s):  
He Yang ◽  
Emma Li ◽  
Yi Fang Cai ◽  
Jiapei Li ◽  
George X. Yuan

The purpose of this paper is to establish a framework for extracting early warning risk features for predicting financial distress, based on the XGBoost model and SHAP. How early warning risk features are constructed is well known to be very important for predicting the financial distress of companies. Compared with traditional statistical methods, data-driven machine learning for financial early warning achieves better prediction accuracy, but it also brings difficulties, such as models that may not be easily explained. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient boosting, has become a hot topic in machine learning research due to its strong ability to recognize nonlinear information and its high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early warning features for predicting financial distress for listed companies, with 76 financial risk features from seven categories and 14 non-financial risk features from four categories collected to establish an early warning system for the prediction of financial distress. In the empirical tests, with respect to AUC, KS and Kappa, the numerical results show that, compared with the Logistic model, the XGBoost-based method established in this paper is much better able to predict the financial distress risk of listed companies. Moreover, under the framework of SHAP (SHapley Additive exPlanations), we are able to give a reasonable, visual explanation of the important risk features and the ways in which they affect financial distress. The results of this paper show that the XGBoost approach to modelling early warning features for financial distress not only achieves better prediction accuracy but is also explainable, which is significant for identifying early warnings of financial distress risk for listed companies in practice.
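Two of the evaluation measures named in this abstract, AUC and KS, can be sketched in a few lines. The version below uses the pairwise-ranking definition of AUC and the maximum gap between the two classes' empirical score distributions for KS. The function name, labels, and scores are hypothetical and are not taken from the paper.

```python
def auc_and_ks(labels, scores):
    """AUC (pairwise-ranking form) and KS statistic for a binary classifier."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # AUC: probability that a random positive outscores a random negative,
    # with ties counted as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))

    # KS: largest gap between the two classes' empirical score CDFs.
    ks = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(s <= threshold for s in pos) / len(pos)
        cdf_neg = sum(s <= threshold for s in neg) / len(neg)
        ks = max(ks, abs(cdf_neg - cdf_pos))
    return auc, ks

# Hypothetical distress labels (1 = distressed) and model scores.
labels = [0, 0, 0, 1, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.65, 0.70, 0.80, 0.90]
auc, ks = auc_and_ks(labels, scores)
print(auc, ks)
```

Both measures depend only on how the model ranks companies, not on a chosen cut-off, which is why they are common for comparing credit-risk style models such as XGBoost against a logistic baseline.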


Author(s):  
Sarah Barber ◽  
Florian Hammer ◽  
Adrian Tica

Data-driven wind turbine performance predictions, such as power and loads, are important for planning and operation. Current methods do not take site-specific conditions such as turbulence intensity and shear into account, which could result in errors of up to 10%. In this work, four different machine learning models (k-nearest neighbors regression, random forest regression, extreme gradient boosting regression and artificial neural networks (ANN)) are trained and tested, firstly on a simulation dataset and then on a real dataset. It is found that machine learning methods that take site-specific conditions into account can improve prediction accuracy by a factor of two to three, depending on the error indicator chosen. Similar results are observed for multi-output ANNs for simulated in- and out-of-plane rotor blade tip deflection and root loads. Future work focuses on understanding transferability of results between different turbines within a wind farm and between different wind turbine types.


Symmetry ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 1566 ◽  
Author(s):  
Zeinab Shahbazi ◽  
Debapriya Hazra ◽  
Sejoon Park ◽  
Yung Cheol Byun

With the spread of COVID-19, the “untact” culture in South Korea is expanding and customers are increasingly seeking online services. A recommendation system serves as a decision-making indicator that helps users by suggesting items to be purchased in the future by exploring the symmetry between multiple user activity characteristics. A plethora of approaches are employed by the scientific community to design recommendation systems, including collaborative filtering, stereotyping, and content-based filtering. The current paradigm of recommendation systems favors collaborative filtering due to its significant potential to closely capture the interest of a user as compared to other approaches. Collaborative filtering harnesses features like user-profile details, visited pages, and click information to determine the interest of a user, thereby recommending the items that are related to the user’s interest. The existing collaborative filtering approaches exploit implicit and explicit features and report either good classification or good prediction outcomes. These systems fail to exhibit good results for both measures at the same time. We believe that avoiding the recommendation of those items that have already been purchased could contribute to overcoming this issue. In this study, we present a collaborative filtering-based algorithm to tackle big user data with symmetric purchasing orders and repeatedly purchased products. The proposed algorithm relies on combining the extreme gradient boosting machine learning architecture with the word2vec mechanism to explore the purchased products based on the click patterns of users. Our algorithm improves the accuracy of predicting which relevant products, likely to be bought, should be recommended to customers. The results are evaluated on a dataset that contains click-based features of users from an online shopping mall in Jeju Island, South Korea. We have evaluated Mean Absolute Error, Mean Square Error, and Root Mean Square Error for our proposed methodology and for other machine learning algorithms. Our proposed model generated the lowest error rate and enhanced the prediction accuracy of the recommendation system compared to other traditional approaches.
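The three error measures named in this abstract are related in a simple way, which the following minimal sketch makes concrete. The function name and the sample values are illustrative assumptions rather than anything from the paper.

```python
import math

def regression_errors(y_true, y_pred):
    """Return MAE, MSE and RMSE for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    # RMSE is the square root of MSE, so it is in the units of the target.
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}

# Hypothetical purchase counts and predictions (not the paper's dataset).
errors = regression_errors([1, 2, 3], [2, 2, 5])
print(errors)
```

MSE and RMSE penalize large misses more heavily than MAE, so reporting all three gives a rough sense of whether a model's error is dominated by a few large mistakes.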


Information ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 486
Author(s):  
Xiaoyan Zhang ◽  
Qiang Yan ◽  
Simin Zhou ◽  
Linye Ma ◽  
Siran Wang

The number of consumers playing virtual reality games is booming. To speed up product iteration, the user experience team needs to collect and analyze unsatisfying experiences in time. In this paper, we aim to detect the unsatisfying experiences hidden in online reviews of virtual reality exergames using a deep learning method and find out the unmet psychological needs of users based on self-determination theory. Convolutional neural networks for sentence classification (textCNN) are used in this study to classify online reviews with unsatisfying experiences. For comparison, we set eXtreme gradient boosting (XGBoost) with lexical features as the machine learning baseline. Term frequency-inverse document frequency (TF-IDF) is used to extract keywords from every set of classified reviews. The micro-F1 score of the textCNN classifier is 90.00, better than the 82.69 of XGBoost. The top 10 keywords of every set of reviews reflect relevant topics of unmet psychological needs. This paper explores the potential problems causing unsatisfying experiences and unmet psychological needs in virtual reality exergames through text mining and supplements experimental studies of virtual reality exergames.
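The TF-IDF keyword step described in this abstract can be sketched as follows. This is one common TF-IDF variant (relative term frequency times log(N / document frequency)); the paper may use a different weighting, and the function name and tokenized review groups are hypothetical.

```python
import math
from collections import Counter

def top_keywords(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    results = []
    for doc in documents:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results

# Hypothetical, already-tokenized review groups (not the paper's data).
reviews = [
    "tracking lost tracking controller game".split(),
    "boring game repetitive boring".split(),
    "game crashed crashed refund".split(),
]
keywords = top_keywords(reviews, k=2)
print(keywords)
```

A word such as "game" that appears in every group receives a score of zero, so the surviving keywords are the ones that distinguish each set of reviews.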


2019 ◽  
Vol 9 (24) ◽  
pp. 5447 ◽  
Author(s):  
João Rala Cordeiro ◽  
Octavian Postolache ◽  
João C. Ferreira

This study is a contribution to improving healthcare for children and for society generally. It aims to predict children’s height when they become adults, also known as “target height”, to allow for a better growth assessment and more personalized healthcare. The existing literature describes some prediction methods, based on longitudinal population studies and statistical techniques, which, with few information resources, are able to produce acceptable results. The challenge of this study is in using a new approach based on machine learning to forecast the target height for children and (eventually) improve the existing height prediction accuracy. The goals of the study were achieved. The extreme gradient boosting regression (XGB) and light gradient boosting machine regression (LightGBM) algorithms achieved considerably better results in height prediction. The developed model can be usefully applied by pediatricians and other clinical professionals in growth assessment.


2021 ◽  
Vol 29 (6) ◽  
pp. 1-23
Author(s):  
Zhi Wang ◽  
Victor Chang ◽  
Gergely Horvath

Online reviews are a key form of displayed content on many online shopping platforms and an essential source of product information for consumers. Low-quality reviews often inconvenience the platform and review readers. This article aims to help Steam, one of the largest digital distribution platforms, predict review helpfulness and funniness. Using Python, data related to 480,000 game reviews for 20 games were collected for analysis. This article analyzed the impact of three categories of influencing factors on the usefulness and funniness of game reviews: characteristics of the review, the reviewer, and the game. Additionally, a Random Forest-based classifier predicted the usefulness of reviews accurately, while for funniness, Gradient Boosting Decision Tree was the better choice. This article extends research on review usefulness to game products and proposes research on the funniness of reviews.

