Using XGBoost and Skip-Gram Model to Predict Online Review Popularity

Online reviews are a key form of display content on many online shopping platforms and an essential source of product information for consumers. Low-quality reviews often inconvenience both the platform and review readers. This article aims to help Steam, one of the largest digital distribution platforms, predict the helpfulness and funniness of reviews. Using Python, 480,000 items of game review data for 20 games were collected for analysis. The article analyzes the impact of three categories of influencing factors on the usefulness and funniness of game reviews: characteristics of the review, the reviewer, and the game. Additionally, a Random Forest-based classifier predicted the usefulness of reviews accurately, while for funniness, Gradient Boosting Decision Tree was the better choice. The article extends research on review usefulness to game products and proposes research on review funniness.

What Makes an Online Review More Helpful: An Interpretation Framework Using XGBoost and SHAP Values

Journal of theoretical and applied electronic commerce research ◽

10.3390/jtaer16030029 ◽

2020 ◽

Vol 16 (3) ◽

pp. 466-490

Author(s):

Yuan Meng ◽

Nianhua Yang ◽

Zhilin Qian ◽

Gaoyu Zhang

Keyword(s):

Real Data ◽

Gradient Boosting ◽

Product Reviews ◽

Purchase Decisions ◽

Extreme Gradient Boosting ◽

Shapley Values ◽

Review Helpfulness ◽

Feature Values ◽

Context Cues ◽

Feature Frequency

Online product reviews play important roles in the word-of-mouth marketing of e-commerce enterprises, but only helpful reviews actually influence customers’ purchase decisions. Current research focuses on how to predict the helpfulness of a review but lacks a thorough analysis of why it is helpful. In this paper, feature sets covering review text and context cues are first proposed to represent review helpfulness. Then, a set of gradient boosted trees (GBT) models is introduced, and the optimal one, which is implemented in eXtreme Gradient Boosting (XGBoost), is chosen to predict and explain review helpfulness. Specifically, by including the SHAP (Shapley) values method to quantify feature contribution, this paper presents an integrated framework to better interpret why a review is helpful at both the macro and micro levels. Based on real data from Amazon.cn, this paper reveals that the number of words contributes the most to the helpfulness of reviews on headsets and is interactively influenced by features like the number of sentences or feature frequency, while feature frequency contributes the most to the helpfulness of facial cleanser reviews and is interactively influenced by the number of adjectives used in the review or the review’s entropy. Both datasets show that individual feature contributions vary from review to review, and individual joint contributions gradually decrease as feature values increase.
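
The idea behind Shapley-value feature attribution mentioned in this abstract can be illustrated with a minimal sketch. It computes exact Shapley values by brute force from the definition; it is not the tree-based algorithm of the SHAP library and not the authors' pipeline. The scoring function `toy_score` and its numbers are hypothetical, chosen only so that two features interact; the feature names are borrowed from the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        result[f] = total
    return result


def toy_score(present):
    """Hypothetical helpfulness score (assumption, not taken from the paper)."""
    score = 0.0
    if "num_words" in present:
        score += 2.0
    if "num_sentences" in present:
        score += 1.0
    if "num_words" in present and "num_sentences" in present:
        score += 1.0  # interaction term shared by the two features
    return score


phi = shapley_values(toy_score, ["num_words", "num_sentences", "entropy"])
print(phi)
```

In this toy setting the interaction bonus is split evenly between the two interacting features, the unused feature receives zero, and the attributions sum to the score of the full feature set, which is the additivity property that makes such values convenient for explaining individual predictions.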

Using Customer Review Systems to Support Purchase Decisions

Journal of Global Information Management ◽

10.4018/jgim.20211101oa60 ◽

2021 ◽

Vol 29 (6) ◽

pp. 0-0

Keyword(s):

Structural Equation ◽

Online Reviews ◽

Equation Modeling ◽

Online Review ◽

Purchase Decision ◽

Intention To Use ◽

Purchase Decisions ◽

Sources Of Information ◽

User Attitudes ◽

Significant Difference

Online reviews have emerged as influential sources of information that greatly affect customers’ pre-purchase decisions. Some studies have found that culture impacts online reviews, but many aspects of online review usage are still not well understood. This study seeks to understand what factors influence the usage of online reviews and whether consumers’ intention to use online reviews is influenced by culture. It collects data from U.S. and Thai consumers to examine what factors affect user attitudes and intentions. Structural Equation Modeling is used to analyze the data, and the findings reveal that most of the proposed factors influence online review adoption for these two nationalities. One significant difference was found between the respondents of the two countries. The results should help online businesses gain a better understanding of these factors, and thus direct their efforts to develop features which positively influence online review usage.

The extraction of early warning features for predicting financial distress based on XGBoost model and SHAP framework

International Journal of Financial Engineering ◽

10.1142/s2424786321410048 ◽

2021 ◽

pp. 2141004

Author(s):

He Yang ◽

Emma Li ◽

Yi Fang Cai ◽

Jiapei Li ◽

George X. Yuan

Keyword(s):

Machine Learning ◽

Early Warning ◽

Financial Distress ◽

Prediction Accuracy ◽

Financial Risk ◽

Learning Algorithm ◽

Listed Companies ◽

Gradient Boosting ◽

Distress Risk ◽

Extreme Gradient Boosting

The purpose of this paper is to establish a framework for extracting early warning risk features for predicting financial distress, based on the XGBoost model and SHAP. How early warning risk features are constructed is very important for predicting the financial distress of companies. Compared with traditional statistical methods, data-driven machine learning achieves better prediction accuracy in financial early warning modelling, but it also brings the difficulty that the resulting model may not be easy to explain. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient boosting, has become a hot topic in machine learning research due to its strong ability to recognize nonlinear information and its high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early warning features for predicting the financial distress of listed companies, with 76 financial risk features from seven categories and 14 non-financial risk features from four categories collected to establish an early warning system for the prediction of financial distress. In empirical tests with respect to AUC, KS and Kappa, the numerical results show that, compared with the Logistic model, the XGBoost-based method established in this paper is much better able to predict the financial distress risk of listed companies. Moreover, under the framework of SHAP (SHapley Additive exPlanations), we are able to give a reasonable, visual explanation of the important risk features and the ways in which they affect financial distress. The results show that the XGBoost approach to modelling early warning features for financial distress not only achieves better prediction accuracy but is also explainable, which is significant for identifying early warnings of financial distress risk for listed companies in practice.
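
The three evaluation measures named in this abstract (AUC, KS and Kappa) can be sketched in plain Python as follows. This is an illustrative sketch using the standard textbook definitions, not the authors' evaluation code; the labels, scores and the 0.5 decision threshold are made-up assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, scores):
    """Largest gap between the score CDFs of the positive and negative class."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance agreement."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (observed - expected) / (1 - expected)


# Made-up example: 1 = distressed company, 0 = healthy company.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
ks = ks_statistic(y_true, scores)
kappa = cohen_kappa(y_true, y_pred)
print(auc, ks, kappa)
```

AUC and KS are computed from the raw scores and therefore do not depend on a threshold, whereas Kappa is computed from hard class predictions, so it changes with the threshold chosen.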

Improving Site-Dependent Wind Turbine Performance Prediction Accuracy Using Machine Learning

ASCE-ASME J Risk and Uncert in Engrg Sys Part B Mech Engrg ◽

10.1115/1.4053513 ◽

2022 ◽

Author(s):

Sarah Barber ◽

Florian Hammer ◽

Adrian Tica

Keyword(s):

Machine Learning ◽

Wind Turbine ◽

Prediction Accuracy ◽

Wind Farm ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Site Specific ◽

Turbine Performance ◽

Extreme Gradient Boosting ◽

Out Of Plane

Data-driven wind turbine performance predictions, such as power and loads, are important for planning and operation. Current methods do not take site-specific conditions such as turbulence intensity and shear into account, which could result in errors of up to 10%. In this work, four different machine learning models (k-nearest neighbors regression, random forest regression, extreme gradient boosting regression and artificial neural networks (ANN)) are trained and tested, first on a simulation dataset and then on a real dataset. It is found that machine learning methods that take site-specific conditions into account can improve prediction accuracy by a factor of two to three, depending on the error indicator chosen. Similar results are observed for multi-output ANNs for simulated in- and out-of-plane rotor blade tip deflection and root loads. Future work focuses on understanding transferability of results between different turbines within a wind farm and between different wind turbine types.

Toward Improving the Prediction Accuracy of Product Recommendation System Using Extreme Gradient Boosting and Encoding Approaches

Symmetry ◽

10.3390/sym12091566 ◽

2020 ◽

Vol 12 (9) ◽

pp. 1566 ◽

Cited By ~ 2

Author(s):

Zeinab Shahbazi ◽

Debapriya Hazra ◽

Sejoon Park ◽

Yung Cheol Byun

Keyword(s):

Machine Learning ◽

South Korea ◽

Collaborative Filtering ◽

Mean Square Error ◽

Prediction Accuracy ◽

Recommendation System ◽

Recommendation Systems ◽

Gradient Boosting ◽

Mean Square ◽

Extreme Gradient Boosting

With the spread of COVID-19, the “untact” culture in South Korea is expanding and customers are increasingly seeking online services. A recommendation system serves as a decision-making indicator that helps users by suggesting items to be purchased in the future by exploring the symmetry between multiple user activity characteristics. A plethora of approaches are employed by the scientific community to design recommendation systems, including collaborative filtering, stereotyping, and content-based filtering. The current paradigm of recommendation systems favors collaborative filtering due to its significant potential to closely capture the interest of a user as compared to other approaches. Collaborative filtering harnesses features like user-profile details, visited pages, and click information to determine the interest of a user, thereby recommending the items that are related to the user’s interest. The existing collaborative filtering approaches exploit implicit and explicit features and report either good classification or good prediction outcomes, but fail to achieve good results on both measures at the same time. We believe that avoiding the recommendation of items that have already been purchased could help overcome this issue. In this study, we present a collaborative filtering-based algorithm to handle large-scale user data with symmetric purchasing orders and repeatedly purchased products. The proposed algorithm combines an extreme gradient boosting machine learning architecture with the word2vec mechanism to explore the purchased products based on the click patterns of users. Our algorithm improves the accuracy of predicting the relevant products that customers are likely to buy and that should therefore be recommended. The results are evaluated on a dataset that contains click-based features of users from an online shopping mall in Jeju Island, South Korea. We evaluated Mean Absolute Error, Mean Square Error, and Root Mean Square Error for our proposed methodology and for other machine learning algorithms. Our proposed model produced the lowest error and improved the prediction accuracy of the recommendation system compared to other traditional approaches.
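
The three error measures named in this abstract follow directly from their definitions. A minimal sketch is shown below; the function name `regression_errors` and the sample values are illustrative assumptions, not data from the paper.

```python
from math import sqrt


def regression_errors(y_true, y_pred):
    """Return MAE, MSE and RMSE for paired true and predicted values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean square error
    return {"mae": mae, "mse": mse, "rmse": sqrt(mse)}


# Made-up values for illustration only.
errors = regression_errors([3, 5, 2, 7], [2, 5, 4, 8])
print(errors)
```

Because MSE squares each residual before averaging, it penalizes a few large errors more heavily than MAE does; RMSE brings that measure back to the units of the target.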

Analysis of Unsatisfying User Experiences and Unmet Psychological Needs for Virtual Reality Exergames Using Deep Learning Approach

Information ◽

10.3390/info12110486 ◽

2021 ◽

Vol 12 (11) ◽

pp. 486

Author(s):

Xiaoyan Zhang ◽

Qiang Yan ◽

Simin Zhou ◽

Linye Ma ◽

Siran Wang

Keyword(s):

Virtual Reality ◽

Deep Learning ◽

Experimental Studies ◽

Online Reviews ◽

Psychological Needs ◽

Gradient Boosting ◽

Inverse Document Frequency ◽

Document Frequency ◽

Extreme Gradient Boosting ◽

Speed Up

The number of consumers playing virtual reality games is booming. To speed up product iteration, the user experience team needs to collect and analyze unsatisfying experiences in time. In this paper, we aim to detect the unsatisfying experiences hidden in online reviews of virtual reality exergames using a deep learning method and to identify the unmet psychological needs of users based on self-determination theory. Convolutional neural networks for sentence classification (textCNN) are used in this study to classify online reviews with unsatisfying experiences. For comparison, we set eXtreme gradient boosting (XGBoost) with lexical features as the machine learning baseline. Term frequency-inverse document frequency (TF-IDF) is used to extract keywords from every set of classified reviews. The micro-F1 score of the textCNN classifier is 90.00, which is better than the 82.69 of XGBoost. The top 10 keywords of every set of reviews reflect relevant topics of unmet psychological needs. This paper explores the potential problems causing unsatisfying experiences and unmet psychological needs in virtual reality exergames through text mining and supplements experimental studies of virtual reality exergames.
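
The TF-IDF keyword step described in this abstract can be sketched in plain Python. This is a minimal illustration that treats each set of classified reviews as one document and uses the unsmoothed weighting tf × log(N / df); the authors' exact TF-IDF variant and preprocessing are not stated here, and the review texts below are invented. The three set names are the basic needs of self-determination theory.

```python
from collections import Counter
from math import log


def tfidf_keywords(docs, top_k=3):
    """Rank the words of each document by TF-IDF and return the top_k per document."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    keywords = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * log(n_docs / df[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[name] = ranked[:top_k]
    return keywords


# Invented review snippets, one string per set of classified reviews.
docs = {
    "autonomy": "controls feel locked controls cannot remap the controls",
    "competence": "the difficulty spikes the tutorial skips the difficulty curve",
    "relatedness": "the multiplayer lobby is empty no multiplayer friends",
}

keywords = tfidf_keywords(docs)
print(keywords)
```

A word that occurs in every set, such as "the" in this toy data, gets an inverse document frequency of zero and so never ranks as a keyword, which is why TF-IDF surfaces terms specific to each set.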

Child’s Target Height Prediction Evolution

Applied Sciences ◽

10.3390/app9245447 ◽

2019 ◽

Vol 9 (24) ◽

pp. 5447 ◽

Cited By ~ 1

Author(s):

João Rala Cordeiro ◽

Octavian Postolache ◽

João C. Ferreira

Keyword(s):

Prediction Accuracy ◽

Population Studies ◽

Gradient Boosting ◽

Target Height ◽

New Approach ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Extreme Gradient Boosting ◽

Height Prediction ◽

Growth Assessment

This study is a contribution to improving healthcare for children and for society in general. It aims to predict children’s height when they become adults, also known as “target height”, to allow for better growth assessment and more personalized healthcare. The existing literature describes some prediction methods, based on longitudinal population studies and statistical techniques, which, with few information resources, are able to produce acceptable results. The challenge of this study is to use a new approach based on machine learning to forecast the target height for children and (eventually) improve the existing height prediction accuracy. The goals of the study were achieved. The extreme gradient boosting regression (XGB) and light gradient boosting machine regression (LightGBM) algorithms achieved considerably better results on height prediction. The developed model can be usefully applied by pediatricians and other clinical professionals in growth assessment.

Explaining and Predicting Helpfulness and Funniness of Online Reviews on the Steam Platform

Journal of Global Information Management ◽

10.4018/jgim.20211101.oa16 ◽

2021 ◽

Vol 29 (6) ◽

pp. 1-23

Author(s):

Zhi Wang ◽

Victor Chang ◽

Gergely Horvath

Keyword(s):

Applied Research ◽

Online Shopping ◽

Product Information ◽

Online Reviews ◽

Online Review ◽

Gradient Boosting ◽

Related Data ◽

Review Helpfulness ◽

Display Content ◽

The Impact

Online reviews are a key form of display content on many online shopping platforms and an essential source of product information for consumers. Low-quality reviews often inconvenience both the platform and review readers. This article aims to help Steam, one of the largest digital distribution platforms, predict the helpfulness and funniness of reviews. Using Python, 480,000 items of game review data for 20 games were collected for analysis. The article analyzes the impact of three categories of influencing factors on the usefulness and funniness of game reviews: characteristics of the review, the reviewer, and the game. Additionally, a Random Forest-based classifier predicted the usefulness of reviews accurately, while for funniness, Gradient Boosting Decision Tree was the better choice. The article extends research on review usefulness to game products and proposes research on review funniness.
