Toward Improving the Prediction Accuracy of Product Recommendation System Using Extreme Gradient Boosting and Encoding Approaches

Symmetry ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 1566 ◽  
Author(s):  
Zeinab Shahbazi ◽  
Debapriya Hazra ◽  
Sejoon Park ◽  
Yung Cheol Byun

With the spread of COVID-19, the “untact” culture in South Korea is expanding and customers are increasingly seeking online services. A recommendation system serves as a decision-making indicator. It helps users by suggesting items to purchase in the future, exploring the symmetry between multiple user activity characteristics. The scientific community employs a plethora of approaches to design recommendation systems, including collaborative filtering, stereotyping, and content-based filtering. The current paradigm of recommendation systems favors collaborative filtering because it can capture a user’s interests more closely than other approaches. Collaborative filtering harnesses features such as user-profile details, visited pages, and click information to determine a user’s interests, and then recommends items related to those interests. Existing collaborative filtering approaches exploit implicit and explicit features and report good outcomes for either classification or prediction. However, they fail to achieve good results on both measures at the same time. We believe that avoiding the recommendation of items that have already been purchased could help overcome this issue. In this study, we present a collaborative filtering-based algorithm to handle big user data with symmetric purchasing orders and repeatedly purchased products. The proposed algorithm combines the extreme gradient boosting machine learning architecture with the word2vec mechanism to explore purchased products based on users’ click patterns. Our algorithm improves the accuracy of predicting which relevant products, likely to be bought, should be recommended to customers. The results are evaluated on a dataset containing click-based features of users from an online shopping mall in Jeju Island, South Korea. 
We evaluated the Mean Absolute Error, Mean Square Error, and Root Mean Square Error of our proposed methodology and of other machine learning algorithms. Our proposed model produced the lowest error rate and improved the prediction accuracy of the recommendation system compared with traditional approaches.
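Two steps named in this abstract are simple enough to sketch directly:

* dropping already-purchased items from a ranked recommendation list;
* scoring predictions with MAE, MSE, and RMSE.

This is a minimal standard-library sketch under assumed inputs. The ranked list is a hypothetical stand-in for the output of the authors' XGBoost/word2vec model, which is not reproduced here. The function names `exclude_purchased` and `error_metrics` are illustrative, not taken from the paper.

```python
import math

def exclude_purchased(ranked_items, purchased):
    """Drop items the user already bought while keeping the model's ranking order."""
    bought = set(purchased)
    return [item for item in ranked_items if item not in bought]

def error_metrics(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, math.sqrt(mse)

# Hypothetical ranked candidates for one user (stand-in for model output).
ranked = ["p7", "p2", "p9", "p4"]
print(exclude_purchased(ranked, purchased=["p2", "p4"]))  # ['p7', 'p9']
print(error_metrics([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```

RMSE is the square root of MSE, so it is reported in the same units as the target. MAE weights all errors linearly and is less sensitive to a few large misses.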

2020 ◽  
Vol 39 (5) ◽  
pp. 7605-7620 ◽  
Author(s):  
Dhivya Elavarasan ◽  
Durai Raj Vincent

The development of science and technical intelligence has led to an extensive amount of data from various fields of agriculture. This creates a need to examine the available data and integrate it with processes such as crop enhancement, yield prediction, and the examination of plant infections. Machine learning has surged with powerful processing techniques for perceiving new contingencies in multidisciplinary agricultural advancements. In this paper, a novel hybrid regression algorithm, reinforced extreme gradient boosting, is proposed. It substantially outperforms traditional machine learning algorithms such as artificial neural networks, deep Q-network, gradient boosting, random forest, and decision tree. Extreme gradient boosting constructs new models, essentially decision trees, that learn from the mistakes of their predecessors by minimizing a differentiable loss function through gradient descent. The proposed hybrid model performs reinforcement learning at every node during the node-splitting process of decision tree construction. This makes effective use of the samples by selecting the appropriate split attribute for enhanced performance. The model’s performance is evaluated by means of Mean Square Error, Root Mean Square Error, Mean Absolute Error, and the Coefficient of Determination. To ensure a fair assessment of the results, the model is assessed on both the training and test datasets. The regression diagnostic plots of the residuals and the results obtained clearly show that the proposed hybrid approach performs better than the other machine learning algorithms, with reduced error measures and an improved accuracy of 94.15%. In addition, the probability density function of the proposed model shows that it preserves the distributional characteristics of the original crop yield data more closely than the other machine learning models tested.
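The boosting idea described above, where each new tree fits the errors of the ensemble built so far, can be illustrated with plain gradient boosting on a single feature. This is a minimal sketch under stated assumptions:

* depth-one trees (stumps);
* half squared-error loss, whose negative gradient is simply the residual;
* a fixed learning rate.

It is neither XGBoost, which also uses second-order gradient information and regularization, nor the authors' reinforced variant. The helper names are hypothetical. The coefficient of determination mentioned in the abstract is included for evaluation.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    values = sorted(set(xs))
    if len(values) < 2:  # no split possible: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Additive model: start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For the loss (y - F)^2 / 2, the negative gradient with respect to F is y - F.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 4.8, 5.1, 5.0, 4.9]  # made-up toy data
model = gradient_boost(xs, ys)
print(r_squared(ys, [model(x) for x in xs]))
```

The learning rate shrinks each stump's contribution. Training error therefore falls gradually rather than in one step, a standard guard against overfitting in boosted trees.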


2018 ◽  
Vol 12 (2) ◽  
pp. 85-98 ◽  
Author(s):  
Barry E King ◽  
Jennifer L Rice ◽  
Julie Vaughan

Research predicting National Hockey League average attendance is presented. The seasons examined run from the 2013 hockey season through the beginning of the 2017 hockey season. Multiple linear regression and three machine learning algorithms (random forest, M5 prime, and extreme gradient boosting) are employed to predict out-of-sample average home game attendance. Extreme gradient boosting generated the lowest out-of-sample root mean square error. The top four predictor variables were:

* the team identifier (team name);
* the number of Twitter followers (a surrogate for team popularity);
* median ticket price;
* arena capacity.


2019 ◽  
Vol 13 ◽  
pp. 267-271
Author(s):  
Jacek Bielecki ◽  
Oskar Ceglarski ◽  
Maria Skublewska-Paszkowska

Recommendation systems are a class of information-filtering applications whose main goal is to provide personalized recommendations. The aim of the research was to compare two ways of creating personalized recommendations. The recommendation system was built using a content-based cognitive filtering method and a collaborative filtering method based on user ratings. The conclusions of the research show the advantages and disadvantages of both methods.
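The two approaches compared in this abstract can be contrasted in a few lines:

* **Content-based filtering** scores items by how similar their features are to items the user already liked.
* **User-based collaborative filtering** predicts a rating from other users with similar rating patterns.

This is a minimal sketch with cosine similarity, assuming small in-memory dictionaries. It is not the authors' implementation. The function names and the toy data are hypothetical. Treating unrated items as 0 in the similarity computation is a simplifying assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def content_based_scores(item_features, liked_items):
    """Score unseen items by similarity to the average feature vector of the liked items."""
    dims = len(next(iter(item_features.values())))
    profile = [sum(item_features[i][d] for i in liked_items) / len(liked_items) for d in range(dims)]
    return {item: cosine(profile, feats)
            for item, feats in item_features.items() if item not in liked_items}

def collaborative_score(ratings, user, item):
    """Predict `user`'s rating of `item` as a similarity-weighted mean of other users' ratings."""
    items = sorted({i for r in ratings.values() for i in r})

    def vector(u):
        # Missing ratings are treated as 0 here (a simplification).
        return [ratings[u].get(i, 0.0) for i in items]

    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(vector(user), vector(other))
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

ratings = {"ann": {"a": 5, "b": 3}, "bob": {"a": 5, "b": 3, "c": 4}, "cy": {"a": 1, "b": 5, "c": 2}}
print(collaborative_score(ratings, "ann", "c"))
```

Content-based scores need item features but no other users. The collaborative prediction needs overlapping ratings but no item features. This difference is the usual source of the trade-offs such comparisons report.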


Author(s):  
He Yang ◽  
Emma Li ◽  
Yi Fang Cai ◽  
Jiapei Li ◽  
George X. Yuan

The purpose of this paper is to establish a framework for extracting early warning risk features to predict financial distress, based on the XGBoost model and SHAP. It is well known that the way early warning risk features are constructed is very important for predicting the financial distress of companies. Compared with traditional statistical methods, data-driven machine learning models for financial early warning achieve better prediction accuracy. However, they also bring difficulties; for example, the resulting model may not be easy to explain. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient boosting, has become a hot topic in machine learning research. This is owing to its strong ability to recognize nonlinear information and its high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early warning features for predicting the financial distress of listed companies. To establish an early warning system for predicting financial distress, we collect 76 financial risk features from seven categories and 14 non-financial risk features from four categories. We conduct empirical tests with respect to AUC, KS, and Kappa. The numerical results show that, compared with the logistic model, the XGBoost-based method established in this paper has a much better ability to predict the financial distress risk of listed companies. Moreover, under the framework of SHAP (SHapley Additive exPlanations), we are able to give a reasonable and visual explanation of the important risk features and the ways in which they affect financial distress. 
The results of this paper show that the XGBoost approach to modelling early warning features for financial distress not only achieves better prediction accuracy but is also explainable. This is significant for identifying early warnings of financial distress risk for listed companies in practice.
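Two of the evaluation measures named above, AUC and the KS statistic, can be computed directly from predicted risk scores and binary outcomes. This is a minimal standard-library sketch, assuming labels coded 1 for distressed and 0 for healthy and assuming higher scores mean higher risk. The function names are hypothetical. Kappa, which requires hard class predictions, is not covered here.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def ks_statistic(scores, labels):
    """Largest gap between true-positive and false-positive rates over all score thresholds."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if not n_pos or not n_neg:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0
    for t in sorted(set(scores)):
        tpr = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        best = max(best, abs(tpr - fpr))
    return best

scores = [0.9, 0.4, 0.6, 0.2]  # made-up model outputs
labels = [1, 1, 0, 0]
print(auc(scores, labels), ks_statistic(scores, labels))  # 0.75 0.5
```

The pairwise AUC loop is quadratic in the number of examples. It is fine for illustration, but a rank-based formula scales better on real datasets.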


Author(s):  
Sarah Barber ◽  
Florian Hammer ◽  
Adrian Tica

Data-driven wind turbine performance predictions, such as power and loads, are important for planning and operation. Current methods do not take site-specific conditions such as turbulence intensity and shear into account, which could result in errors of up to 10%. In this work, four different machine learning models are trained and tested, first on a simulation dataset and then on a real dataset:

* k-nearest neighbors regression;
* random forest regression;
* extreme gradient boosting regression;
* artificial neural networks (ANN).

It is found that machine learning methods that take site-specific conditions into account can improve prediction accuracy by a factor of two to three, depending on the error indicator chosen. Similar results are observed for multi-output ANNs for simulated in-plane and out-of-plane rotor blade tip deflection and root loads. Future work will focus on understanding the transferability of results between different turbines within a wind farm and between different wind turbine types.
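The first of the four models listed above, k-nearest neighbors regression, is short enough to sketch. This is a minimal standard-library sketch, assuming site-specific conditions such as turbulence intensity and shear are simply extra input features alongside wind speed. The feature set, values, and function names are made up for illustration and are not from the study. Features are min-max scaled first so that wind speed does not dominate the Euclidean distance.

```python
import math

def min_max_scaler(rows):
    """Return a function that rescales each feature to [0, 1] using the training rows' range."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    def scale(row):
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]

    return scale

def knn_regress(train_x, train_y, query, k=3):
    """Predict the mean target of the k training rows closest to `query` (Euclidean distance)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training rows")
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return sum(y for _, y in nearest) / k

# Made-up rows: (wind speed in m/s, turbulence intensity, shear exponent) -> power in kW.
train_x = [[6.0, 0.08, 0.14], [6.2, 0.15, 0.20], [8.0, 0.07, 0.12], [8.1, 0.16, 0.25], [10.0, 0.10, 0.15]]
train_y = [650.0, 610.0, 1500.0, 1420.0, 2300.0]
scale = min_max_scaler(train_x)
scaled = [scale(row) for row in train_x]
print(knn_regress(scaled, train_y, scale([8.0, 0.15, 0.22]), k=2))
```

`math.dist` requires Python 3.8 or later. The scaler is fitted on the training rows only and then reused for queries, so test data do not influence the scaling.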


2021 ◽  
Vol 11 (17) ◽  
pp. 7793
Author(s):  
Alessandro Massaro ◽  
Antonio Panarese ◽  
Daniele Giannone ◽  
Angelo Galiano

The organized large-scale retail sector has been gradually establishing itself around the world, and its activities increased exponentially during the pandemic. This modern sales system uses data mining technologies to process valuable information and increase profit. In this context, the extreme gradient boosting (XGBoost) algorithm was applied in an industrial project as a supervised learning algorithm to predict product sales, including promotion conditions and a multiparametric analysis. The implemented XGBoost model was trained and tested using the Augmented Data (AD) technique. This technique targets cases where the available data are not sufficient to achieve the desired accuracy, as in many practical artificial intelligence applications where a large dataset is not available. The prediction was applied to a grid of segmented customers, allowing personalized services according to their purchasing behavior. The AD technique achieved good accuracy compared with results obtained using the initial dataset with few records. The prediction errors, such as the Root Mean Square Error (RMSE) and Mean Square Error (MSE), decreased by about an order of magnitude. The AD technique formulated for the large-scale retail sector also represents a good way to calibrate the training model.
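The abstract does not describe how the Augmented Data technique generates new records. The sketch below therefore shows one generic option, labeled as an assumption: Gaussian jitter on numeric features, with targets copied unchanged. It is not the authors' AD method. The function name and parameters are hypothetical.

```python
import random

def augment(rows, targets, copies=5, noise=0.02, seed=0):
    """Return the original rows plus `copies` jittered replicas of each row.

    Each numeric feature is multiplied by (1 + N(0, noise)), i.e. perturbed by
    Gaussian noise proportional to its magnitude; targets are copied unchanged.
    """
    rng = random.Random(seed)
    out_x, out_y = [list(r) for r in rows], list(targets)
    for row, target in zip(rows, targets):
        for _ in range(copies):
            out_x.append([v * (1 + rng.gauss(0, noise)) for v in row])
            out_y.append(target)
    return out_x, out_y

# Made-up sales records: (price, promotion flag, week number) -> units sold.
x, y = augment([[2.5, 1.0, 12.0], [3.0, 0.0, 13.0]], [120.0, 80.0], copies=3)
print(len(x), len(y))  # 8 8
```

Augmentation should be applied only to the training split after the train/test split. Otherwise, near-copies of test records leak into training and the reported RMSE and MSE become optimistic.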


2017 ◽  
Vol 44 (3) ◽  
pp. 331-344 ◽  
Author(s):  
Youdong Yun ◽  
Danial Hooshyar ◽  
Jaechoon Jo ◽  
Heuiseok Lim

The most commonly used algorithm in recommendation systems is collaborative filtering. However, despite its wide use, the prediction accuracy of this algorithm is unexceptional. Furthermore, whether quantitative data such as product ratings or purchase history reflect users’ actual tastes is questionable. In this article, we propose a method that uses user review data extracted with opinion mining for product recommendation systems. To evaluate the proposed method, we perform product recommendation tests on Amazon product data, with and without the additional opinion mining results on Amazon purchase review data. The performance of these two variants is compared by means of precision, recall, true positive recommendation (TPR), and false positive recommendation (FPR). In this comparison, a large improvement in prediction accuracy was observed when the opinion mining data were taken into account. Based on these results, we answer two main questions: ‘Why is the collaborative filtering algorithm not effective?’ and ‘Do quantitative data such as product rating or purchase history reflect users’ actual tastes?’
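Precision and recall, two of the measures used in this comparison, have a common top-k form for recommendation lists. This is a minimal sketch, assuming a ranked list of recommended item IDs and a set of items the user actually found relevant. The paper's exact definitions of TPR and FPR are not given in the abstract and are not reproduced. The function name is hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations against a set of relevant items."""
    top_k = recommended[:k]
    relevant_set = set(relevant)
    hits = len(set(top_k) & relevant_set)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical recommendation list and relevant items for one user.
print(precision_recall_at_k(["a", "b", "c", "d"], ["b", "d", "e"], k=3))
```

Raising k usually increases recall and lowers precision. This is why the two are reported together when comparing variants, such as with and without opinion mining features.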


2021 ◽  
Author(s):  
Vimal Rathakrishnan ◽  
Salmia Beddu ◽  
Ali Najah Ahmed

This research presents a comparative study of machine learning (ML) optimisation techniques for predicting the compressive strength of concrete. In previous studies, researchers focused on identifying a suitable machine learning model by comparing ensemble, bagging, and fusion methods for predicting concrete strength. In this research, ML model hyperparameter optimisation is used to improve the prediction accuracy and performance of the model. Extreme gradient boosting (XGBoost) is used as the base model to perform the prediction, as XGBoost has built-in model ensembling, bagging, and boosting algorithms. Grid Search, Random Search, and Bayesian Optimisation are selected and used to optimise the hyperparameters of the XGBoost model. For this particular prediction study, the models optimised with Random Search performed better than those optimised with the other methods. The Random Search optimisation method showed substantial improvements in prediction accuracy, modelling error, and computation time.
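The difference between Grid Search and Random Search can be shown with a toy objective:

* Grid Search evaluates every combination of a fixed list of values.
* Random Search draws each hyperparameter from a range.

This is a minimal sketch under stated assumptions. `toy_validation_error` is a hypothetical stand-in for a cross-validated error of an XGBoost model. `max_depth` and `learning_rate` are used only as dictionary keys. Bayesian Optimisation is not sketched.

```python
import itertools
import random

def toy_validation_error(params):
    """Hypothetical stand-in for a cross-validated error; its minimum is at depth 6, rate 0.1."""
    return (params["max_depth"] - 6) ** 2 + 100 * (params["learning_rate"] - 0.1) ** 2

def grid_search(grid, objective):
    """Evaluate every combination in `grid` (dict of name -> list of values); return (score, params)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best

def random_search(samplers, objective, n_trials=30, seed=0):
    """Evaluate `n_trials` random draws; `samplers` maps name -> function(rng) -> value."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: draw(rng) for name, draw in samplers.items()}
        score = objective(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best

grid = {"max_depth": [3, 5, 7, 9], "learning_rate": [0.01, 0.1, 0.3]}
samplers = {
    "max_depth": lambda rng: rng.randint(3, 9),
    "learning_rate": lambda rng: rng.uniform(0.01, 0.3),
}
print(grid_search(grid, toy_validation_error))
print(random_search(samplers, toy_validation_error))
```

The grid here happens to omit depth 6, so it can never reach the toy optimum, while random draws can land between grid points. This illustrates a common argument for Random Search. It says nothing about the concrete study's actual results.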


Materials ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4222
Author(s):  
Ayaz Ahmad ◽  
Krzysztof Adam Ostrowski ◽  
Mariusz Maślak ◽  
Furqan Farooq ◽  
Imran Mehmood ◽  
...  

High temperature severely affects the nature of the ingredients used to produce concrete, which in turn reduces the strength properties of the concrete. Achieving the desired compressive strength of concrete is a difficult and time-consuming task. However, supervised machine learning (ML) approaches make it possible to predict the targeted result in advance with high accuracy. This study presents the use of a decision tree (DT), an artificial neural network (ANN), bagging, and gradient boosting (GB) to forecast the compressive strength of concrete at high temperatures on the basis of 207 data points. Python code in the Anaconda Navigator software was used to run the selected models. The software requires information regarding both the input variables and the output parameter. A total of nine input parameters (water, cement, coarse aggregate, fine aggregate, fly ash, superplasticizers, silica fume, nano silica, and temperature) were incorporated as the input, while one variable (compressive strength) was selected as the output. The performance of the employed ML algorithms was evaluated with regard to statistical indicators, including the coefficient of determination (R²), mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE). The individual DT and ANN models gave R² values of 0.83 and 0.82, respectively, while the ensemble algorithm and gradient boosting gave R² values of 0.90 and 0.88, respectively. This indicates a strong correlation between the actual and predicted outcomes. The k-fold cross-validation, the R² values, and the lower errors (MAE, MSE, and RMSE) indicated better performance for the ensemble algorithms. Sensitivity analyses were also conducted to check the contribution of each input variable. It has been shown that the use of the ensemble machine learning algorithm would enhance the performance level of the model. 
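The k-fold cross-validation mentioned above splits the data into k folds and holds out each fold once. This is a minimal standard-library sketch of that index split, assuming k = 10 for illustration (the abstract does not state k) and a fixed shuffle seed. The function name is hypothetical. Each of the 207 data points lands in exactly one test fold.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample appears in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        yield train, test
        start += size

for train, test in k_fold_indices(207, k=10):
    print(len(train), len(test))
```

A model is fitted on each `train` split and scored (for example with R², MAE, MSE, and RMSE) on the matching `test` split. The k scores are then averaged.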


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study, we compare machine learning algorithms that predict, during treatment, which patients will not benefit from brief mental health treatment. We also present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not differ significantly on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (PPV) of 0.71 (0.61–0.77), with a sensitivity of 0.35 (0.29–0.41) and an area under the curve of 0.78. A trade-off can be made between PPV and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the PPV increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the PPV decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
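The cut-off trade-off described in the results can be reproduced on any set of predicted probabilities. This is a minimal sketch, assuming label 1 means "will not benefit" and that a probability at or above the cut-off counts as a positive prediction. The probabilities below are made up, not the study's data, and the function name is hypothetical.

```python
def ppv_sensitivity(probs, labels, cutoff):
    """Classify prob >= cutoff as positive; return (PPV, sensitivity), NaN when undefined."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity

probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]  # made-up predicted probabilities
labels = [1, 1, 0, 1, 0, 0]
for cutoff in (0.35, 0.5, 0.65):
    print(cutoff, ppv_sensitivity(probs, labels, cutoff))
```

Raising the cut-off keeps only the most confident predictions, so PPV tends to rise while sensitivity falls. This mirrors the selective-versus-inclusive choice described in the conclusions.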

