A distributed group recommendation system based on extreme gradient boosting and big data technologies

Organisations that perform business operations in a multi-sourced big data environment have an imperative need to discover meaningful patterns of interest in their diversified data sources. With the advent of big data technologies such as Hadoop and Spark, commodity hardware plays a vital role in data analytics, processing multi-sourced and multi-formatted big data at reasonable cost and in reasonable time. Though various data analytic techniques exist in the context of big data, recommendation systems are particularly popular in web-based business applications for suggesting suitable products, services, and items to potential customers. In this paper, we put forth a big data recommendation engine framework based on a local pattern analytics strategy to explore user preferences and tastes for both branch-level and central-level decisions. The framework encourages moving the computing environment towards the location of the data source and avoids forced integration of data. Further, it helps decision makers uncover hidden preferences and tastes of users from branch data sources for effective customer campaigns. The framework has been evaluated on the benchmark dataset MovieLens100k, and the results confirm the advantages of the proposal.
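The abstract does not say how branch-level patterns are mined or merged. As a minimal sketch of the general idea only, assuming MovieLens-style (user, item, rating) triples and a simple "highly rated item" frequency count as the local pattern (both assumptions; `local_patterns`, `central_merge` and the sample data are hypothetical), each branch summarises its own data and only the compact summaries travel to the central site:

```python
from collections import Counter
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user_id, item_id, rating), MovieLens-style


def local_patterns(ratings: List[Rating], min_rating: float = 4.0, top_n: int = 3) -> Dict[str, int]:
    """Runs at a branch: count how many users rated each item highly and keep the top_n items."""
    likes = Counter(item for _, item, rating in ratings if rating >= min_rating)
    return dict(likes.most_common(top_n))


def central_merge(branch_summaries: Dict[str, Dict[str, int]], top_n: int = 3) -> List[str]:
    """Runs centrally: combine the per-branch summaries; raw ratings never leave the branches."""
    total: Counter = Counter()
    for summary in branch_summaries.values():
        total.update(summary)  # adds the counts item by item
    return [item for item, _ in total.most_common(top_n)]


# Hypothetical branch data
branch_a = [("u1", "m1", 5), ("u2", "m1", 4), ("u3", "m2", 5), ("u1", "m3", 2)]
branch_b = [("u4", "m2", 5), ("u5", "m2", 4), ("u7", "m2", 5), ("u6", "m1", 5), ("u4", "m4", 4)]
summaries = {"branch_a": local_patterns(branch_a), "branch_b": local_patterns(branch_b)}
print(central_merge(summaries))  # ['m2', 'm1', 'm4']
```

Only item counts leave each branch, in the spirit of the framework's aim of avoiding forced data integration. Summing local top-n counts is an approximation: an item that narrowly misses every branch's top-n list is undercounted centrally. In a real deployment the local step would presumably run as a Hadoop or Spark job next to each data source.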


Toward Improving the Prediction Accuracy of Product Recommendation System Using Extreme Gradient Boosting and Encoding Approaches

Symmetry, Vol 12 (9), pp. 1566, 2020

DOI: 10.3390/sym12091566

Cited By ~ 2

Author(s): Zeinab Shahbazi, Debapriya Hazra, Sejoon Park, Yung Cheol Byun

Keyword(s): Machine Learning, South Korea, Collaborative Filtering, Mean Square Error, Prediction Accuracy, Recommendation System, Recommendation Systems, Gradient Boosting, Mean Square, Extreme Gradient Boosting

With the spread of COVID-19, the “untact” (contact-free) culture in South Korea is expanding and customers are increasingly seeking online services. A recommendation system serves as a decision-making aid that helps users by suggesting items they may purchase in the future, exploring the symmetry between multiple user activity characteristics. The scientific community employs a plethora of approaches to design recommendation systems, including collaborative filtering, stereotyping, and content-based filtering. The current paradigm of recommendation systems favors collaborative filtering because of its significant potential to capture a user's interests more closely than other approaches. Collaborative filtering harnesses features such as user-profile details, visited pages, and click information to determine the interests of a user, and then recommends items related to those interests. Existing collaborative filtering approaches exploit implicit and explicit features and report good outcomes for either classification or prediction, but fail to exhibit good results for both measures at the same time. We believe that avoiding the recommendation of items that have already been purchased could help overcome this issue. In this study, we present a collaborative filtering-based algorithm to handle big user data with symmetric purchasing orders and repeatedly purchased products. The proposed algorithm combines the extreme gradient boosting machine learning architecture with the word2vec mechanism to explore purchased products based on users' click patterns. Our algorithm improves the accuracy of predicting which relevant products customers are likely to buy and should therefore be recommended. The results are evaluated on a dataset containing click-based features of users from an online shopping mall in Jeju Island, South Korea.
We evaluated Mean Absolute Error, Mean Square Error, and Root Mean Square Error for our proposed methodology and for other machine learning algorithms. Our proposed model produced the lowest error rates and improved the prediction accuracy of the recommendation system compared with traditional approaches.
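The three error measures named above have standard definitions: for n predictions, MAE = (1/n) Σ |y_i - ŷ_i|, MSE = (1/n) Σ (y_i - ŷ_i)², and RMSE = √MSE. A short sketch of these formulas follows; the function name `error_metrics` and the sample ratings are illustrative only and are not taken from the paper:

```python
import math
from typing import Dict, Sequence


def error_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, MSE and RMSE for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    mse = sum(e * e for e in errors) / n   # mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


metrics = error_metrics([3, 5, 2, 4], [2.5, 5, 3, 4])
print(metrics)  # MAE 0.375, MSE 0.3125, RMSE about 0.559
```

Because squaring weights large errors more heavily, MSE and RMSE penalise occasional big misses more than MAE does; RMSE is in the same units as the target, which makes it easier to compare with MAE.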


Machine learning methods for predicting postpartum depression: A scoping review (Preprint)

DOI: 10.2196/preprints.29765, 2021

Author(s): Kiran Saqib, Amber Fozia Khan, Zahid Ahmad Butt

Keyword(s): Machine Learning, Big Data, Postpartum Depression, Scoping Review, Early Stage, Maternal Mental Health, Gradient Boosting, Support Vector, Study Results, Extreme Gradient Boosting

BACKGROUND Machine learning (ML) offers robust statistical and probabilistic techniques that can successfully predict certain clinical conditions using large volumes of data. A review of ML and big data research analytics in maternal depression is pertinent and timely given the rapid technological developments of recent years. OBJECTIVE This paper aims to synthesize the literature on machine learning and big data analytics for maternal mental health, particularly the prediction of postpartum depression (PPD). METHODS A scoping review methodology based on the Arksey and O’Malley framework was employed to rapidly map research activity in the field of ML for predicting PPD. A literature search was conducted in health and IT research databases, including PsycInfo, PubMed, IEEE Xplore and the ACM Digital Library, from September 2020 to January 2021. Data were extracted on each article’s ML model, data type, and study results. RESULTS A total of fourteen (14) studies were identified. All studies reported the use of supervised learning techniques to predict PPD. Support vector machines (SVM) and random forests (RF) were the most commonly employed algorithms, in addition to naïve Bayes, regression, artificial neural networks, decision trees and extreme gradient boosting. There was considerable heterogeneity in the best-performing ML algorithm across the selected studies. The area under the receiver operating characteristic curve (AUC) values reported for different algorithms were: SVM (range: 0.78-0.86); RF (0.88); extreme gradient boosting (0.80); logistic regression (0.93); and extreme gradient boosting (0.71). CONCLUSIONS ML algorithms are capable of analyzing larger datasets and performing more advanced computations that can significantly improve the detection of PPD at an early stage. Further clinical-research collaborations are required to fine-tune ML algorithms for prediction and treatment.
ML might become part of evidence-based practice, in addition to clinical knowledge and existing research evidence.
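The AUC values quoted above can be read as the probability that a randomly chosen positive case (here, a patient who develops PPD) receives a higher model score than a randomly chosen negative case, with ties counted as one half; this is the Mann-Whitney U statistic normalised by the number of positive-negative pairs. A direct pairwise sketch, with a hypothetical function name and made-up scores:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is O(n_pos × n_neg) and is meant for illustration; library implementations typically sort the scores once instead. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.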


Machine learning methods for predicting postpartum depression: A scoping review (Preprint)

DOI: 10.2196/preprints.29838, 2021

Author(s): Kiran Saqib, Amber Fozia Khan, Zahid Ahmad Butt

Keyword(s): Machine Learning, Big Data, Postpartum Depression, Scoping Review, Early Stage, Maternal Mental Health, Gradient Boosting, Support Vector, Study Results, Extreme Gradient Boosting

BACKGROUND Machine learning (ML) offers robust statistical and probabilistic techniques that can successfully predict certain clinical conditions using large volumes of data. A review of ML and big data research analytics in maternal depression is pertinent and timely given the rapid technological developments of recent years. OBJECTIVE This paper aims to synthesize the literature on machine learning and big data analytics for maternal mental health, particularly the prediction of postpartum depression (PPD). METHODS A scoping review methodology based on the Arksey and O’Malley framework was employed to rapidly map research activity in the field of ML for predicting PPD. Two independent researchers searched PsycInfo, PubMed, IEEE Xplore and the ACM Digital Library in September 2020 to identify relevant publications from the past 12 years. Data were extracted on each article’s ML model, data type, and study results. RESULTS A total of fourteen (14) studies were identified. All studies reported the use of supervised learning techniques to predict PPD. Support vector machines (SVM) and random forests (RF) were the most commonly employed algorithms, in addition to naïve Bayes, regression, artificial neural networks, decision trees and extreme gradient boosting. There was considerable heterogeneity in the best-performing ML algorithm across the selected studies. The area under the receiver operating characteristic curve (AUC) values reported for different algorithms were: SVM (range: 0.78-0.86); RF (0.88); extreme gradient boosting (0.80); logistic regression (0.93); and extreme gradient boosting (0.71). CONCLUSIONS ML algorithms are capable of analyzing larger datasets and performing more advanced computations that can significantly improve the detection of PPD at an early stage. Further clinical-research collaborations are required to fine-tune ML algorithms for prediction and treatment.
ML might become part of evidence-based practice, in addition to clinical knowledge and existing research evidence.


A Recommendation System Based on Extreme Gradient Boosting Classifier

2018 10th International Conference on Modelling, Identification and Control (ICMIC), 2018

DOI: 10.1109/icmic.2018.8529885

Cited By ~ 2

Author(s): Longteng Xu, Jiwei Liu, Yu Gu

Keyword(s): Recommendation System, Gradient Boosting, Extreme Gradient Boosting


Extreme Gradient Boosting for Recommendation System by Transforming Product Classification into Regression Based on Multi-Dimensional Word2Vec

Symmetry, Vol 13 (5), pp. 758, 2021

DOI: 10.3390/sym13050758

Author(s): Se-Joon Park, Chul-Ung Kang, Yung-Cheol Byun

Keyword(s): Natural Language Processing, Natural Language, Language Processing, Recommendation System, Classification Model, Gradient Boosting, Multiple Products, Multiple Dimensions, Extreme Gradient Boosting, Recommendation Accuracy

Now that untact services are widespread worldwide, the number of users visiting online shopping malls has increased. For example, the recommendation systems of Netflix, Amazon, etc., have gained a lot of attention by attracting many users and have made large profits by recommending suitable products to their users. In this paper, we conduct a study to enhance recommendation accuracy using Word2Vec, a technique widely used in natural language processing. We collect users' shopping histories, with personal click-preference information for product items, as data, treating each history as a document for natural language processing. The sequence of product item clicks is fed into the Word2Vec algorithm to obtain vectors that symmetrically represent all of the product items clicked by users. The training and test data take a series of vectors representing a sequence of clicked product items as inputs and a purchased product as the target. Machine learning models recommend a product as a symmetric vector for each input, and the system calculates the similarity between the recommended vector and all other registered products on sale in order to recommend multiple products as the final recommendation results. We use XGBoost regressor and classifier models to recommend products that users would like and evaluate the recommendation accuracy. Because the product finally recommended by a model is a vector, the system can recommend additional products by calculating the similarity as described above. We first evaluated the classifier model’s recommendation accuracy without Word2Vec encoding and then with the Word2Vec technique. Products can be represented with single- or multi-dimensional vectors, and the experiments show that recommendation accuracy increases when multi-dimensional Word2Vec vectors are used. We also evaluated performance when the system recommends one product versus multiple products.
For the recommendation of multiple products (five here), the regression model has higher accuracy than the classification model for all vector dimensions.
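The abstract describes turning a model's output vector into several recommendations by measuring its similarity to every registered product. It does not name the similarity measure; the sketch below assumes cosine similarity (a common choice for Word2Vec embeddings) and assumes the product vectors have already been trained on click sequences. `recommend_top_k`, `hit_at_k` and the toy two-dimensional catalogue are hypothetical:

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def recommend_top_k(predicted: Sequence[float],
                    catalogue: Dict[str, Sequence[float]], k: int = 5) -> List[str]:
    """Rank every registered product by cosine similarity to the model's output vector."""
    ranked = sorted(catalogue,
                    key=lambda pid: cosine_similarity(predicted, catalogue[pid]),
                    reverse=True)
    return ranked[:k]


def hit_at_k(recommended: Sequence[str], purchased: str) -> bool:
    """Count a recommendation list as correct if the purchased product appears in it."""
    return purchased in recommended


# Hypothetical product embeddings and model output
catalogue = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0], "p4": [-1.0, 0.0]}
top2 = recommend_top_k([1.0, 0.05], catalogue, k=2)
print(top2, hit_at_k(top2, "p2"))  # ['p1', 'p2'] True
```

With k = 1 this corresponds to recommending a single product, and with k = 5 to the multi-product setting mentioned above; averaging `hit_at_k` over test sessions gives one possible accuracy measure, though the paper's exact metric is not stated in the abstract.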


Research on Credit Risk Identification of Internet Financial Enterprises Based on Big Data

Mobile Information Systems, Vol 2021, pp. 1-8, 2021

DOI: 10.1155/2021/1034803

Author(s): Hua Peng

Keyword(s): Big Data, Credit Risk, Learning Algorithm, Synthesis Method, Risk Identification, Gradient Boosting, Analytic Hierarchy, Region Division, Extreme Gradient Boosting, Financial Credit

The advent of the big data era has opened a new path of development for Internet financial credit collection. Traditional methods of credit risk identification for Internet financial enterprises cannot capture the characteristics of credit risk zoning, which leads to large errors in credit risk identification results. Therefore, this paper proposes a new big-data-based method of credit risk identification for Internet financial enterprises. From a big data perspective, the credit risk assessment steps of Internet financial enterprises are analyzed, the weights of the assessment indicators are calculated using an improved analytic hierarchy process (AHP), and the linear weighted synthesis method is applied to comprehensively assess client credit. Using the characteristic regional division of big data credit risk, the big data credit risk is determined by a rule-based matching method. The eXtreme Gradient Boosting (XGBoost) machine learning algorithm is used to build a credit risk identification model for Internet financial enterprises. The kappa coefficient and the ROC curve are used to evaluate the performance of the proposed method. Experimental results show that the proposed method can accurately assess the credit risk of Internet financial enterprises.
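The abstract does not describe what its "improved" AHP changes, so the sketch below uses the standard AHP instead: weights approximated by the normalised geometric means of the rows of a pairwise comparison matrix, Saaty's consistency ratio (CR < 0.1 is the conventional acceptance threshold), and a linear weighted sum as the synthesis step. The three indicators, the comparison values and the function names are hypothetical:

```python
import math
from typing import Dict, List, Sequence

# Saaty's random consistency index (RI) for matrices of order 1..9
RANDOM_INDEX: Dict[int, float] = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                                  6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Approximate the principal eigenvector with normalised row geometric means."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def linear_weighted_score(indicators: Sequence[float], weights: Sequence[float]) -> float:
    """Linear weighted synthesis: S = sum_i w_i * x_i over normalised indicator values."""
    return sum(w * x for w, x in zip(weights, indicators))


# Hypothetical: indicator 1 is twice as important as 2 and four times as important as 3
pairwise = [[1, 2, 4],
            [1 / 2, 1, 2],
            [1 / 4, 1 / 2, 1]]
w = ahp_weights(pairwise)                         # approx. [0.571, 0.286, 0.143]
cr = consistency_ratio(pairwise, w)               # approx. 0: this matrix is perfectly consistent
score = linear_weighted_score([0.8, 0.6, 0.5], w)  # indicators scaled to [0, 1], higher is better
```

The geometric-mean method is a common approximation of Saaty's eigenvector method; the two coincide for perfectly consistent matrices such as the one above.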


A Novel Purchase Target Prediction System using Extreme Gradient Boosting Machines

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 8 (10), pp. 2070-2072, 2019

DOI: 10.35940/ijitee.j9331.0881019

Cited By ~ 1

Keyword(s): Big Data, Purchase Intention, Target Prediction, Gradient Boosting, Distributed Computing Systems, Computing Systems, Web Based, Suggested Technique, Decision Support Framework, Extreme Gradient Boosting

Nowadays, electronic commerce (e-commerce) gives buyers more options and opens up opportunities in web-based promotion and advertising. Online advertisers can learn more about buyer preferences from customers' day-to-day online shopping and browsing. Advances in big data and distributed computing systems further enable advertisers to make data-driven, consumer-specific preference recommendations based on web browsing histories. In this article, a decision support system is proposed to predict a customer's purchase intention during a browsing session. The proposed decision support framework uses extreme gradient boosting machines to classify browsing sessions as purchase-oriented or ordinary. The proposed technique demonstrates strong predictive ability compared with other benchmark algorithms, including logistic regression and conventional ensemble methods. It can be deployed within real-time bidding algorithms for web-based advertising strategies. Targeting promotions at browsing sessions with a likely purchase intention improves the success of advertisements. Keywords - purchase intention forecast, big data, decision trees, machine learning, extreme gradient boosting machines.
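To illustrate what "extreme gradient boosting" does when classifying sessions, the sketch below implements, from scratch and only for depth-1 trees (stumps), the second-order idea XGBoost is built on: gradients g = p - y and Hessians h = p(1 - p) of the log loss, leaf weights -G/(H + λ), and the split gain G_L²/(H_L + λ) + G_R²/(H_R + λ) - G²/(H + λ) (omitting XGBoost's constant factor 1/2 and its γ penalty). It is not the paper's model; in practice one would use the `xgboost` library (for example `XGBClassifier`). The session features and labels are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

# (feature index, threshold, left leaf value, right leaf value); leaf values already scaled by eta
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: Sequence[Sequence[float]], grad: Sequence[float],
              hess: Sequence[float], lam: float) -> Tuple[int, float, float, float]:
    """Exact greedy search for the depth-1 split with the largest regularised gain."""
    G, H = sum(grad), sum(hess)
    leaf = -G / (H + lam)
    best = (0, math.inf, leaf, leaf)  # fallback: no split, every sample goes left
    best_gain = 0.0
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        gl = hl = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            gl += grad[i]
            hl += hess[i]
            if X[i][j] == X[nxt][j]:
                continue  # no threshold separates equal feature values
            gr, hr = G - gl, H - hl
            gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)
            if gain > best_gain:
                best_gain = gain
                best = (j, (X[i][j] + X[nxt][j]) / 2, -gl / (hl + lam), -gr / (hr + lam))
    return best


def predict_margin(model: List[Stump], x: Sequence[float]) -> float:
    return sum(wl if x[j] < t else wr for j, t, wl, wr in model)


def predict_proba(model: List[Stump], x: Sequence[float]) -> float:
    return sigmoid(predict_margin(model, x))


def fit(X: Sequence[Sequence[float]], y: Sequence[int],
        rounds: int = 20, eta: float = 0.3, lam: float = 1.0) -> List[Stump]:
    model: List[Stump] = []
    for _ in range(rounds):
        p = [predict_proba(model, x) for x in X]  # starts at 0.5 (margin 0)
        grad = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log loss w.r.t. margin
        hess = [pi * (1.0 - pi) for pi in p]      # second derivative
        j, t, wl, wr = fit_stump(X, grad, hess, lam)
        model.append((j, t, eta * wl, eta * wr))  # shrink each new tree by the learning rate
    return model


# Hypothetical session features: [number of clicks, minutes on product pages]; 1 = purchase
sessions = [[2, 1], [3, 2], [1, 0], [12, 9], [15, 11], [10, 8]]
bought = [0, 0, 0, 1, 1, 1]
model = fit(sessions, bought)
print(round(predict_proba(model, [14, 10]), 3), round(predict_proba(model, [2, 1]), 3))
```

Predictions are recomputed from scratch each round for clarity, which costs O(rounds² × n); a production implementation would cache the running margins and grow deeper trees with column and row subsampling.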
