Efficient Gradient Boosted Decision Tree Training on GPUs

The National Institute of Standards and Technology defines the fundamental characteristics of cloud computing as: on-demand computing, offered via the network, using pooled resources, with rapid elastic scaling and metered charging. The rapid dynamic allocation and release of resources on demand to meet heterogeneous computing needs is particularly challenging for data centres, which process a huge amount of data characterised by its high volume, velocity, variety and veracity (4Vs model). Data centres seek to regulate this by monitoring and adaptation, typically reacting to service failures after the fact. We present a real cloud test bed with the capabilities of proactively monitoring and gathering cloud resource information for making predictions and forecasts. This contrasts with the state-of-the-art reactive monitoring of cloud data centres. We argue that the behavioural patterns and Key Performance Indicators (KPIs) characterizing virtualized servers, networks, and database applications can best be studied and analysed with predictive models. Specifically, we applied the Boosted Decision Tree machine learning algorithm in making future predictions on the KPIs of a cloud server and virtual infrastructure network, yielding an R-Square of 0.9991 at a 0.2 learning rate. This predictive framework is beneficial for making short- and long-term predictions for cloud resources.
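As a rough illustration of what "a Boosted Decision Tree at a 0.2 learning rate" and an R-Square score mean in practice, the sketch below fits gradient-boosted decision stumps to a synthetic curve using only the Python standard library. The data, the function names and the number of rounds are assumptions made for illustration; this is not the authors' Azure-based setup, and the in-sample R-Square it computes is not comparable to the reported 0.9991.

```python
# Minimal gradient boosting for regression with one-feature decision stumps
# (squared loss). Data and names are illustrative assumptions.

def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimising squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1], best[2], best[3]


def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def fit_gbdt(xs, ys, n_rounds=200, learning_rate=0.2):
    """Each round fits a stump to the current residuals and adds a shrunken copy."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)


def r_squared(ys, preds):
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]          # hypothetical KPI curve, not real cloud data
model = fit_gbdt(xs, ys)
r2 = r_squared(ys, [predict(model, x) for x in xs])  # in-sample R-Square
```

A smaller learning rate shrinks each stump's contribution, so more rounds are needed to reach the same fit; a held-out set would be required to judge predictive quality.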

Spam classification: a comparative analysis of different boosted decision tree approaches

Journal of Systems and Information Technology ◽

10.1108/jsit-11-2017-0105 ◽

2018 ◽

Vol 20 (3) ◽

pp. 298-105 ◽

Cited By ~ 4

Author(s):

Shrawan Kumar Trivedi ◽

Prabin Kumar Panigrahi

Keyword(s):

Decision Tree ◽

False Positive ◽

False Positive Rate ◽

False Negative ◽

The Body ◽

Content Type ◽

Performance Accuracy ◽

Tree Classifier ◽

Boosted Decision Tree ◽

Email Spam

Purpose: Email spam classification is becoming a challenging area in the domain of text classification. Precise and robust classifiers are judged not only by classification accuracy but also by sensitivity (correctly classified legitimate emails) and specificity (correctly classified unsolicited emails), captured by the false positive and false negative rates. This paper aims to present a comparative study of various decision tree classifiers (such as AD tree, decision stump and REP tree) with and without different boosting algorithms (bagging, boosting with re-sample and AdaBoost).

Design/methodology/approach: Artificial intelligence and text mining approaches have been incorporated in this study. Each decision tree classifier is tested on informative words/features selected from two publicly available data sets (SpamAssassin and LingSpam) using a greedy step-wise feature search method.

Findings: Outcomes of this study show that without boosting, the REP tree provides high performance accuracy, with the AD tree ranking as the second-best performer. Decision stump is found to be the under-performing classifier of this study. However, with boosting, the combination of REP tree and AdaBoost compares favourably with other classification models. If the metrics false positive rate and performance accuracy are taken together, AD tree and REP tree with AdaBoost were both found to carry out an effective classification task. Greedy stepwise has proven its worth in this study by selecting a subset of valuable features to identify the correct class of emails.

Research limitations/implications: This research is focussed on the classification of spam emails written in the English language only. The proposed models work with the content (words/features) of email data that is mostly found in the body of the mail. Image spam has not been included in this study, nor have other messages such as short message service or multi-media messaging service.

Practical implications: In this research, a boosted decision tree approach has been proposed and used to classify email spam and ham files; this is found to be a highly effective approach in comparison with other state-of-the-art models used in other studies. This classifier may be tested for different applications and may provide new insights for developers and researchers.

Originality/value: A comparison of decision tree classifiers with and without ensemble methods has been presented for spam classification.
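To make the stump-versus-boosted-stump comparison concrete, here is a minimal AdaBoost sketch over decision stumps, written with the Python standard library only. The eight toy "emails" (three binary word-presence features, spam when at least two words occur), the function names, and the definition of a false positive as a legitimate email flagged as spam are all assumptions for illustration; the study itself used SpamAssassin and LingSpam with tree learners that are not reproduced here.

```python
import math

# Toy data: binary word-presence features for three hypothetical words.
# Label +1 = spam, -1 = ham; a message is spam when at least two words occur.
X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0],
     [0, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]


def stump_predict(stump, row):
    """A stump tests one feature; polarity says which side is called spam."""
    feature, polarity = stump
    return polarity if row[feature] == 1 else -polarity


def best_stump(X, y, w):
    """Pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for polarity in (1, -1):
            stump = (feature, polarity)
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if stump_predict(stump, row) != yi)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err


def adaboost(X, y, n_rounds):
    w = [1 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def accuracy_and_fpr(preds, y):
    acc = sum(p == t for p, t in zip(preds, y)) / len(y)
    ham_preds = [p for p, t in zip(preds, y) if t == -1]
    fpr = sum(p == 1 for p in ham_preds) / len(ham_preds)
    return acc, fpr


single = adaboost(X, y, n_rounds=1)
boosted = adaboost(X, y, n_rounds=3)
single_acc, single_fpr = accuracy_and_fpr([ensemble_predict(single, r) for r in X], y)
boosted_acc, boosted_fpr = accuracy_and_fpr([ensemble_predict(boosted, r) for r in X], y)
```

On this toy set no single stump can separate the classes, while three boosted stumps act as a weighted majority vote, which is why the boosted figures improve on both accuracy and false positive rate.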

Usage of stratified sampling of control subset for predicativity improvement of boosted decision tree models

Modeling and Information Systems in Economics ◽

10.33111/mise.99.10 ◽

2020 ◽

pp. 119-131

Author(s):

Viacheslav Pyrohov

Keyword(s):

Decision Tree ◽

Stratified Sampling ◽

Boosted Decision Tree ◽

Tree Models

A gradient boosted decision tree-based sentiment classification of twitter data

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691320500277 ◽

2020 ◽

Vol 18 (04) ◽

pp. 2050027

Author(s):

S. Neelakandan ◽

D. Paulraj

Keyword(s):

Decision Tree ◽

Opinion Mining ◽

Research Topic ◽

Sentiment Classification ◽

Decision Tree Classifier ◽

Twitter Data ◽

Tree Classifier ◽

Boosted Decision Tree ◽

Text Sentiment Analysis

People communicate their views, arguments and emotions about their everyday life on social media (SM) platforms (e.g. Twitter and Facebook). Twitter is an international micro-blogging service built around brief messages called tweets. Freestyle writing, incorrect grammar, typographical errors and abbreviations are some of the noise that occurs in the text. Sentiment analysis (SA) centered on a tweet posted by the user, and opinion mining (OM) of customers’ reviews, are well-known research topics. The texts are gathered from users’ tweets by means of OM and automatic SA centered on ternary classification, namely positive, neutral and negative. It is very challenging for researchers to ascertain sentiments in Twitter data because of its limited size, misspellings, unstructured nature, abbreviations and slang. This paper, with the aid of the Gradient Boosted Decision Tree (GBDT) classifier, proposes an efficient SA and Sentiment Classification (SC) of Twitter data. Initially, the Twitter data undergoes pre-processing. Next, the pre-processed data is processed using HDFS MapReduce. Features are then extracted from the processed data, and efficient features are selected using the Improved Elephant Herd Optimization (I-EHO) technique. Score values are calculated for each of the chosen features and given to the classifier. Finally, the GBDT classifier classifies the data as negative, positive, or neutral. Experimental results are analyzed and compared with other conventional techniques to show that the proposed method achieves the highest performance.

Mobile Money Fraud Prediction—A Cross-Case Analysis on the Efficiency of Support Vector Machines, Gradient Boosted Decision Trees, and Naïve Bayes Algorithms

Information ◽

10.3390/info11080383 ◽

2020 ◽

Vol 11 (8) ◽

pp. 383

Author(s):

Francis Effirim Botchey ◽

Zhen Qin ◽

Kwesi Hughes-Lartey

Keyword(s):

Developing Countries ◽

Support Vector Machines ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

Mobile Money ◽

Vector Machines ◽

Boosted Decision Tree

The onset of COVID-19 has re-emphasized the importance of FinTech, especially in developing countries, as the major powers of the world are already enjoying the advantages that come with the adoption of FinTech. Handling of physical cash has been established as a means of transmitting the novel coronavirus. Research has also established that being unbanked raises the potential of sinking into abject poverty. Over the years, developing countries have been piloting various forms of FinTech, but the one that has come to stay is Mobile Money Transactions (MMT). As mobile money transactions attempt to gain a foothold, they face several problems, the most important of which is mobile money fraud. This paper seeks to provide a solution to this problem by looking at machine learning algorithms based on support vector machines (kernel-based), gradient boosted decision trees (tree-based) and Naïve Bayes (probabilistic), taking into consideration the imbalanced nature of the dataset. Our experiments showed that the use of gradient boosted decision trees holds great potential in combating the problem of mobile money fraud, as it was able to produce near perfect results.
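The phrase "imbalanced nature of the dataset" matters because plain accuracy can look excellent on fraud data even when no fraud is caught. The sketch below shows this with invented counts (10 fraudulent transactions among 1,000) and two hypothetical predictors; the numbers and names are assumptions for illustration and do not come from the paper's experiments.

```python
def metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) with fraud encoded as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Synthetic, invented counts: 10 fraud cases followed by 990 legitimate ones.
y_true = [1] * 10 + [0] * 990

# Predictor A never flags anything as fraud.
always_legit = [0] * 1000

# Predictor B catches 8 of the 10 fraud cases and raises 20 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

baseline_acc, baseline_prec, baseline_rec = metrics(y_true, always_legit)
detector_acc, detector_prec, detector_rec = metrics(y_true, detector)
```

Predictor A scores higher on accuracy than predictor B while detecting no fraud at all, which is why precision and recall (or similar class-aware metrics) are the more informative comparison on data like this.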

Azure Machine Learning tools efficiency in the electroencephalographic signal P300 standard and target responses classification

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2019-0031 ◽

2019 ◽

Vol 15 (3) ◽

Author(s):

Grzegorz M. Wójcik ◽

Andrzej Kawiak ◽

Lukasz Kwasniewicz ◽

Piotr Schneider ◽

Jolanta Masiak

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Tree ◽

Event Related Potentials ◽

Learning Tools ◽

Mentally Disabled ◽

Boosted Decision Tree ◽

Brodmann Areas ◽

Related Potentials ◽

Somatosensory Responses

The Event-Related Potentials were investigated on a group of 70 participants using the dense array electroencephalographic amplifier with photogrammetry geodesic station. The source localisation was computed for each participant. The activity of Brodmann areas (BAs) involved in the brain cortical activity of each participant was measured. Then the mean electric charge flowing through particular areas was calculated. Five different machine learning tools (logistic regression, boosted decision tree, Bayes point machine, classic neural network and averaged perceptron classifier) from the Azure ecosystem were trained, and their accuracy was tested in the task of distinguishing standard and target responses in the experiment. The efficiency of each tool was compared, and the best tools for our task were found to be logistic regression and the boosted decision tree. Such an approach can be useful in eliminating somatosensory responses in experimental psychology or even in establishing new communication protocols with mildly mentally disabled subjects.

Improving segmentation accuracy for magnetic resonance imaging using a boosted decision tree

Journal of Neuroscience Methods ◽

10.1016/j.jneumeth.2008.08.017 ◽

2008 ◽

Vol 175 (2) ◽

pp. 206-217 ◽

Cited By ~ 6

Author(s):

Wen-Hung Chao ◽

You-Yin Chen ◽

Chien-Wen Cho ◽

Sheng-Huang Lin ◽

Yen-Yu I. Shih ◽

...

Keyword(s):

Magnetic Resonance Imaging ◽

Magnetic Resonance ◽

Decision Tree ◽

Resonance Imaging ◽

Segmentation Accuracy ◽

Boosted Decision Tree

An Efficient Detection of HCC-recurrence in Clinical Data Processing using Boosted Decision Tree Classifier

Procedia Computer Science ◽

10.1016/j.procs.2020.03.196 ◽

2020 ◽

Vol 167 ◽

pp. 193-204

Author(s):

P. Radha ◽

R. Divya

Keyword(s):

Decision Tree ◽

Data Processing ◽

Clinical Data ◽

Decision Tree Classifier ◽

Efficient Detection ◽

Tree Classifier ◽

Boosted Decision Tree ◽

Hcc Recurrence

Speaker recognition using adaptively boosted decision tree classifier

IEEE International Conference on Acoustics Speech and Signal Processing ◽

10.1109/icassp.2002.5743678 ◽

2002 ◽

Cited By ~ 3

Author(s):

Say Wei Foo ◽

Eng Guan Lim

Keyword(s):

Decision Tree ◽

Speaker Recognition ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Boosted Decision Tree