Gradient Boosted Decision Tree Algorithms for Medicare Fraud Detection

The National Institute of Standards and Technology defines the fundamental characteristics of cloud computing as on-demand computing, offered via the network, using pooled resources, with rapid elastic scaling and metered charging. The rapid dynamic allocation and release of resources on demand to meet heterogeneous computing needs is particularly challenging for data centres, which process huge amounts of data characterised by high volume, velocity, variety and veracity (the 4Vs model). Data centres seek to regulate this through monitoring and adaptation, typically reacting to service failures after the fact. We present a real cloud test bed capable of proactively monitoring and gathering cloud resource information for making predictions and forecasts. This contrasts with the state-of-the-art reactive monitoring of cloud data centres. We argue that the behavioural patterns and Key Performance Indicators (KPIs) characterizing virtualized servers, networks, and database applications can best be studied and analysed with predictive models. Specifically, we applied the Boosted Decision Tree machine learning algorithm to predict future KPIs of a cloud server and virtual infrastructure network, yielding an R-squared of 0.9991 at a learning rate of 0.2. This predictive framework is beneficial for making short- and long-term predictions for cloud resources.
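
To make the quantities in this abstract concrete, the following is a minimal sketch of boosted regression with one-split trees (stumps), a learning rate of 0.2, and an R-squared score. It is not the authors' implementation: the data are synthetic, the feature is one-dimensional, and all names (`fit_stump`, `boost`, `predict`, `r_squared`) are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def boost(xs, ys, n_rounds=200, learning_rate=0.2):
    """Fit stumps to the current residuals, adding a shrunken copy each round."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and every shrunken stump output for one input."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for a KPI series (assumption, not data from the paper).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + (5.0 if x >= 10 else 0.0) for x in xs]
model = boost(xs, ys)
score = r_squared(ys, [predict(model, x) for x in xs])
print(f"training R-squared: {score:.4f}")
```

The score here is computed on the training data, so it is optimistic; a held-out split would be needed to judge predictive quality.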

Novel Approach for Battery Type Determination: A Mere Electrical Alternative

10.21203/rs.3.rs-858317/v1 ◽

2021 ◽

Author(s):

İsmail Can Dikmen ◽

Teoman Karadağ

Keyword(s):

Integrated Circuits ◽

Decision Tree ◽

Statistical Significance ◽

High Capacity ◽

Electrical Energy ◽

Machine Learning Algorithms ◽

Battery Management ◽

Separation Function ◽

Novel Approach ◽

Tree Algorithms

Today, the storage of electrical energy is one of the most important technical challenges. The increasing number of high-capacity, high-power applications, especially electric vehicles and grid energy storage, means that a large number of batteries will need to be recycled and separated in the near future. This study discusses an alternative to the methods currently used for separating these batteries according to their chemistry. The method can be applied even on integrated circuits due to its ease of implementation and low operational cost. It can therefore also be used in multi-chemistry battery management systems to detect the chemistry of the connected battery. To implement the method, the batteries are connected to two different loads alternately. In this way, current and voltage values are measured for two different loads without allowing the battery to relax. The obtained data is pre-processed with a separation function developed based on statistical significance. Two machine learning algorithms, an artificial neural network and a decision tree, are trained with the processed data and used to determine battery chemistry with 100% accuracy. The efficiency and ease of implementation of the decision tree algorithm in such a categorization method are presented comparatively.

WT Model & Applications in Loan Platform Customer Default Prediction Based on Decision Tree Algorithms

Intelligent Computing Theories and Application - Lecture Notes in Computer Science ◽

10.1007/978-3-319-95930-6_33 ◽

2018 ◽

pp. 359-371

Author(s):

Sulin Pang ◽

Jinmeng Yuan

Keyword(s):

Decision Tree ◽

Default Prediction ◽

Tree Algorithms

An Event Group Based Classification Framework for Multi-variate Sequential Data

Australasian Journal of Information Systems ◽

10.3127/ajis.v21i0.1551 ◽

2017 ◽

Vol 21 ◽

Author(s):

Chao Sun ◽

David Stirling

Keyword(s):

Decision Tree ◽

Classification Problem ◽

Sequential Data ◽

Feature Generation ◽

Data Types ◽

Nominal Data ◽

Classification Framework ◽

Improved Outcomes ◽

Tree Algorithms ◽

Industrial Problem

Decision tree algorithms were not traditionally considered for sequential data classification, mostly because feature generation needs to be integrated with the modelling procedure in order to avoid a localisation problem. This paper presents an Event Group Based Classification (EGBC) framework that utilises an X-of-N (XoN) decision tree algorithm to avoid the feature generation issue when classifying sequential data. In this method, features are generated independently based on the characteristics of the sequential data. Subsequently, an XoN decision tree is utilised to select and aggregate useful features from various temporal and other dimensions (as event groups) for optimised classification. This makes the EGBC framework adaptive to sequential data of differing dimensions, robust to missing data, and able to accommodate either numeric or nominal data types. The comparatively improved outcomes from applying this method are demonstrated in two distinct areas: a text-based language identification task and a honeybee dance behaviour classification problem. A motivating industrial problem, hot metal temperature prediction, is also considered with the EGBC framework in order to address significant real-world demands.
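
As a rough illustration of the X-of-N idea mentioned in this abstract, the sketch below assumes the common description of an X-of-N attribute: given N attribute-value pairs, its value is the number of pairs that hold for an instance, and a tree can then split on that count. The abstract does not define the representation, so this reading, the attribute names, and the function `x_of_n` are all assumptions rather than the EGBC framework itself.

```python
def x_of_n(instance, conditions):
    """Count how many of the N (attribute, value) pairs hold for the instance.

    Attributes missing from the instance simply do not match, so the count
    is still defined when data are incomplete.
    """
    return sum(1 for attr, value in conditions if instance.get(attr) == value)


# Hypothetical event group: three conditions over time-indexed symbols.
conditions = [("t1", "A"), ("t2", "B"), ("t3", "A")]
sample = {"t1": "A", "t2": "C", "t3": "A"}

count = x_of_n(sample, conditions)           # value of the X-of-N attribute
goes_left = count >= 2                       # a tree node could test the count
print(count, goes_left)
```

Because the attribute is a count rather than a single test, one node can aggregate evidence from several positions in the sequence at once.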

Data Sampling Approaches with Severely Imbalanced Big Data for Medicare Fraud Detection

2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI) ◽

10.1109/ictai.2018.00030 ◽

2018 ◽

Cited By ~ 8

Author(s):

Richard A Bauder ◽

Taghi M Khoshgoftaar ◽

Tawfiq Hasanin

Keyword(s):

Big Data ◽

Fraud Detection ◽

Data Sampling ◽

Medicare Fraud

Spam classification: a comparative analysis of different boosted decision tree approaches

Journal of Systems and Information Technology ◽

10.1108/jsit-11-2017-0105 ◽

2018 ◽

Vol 20 (3) ◽

pp. 298-105 ◽

Cited By ~ 4

Author(s):

Shrawan Kumar Trivedi ◽

Prabin Kumar Panigrahi

Keyword(s):

Decision Tree ◽

False Positive ◽

False Positive Rate ◽

False Negative ◽

The Body ◽

Content Type ◽

Performance Accuracy ◽

Tree Classifier ◽

Boosted Decision Tree ◽

Email Spam

Purpose: Email spam classification is becoming a challenging area in the domain of text classification. Precise and robust classifiers are judged not only by classification accuracy but also by sensitivity (correctly classified legitimate emails) and specificity (correctly classified unsolicited emails), as captured by the false positive and false negative rates. This paper aims to present a comparative study of various decision tree classifiers (such as AD tree, decision stump and REP tree) with and without different boosting algorithms (bagging, boosting with re-sample and AdaBoost).

Design/methodology/approach: Artificial intelligence and text mining approaches have been incorporated in this study. Each decision tree classifier in this study is tested on informative words/features selected from two publicly available data sets (SpamAssassin and LingSpam) using a greedy step-wise feature search method.

Findings: Outcomes of this study show that without boosting, the REP tree provides high performance accuracy, with the AD tree ranking as the second-best performer. Decision stump is found to be the under-performing classifier of this study. However, with boosting, the combination of REP tree and AdaBoost compares favourably with other classification models. If the metrics false positive rate and performance accuracy are taken together, AD tree and REP tree with AdaBoost were both found to carry out an effective classification task. Greedy stepwise has proven its worth in this study by selecting a subset of valuable features to identify the correct class of emails.

Research limitations/implications: This research is focussed on the classification of spam emails written in the English language only. The proposed models work with the content (words/features) of email data that is mostly found in the body of the mail. Image spam has not been included in this study. Other messages such as short message service or multi-media messaging service were not included in this study.

Practical implications: In this research, a boosted decision tree approach has been proposed and used to classify email spam and ham files; this is found to be a highly effective approach in comparison with other state-of-the-art models used in other studies. This classifier may be tested for different applications and may provide new insights for developers and researchers.

Originality/value: A comparison of decision tree classifiers with and without ensembles has been presented for spam classification.
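
The evaluation metrics named in this abstract can be sketched as follows. The sketch follows the abstract's own wording, in which sensitivity refers to correctly classified legitimate emails, so legitimate mail ("ham") is treated as the positive class; that choice, the toy labels, and the function name `rates` are assumptions for illustration and are not results from the paper.

```python
def rates(y_true, y_pred, positive="ham"):
    """Compute accuracy, sensitivity, specificity, FPR and FNR for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
        "false_negative_rate": fn / (fn + tp),  # equals 1 - sensitivity
    }


# Toy labels only (assumption): four legitimate and four spam messages.
y_true = ["ham", "ham", "ham", "spam", "spam", "ham", "spam", "spam"]
y_pred = ["ham", "ham", "spam", "spam", "ham", "ham", "spam", "spam"]

result = rates(y_true, y_pred)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Many spam-filtering studies instead treat spam as the positive class, which swaps the roles of the two error rates, so the convention should be stated whenever these numbers are reported.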

Usage of stratified sampling of control subset for predicativity improvement of boosted decision tree models

Modeling and Information Systems in Economics ◽

10.33111/mise.99.10 ◽

2020 ◽

pp. 119-131

Author(s):

Viacheslav Pyrohov

Keyword(s):

Decision Tree ◽

Stratified Sampling ◽

Boosted Decision Tree ◽

Tree Models

Determining Factors Affecting Cooperative Membership of the Beekeepers Using Decision Tree Algorithms

Tarım Bilimleri Dergisi ◽

10.15832/ankutbd.739230 ◽

2022 ◽

pp. 25-32

Author(s):

Tayfun ÇUKUR ◽

Figen ÇUKUR

Keyword(s):

Decision Tree ◽

Factors Affecting ◽

Determining Factors ◽

Tree Algorithms

Near-Optimal Sparse Neural Trees for Supervised Learning

10.20944/preprints202105.0117.v2 ◽

2021 ◽

Author(s):

Tanujit Chakraborty

Keyword(s):

Neural Network ◽

Machine Learning ◽

Decision Tree ◽

Classification Tree ◽

Mathematical Formulation ◽

Machine Learning Algorithms ◽

Optimal Decision ◽

Feed Forward ◽

Feed Forward Network ◽

Tree Algorithms

Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. On the other hand, deep learning methods have boosted the capacity of machine learning algorithms and are now being used for non-trivial applications in various applied domains. But training a fully-connected deep feed-forward network by gradient-descent backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. In this paper, we propose near-optimal neural regression trees, which are intended to be much faster than deep feed-forward networks and do not require the number of hidden units in the hidden layers of the neural network to be specified in advance. The key idea is to construct a decision tree and then simulate the decision tree with a neural network. This work aims to build a mathematical formulation of neural trees and gain the complementary benefits of both sparse optimal decision trees and neural trees. We propose near-optimal sparse neural trees (NSNT), which are shown to be asymptotically consistent and robust. Additionally, the proposed NSNT model obtains a fast rate of convergence, which is near-optimal up to a logarithmic factor. We comprehensively benchmark the proposed method on a sample of 80 datasets (40 classification datasets and 40 regression datasets) from the UCI machine learning repository. We establish that the proposed method is likely to outperform the current state-of-the-art methods (random forest, XGBoost, optimal classification tree, and near-optimal nonlinear trees) for the majority of the datasets.
