Gradient Boosted Decision Tree Algorithms for Medicare Fraud Detection

2021 ◽  
Vol 2 (4) ◽  
Author(s):  
John T. Hancock ◽  
Taghi M. Khoshgoftaar
2021 ◽  
Author(s):  
Thomas Weripuo Gyeera

<div>The National Institute of Standards and Technology defines the fundamental characteristics of cloud computing as: on-demand computing, offered via the network, using pooled resources, with rapid elastic scaling and metered charging. The rapid dynamic allocation and release of resources on demand to meet heterogeneous computing needs is particularly challenging for data centres, which process a huge amount of data characterised by its high volume, velocity, variety and veracity (4Vs model). Data centres seek to regulate this by monitoring and adaptation, typically reacting to service failures after the fact. We present a real cloud test bed with the capabilities of proactively monitoring and gathering cloud resource information for making predictions and forecasts. This contrasts with the state-of-the-art reactive monitoring of cloud data centres. We argue that the behavioural patterns and Key Performance Indicators (KPIs) characterizing virtualized servers, networks, and database applications can best be studied and analysed with predictive models. Specifically, we applied the Boosted Decision Tree machine learning algorithm in making future predictions on the KPIs of a cloud server and virtual infrastructure network, yielding an R-Square of 0.9991 at a 0.2 learning rate. This predictive framework is beneficial for making short- and long-term predictions for cloud resources.</div>


2021 ◽  
Author(s):  
İsmail Can Dikmen ◽  
Teoman Karadağ

Abstract Today, the storage of electrical energy is one of the most important technical challenges. The increasing number of high capacity, high-power applications, especially electric vehicles and grid energy storage, points to the fact that we will be faced with a large amount of batteries that will need to be recycled and separated in the near future. An alternative method to the currently used methods for separating these batteries according to their chemistry is discussed in this study. This method can be applied even on integrated circuits due to its ease of implementation and low operational cost. In this respect, it is also possible to use it in multi-chemistry battery management systems to detect the chemistry of the connected battery. For the implementation of the method, the batteries are connected to two different loads alternately. In this way, current and voltage values ​​are measured for two different loads without allowing the battery to relax. The obtained data is pre-processed with a separation function developed based on statistical significance. In machine learning algorithms, artificial neural network and decision tree algorithms are trained with processed data and used to determine battery chemistry with 100% accuracy. The efficiency and ease of implementation of the decision tree algorithm in such a categorization method are presented comparatively.


Author(s):  
Chao Sun ◽  
David Stirling

Decision tree algorithms were not traditionally considered for sequential data classification, mostly because feature generation needs to be integrated with the modelling procedure in order to avoid a localisation problem. This paper presents an Event Group Based Classification (EGBC) framework that utilises an X-of-N (XoN) decision tree algorithm to avoid the feature generation issue during the classification on sequential data. In this method, features are generated independently based on the characteristics of the sequential data. Subsequently an XoN decision tree is utilised to select and aggregate useful features from various temporal and other dimensions (as event groups) for optimised classification. This leads the EGBC framework to be adaptive to sequential data of differing dimensions, robust to missing data and accommodating to either numeric or nominal data types. The comparatively improved outcomes from applying this method are demonstrated on two distinct areas – a text based language identification task, as well as a honeybee dance behaviour classification problem. A further motivating industrial problem – hot metal temperature prediction, is further considered with the EGBC framework in order to address significant real-world demands.


2018 ◽  
Vol 20 (3) ◽  
pp. 298-105 ◽  
Author(s):  
Shrawan Kumar Trivedi ◽  
Prabin Kumar Panigrahi

PurposeEmail spam classification is now becoming a challenging area in the domain of text classification. Precise and robust classifiers are not only judged by classification accuracy but also by sensitivity (correctly classified legitimate emails) and specificity (correctly classified unsolicited emails) towards the accurate classification, captured by both false positive and false negative rates. This paper aims to present a comparative study between various decision tree classifiers (such as AD tree, decision stump and REP tree) with/without different boosting algorithms (bagging, boosting with re-sample and AdaBoost).Design/methodology/approachArtificial intelligence and text mining approaches have been incorporated in this study. Each decision tree classifier in this study is tested on informative words/features selected from the two publically available data sets (SpamAssassin and LingSpam) using a greedy step-wise feature search method.FindingsOutcomes of this study show that without boosting, the REP tree provides high performance accuracy with the AD tree ranking as the second-best performer. Decision stump is found to be the under-performing classifier of this study. However, with boosting, the combination of REP tree and AdaBoost compares favourably with other classification models. If the metrics false positive rate and performance accuracy are taken together, AD tree and REP tree with AdaBoost were both found to carry out an effective classification task. Greedy stepwise has proven its worth in this study by selecting a subset of valuable features to identify the correct class of emails.Research limitations/implicationsThis research is focussed on the classification of those email spams that are written in the English language only. The proposed models work with content (words/features) of email data that is mostly found in the body of the mail. Image spam has not been included in this study. Other messages such as short message service or multi-media messaging service were not included in this study.Practical implicationsIn this research, a boosted decision tree approach has been proposed and used to classify email spam and ham files; this is found to be a highly effective approach in comparison with other state-of-the-art modes used in other studies. This classifier may be tested for different applications and may provide new insights for developers and researchers.Originality/valueA comparison of decision tree classifiers with/without ensemble has been presented for spam classification.


Author(s):  
Tanujit Chakraborty

Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. On the other hand, deep learning methods have boosted the capacity of machine learning algorithms and are now being used for non-trivial applications in various applied domains. But training a fully-connected deep feed-forward network by gradient-descent backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. In this paper, we propose near-optimal neural regression trees, intending to make it much faster than deep feed-forward networks and for which it is not essential to specify the number of hidden units in the hidden layers of the neural network in advance. The key idea is to construct a decision tree and then simulate the decision tree with a neural network. This work aims to build a mathematical formulation of neural trees and gain the complementary benefits of both sparse optimal decision trees and neural trees. We propose near-optimal sparse neural trees (NSNT) that is shown to be asymptotically consistent and robust in nature. Additionally, the proposed NSNT model obtain a fast rate of convergence which is near-optimal up to some logarithmic factor. We comprehensively benchmark the proposed method on a sample of 80 datasets (40 classification datasets and 40 regression datasets) from the UCI machine learning repository. We establish that the proposed method is likely to outperform the current state-of-the-art methods (random forest, XGBoost, optimal classification tree, and near-optimal nonlinear trees) for the majority of the datasets.


Sign in / Sign up

Export Citation Format

Share Document