Classification of multiclass imbalanced data using cost-sensitive decision tree C5.0

Author(s):  
M. Aldiki Febriantono ◽  
Sholeh Hadi Pramono ◽  
Rahmadwati Rahmadwati ◽  
Golshah Naghdy

The multiclass imbalanced data problem is currently an interesting topic in data mining because it influences the classification process in machine learning. In many cases, the minority class in a dataset carries more important information than the majority class, so misclassifying minority examples harms both accuracy and overall classifier performance. In this research, a cost-sensitive decision tree based on C5.0 was used to solve multiclass imbalanced data problems. In the first stage, a decision tree model is built with the C5.0 algorithm; cost-sensitive learning then applies the MetaCost method to obtain the minimum-cost model. In testing, the C5.0 algorithm performed better than the C4.5 and ID3 algorithms, with performance percentages of 40.91%, 40.24%, and 19.23%, respectively.
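The MetaCost step can be sketched compactly: estimate class probabilities with a bootstrap ensemble, relabel each training example with the class that minimizes its expected cost, and retrain a single tree on the relabeled data. A minimal sketch follows, using scikit-learn's DecisionTreeClassifier as a stand-in for C5.0 (which has no standard Python implementation); the cost matrix values are illustrative, not the paper's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def metacost_relabel(X, y, cost, n_boot=10, seed=0):
    """Relabel each example with the class that minimizes its expected cost.

    cost[i, j] = cost of predicting class i when the true class is j.
    Class probabilities are estimated by a bootstrap ensemble of trees.
    """
    rng = np.random.default_rng(seed)
    n, n_classes = len(y), cost.shape[0]
    votes = np.zeros((n, n_classes))
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # bootstrap sample of the training set
        tree = DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])
        votes[np.arange(n), tree.predict(X)] += 1
    proba = votes / n_boot                   # rough P(j | x) estimates
    expected_cost = proba @ cost.T           # entry [x, i] = E[cost of predicting i]
    return expected_cost.argmin(axis=1)      # minimum-expected-cost label

# Illustrative 3-class imbalanced data; misclassifying the rare class 2 costs most.
X, y = make_classification(n_samples=600, n_classes=3, n_informative=6,
                           weights=[0.7, 0.25, 0.05], random_state=0)
cost = np.array([[0, 1, 5],
                 [1, 0, 5],
                 [1, 1, 0]], dtype=float)
y_relabelled = metacost_relabel(X, y, cost)
final_tree = DecisionTreeClassifier(random_state=0).fit(X, y_relabelled)
```

The relabeling makes the final tree cost-sensitive without modifying the tree-induction algorithm itself, which is the appeal of MetaCost as a wrapper method.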

2020 ◽  
Vol 31 (2) ◽  
pp. 25
Author(s):  
Liqaa M. Shoohi ◽  
Jamila H. Saud

Classification of imbalanced data is an important issue. Many algorithms have been developed for classification, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, and have been used repeatedly in many fields. These algorithms suffer from the imbalanced data problem, where some classes have far more instances than others. Imbalanced data leads to poor performance and a bias toward one class at the expense of the others. In this paper, we propose three techniques based on over-sampling (O.S.) for processing an imbalanced dataset, redistributing it, and converting it into a balanced dataset. These techniques are Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalance Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. Each technique generates synthetic samples for the minority class to achieve balance between the minority and majority classes and then calculates the IR between them. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a higher balance between the minority and majority classes.
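All three baseline samplers named here (classic SMOTE, Borderline-SMOTE, and ADASYN) ship with the imbalanced-learn library, so the class counts and imbalance ratio before and after resampling can be compared directly. The snippet below is an illustrative comparison on synthetic data; it does not reimplement the paper's Improved SMOTE.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN

def imbalance_ratio(y):
    """IR = size of the largest class divided by the size of the smallest."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y), "IR =", imbalance_ratio(y))

for sampler in (SMOTE(random_state=0),
                BorderlineSMOTE(random_state=0),
                ADASYN(random_state=0)):
    X_res, y_res = sampler.fit_resample(X, y)
    print(type(sampler).__name__, Counter(y_res), "IR =", imbalance_ratio(y_res))
```

Note that ADASYN adapts the number of synthetic samples per minority point to local difficulty, so its output is not always exactly balanced; printing the IR after resampling makes that visible.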


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Huaping Guo ◽  
Xiaoyu Diao ◽  
Hongbing Liu

Rotation Forest is an ensemble learning approach that achieves better performance than Bagging and Boosting by building accurate and diverse classifiers in rotated feature spaces. However, like other conventional classifiers, Rotation Forest does not work well on imbalanced data, which is characterized by having far fewer examples of one class (the minority class) than the other (the majority class), and the cost of misclassifying minority class examples is often much higher than that of the reverse error. This paper proposes a novel method called Embedding Undersampling Rotation Forest (EURF) to handle this problem by (1) sampling subsets from the majority class and learning a projection matrix from each subset and (2) obtaining training sets by projecting re-undersampled subsets of the original dataset into the new spaces defined by these matrices and constructing an individual classifier from each training set. In the first step, undersampling forces the rotation matrix to better capture the features of the minority class without harming the diversity between individual classifiers. In the second step, the undersampling technique aims to improve the performance of each individual classifier on the minority class. The experimental results show that EURF achieves significantly better performance than other state-of-the-art methods.
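A rough sketch of the two-step construction follows, with PCA standing in for the rotation and several simplifications relative to the paper (binary 0/1 integer labels, equal-sized undersampled subsets):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def undersample(X, y, minority, rng):
    """Keep all minority examples plus an equal-size random majority subset."""
    min_idx = np.where(y == minority)[0]
    maj_idx = rng.choice(np.where(y != minority)[0], size=len(min_idx), replace=False)
    idx = np.concatenate([min_idx, maj_idx])
    return X[idx], y[idx]

class SimplifiedEURF:
    """Sketch of Embedding Undersampling Rotation Forest: each member learns a
    rotation (here, PCA) on one undersampled subset, then trains a tree on a
    second, independently undersampled subset projected through that rotation."""

    def __init__(self, n_members=10, seed=0):
        self.n_members, self.seed = n_members, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.minority_ = np.bincount(y).argmin()      # assumes 0/1 integer labels
        self.members_ = []
        for _ in range(self.n_members):
            Xr, _ = undersample(X, y, self.minority_, rng)   # rotation subset
            rot = PCA().fit(Xr)
            Xt, yt = undersample(X, y, self.minority_, rng)  # training subset
            tree = DecisionTreeClassifier().fit(rot.transform(Xt), yt)
            self.members_.append((rot, tree))
        return self

    def predict(self, X):
        # Average the members' class-probability estimates, then take the argmax.
        votes = np.mean([t.predict_proba(r.transform(X)) for r, t in self.members_],
                        axis=0)
        return votes.argmax(axis=1)

X, y = make_classification(n_samples=400, weights=[0.85, 0.15], random_state=1)
pred = SimplifiedEURF().fit(X, y).predict(X)
```

Using two independent undersampled subsets per member mirrors the paper's motivation: one subset shapes the rotation around minority structure, the other trains the classifier, and the randomness of both keeps the ensemble diverse.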


2012 ◽  
Vol 33 (2) ◽  
pp. 152-159 ◽  
Author(s):  
Xan F. Courville ◽  
Ivan M. Tomek ◽  
Kathryn B. Kirkland ◽  
Marian Birhle ◽  
Stephen R. Kantor ◽  
...  

Objective. To perform a cost-effectiveness analysis to evaluate preoperative use of mupirocin in patients with total joint arthroplasty (TJA).

Design. Simple decision tree model.

Setting. Outpatient TJA clinical setting.

Participants. Hypothetical cohort of patients with TJA.

Interventions. A simple decision tree model compared 3 strategies in a hypothetical cohort of patients with TJA: (1) obtaining preoperative screening cultures for all patients, followed by administration of mupirocin to patients with cultures positive for Staphylococcus aureus; (2) providing empirical preoperative treatment with mupirocin for all patients without screening; and (3) providing no preoperative treatment or screening. We assessed the costs and benefits over a 1-year period. Data inputs were obtained from a literature review and from our institution's internal data. Utilities were measured in quality-adjusted life-years, and costs were measured in 2005 US dollars.

Main Outcome Measure. Incremental cost-effectiveness ratio.

Results. The treat-all and screen-and-treat strategies both had lower costs and greater benefits compared with the no-treatment strategy. Sensitivity analysis revealed that this result is stable even if the cost of mupirocin was over $100 and the cost of SSI ranged between $26,000 and $250,000. Treating all patients remains the best strategy when the prevalence of S. aureus carriers and surgical site infection is varied across plausible values, as well as when the prevalence of mupirocin-resistant strains is high.

Conclusions. Empirical treatment with mupirocin ointment or use of a screen-and-treat strategy before TJA is performed is a simple, safe, and cost-effective intervention that can reduce the risk of SSI. S. aureus decolonization with nasal mupirocin for patients undergoing TJA should be considered.

Level of Evidence. Level II, economic and decision analysis.
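The decision tree here reduces to an expected-cost comparison across the three strategies. The sketch below shows the shape of that arithmetic only; every parameter value is a hypothetical placeholder rather than one of the paper's actual inputs, and the quality-adjusted life-year side of the analysis is omitted.

```python
# Illustrative expected-cost comparison for the three strategies.
# All parameter values are hypothetical placeholders, NOT the paper's inputs.
p_carrier = 0.25         # prevalence of S. aureus nasal carriage (assumed)
p_ssi_untreated = 0.04   # SSI risk for untreated carriers (assumed)
p_ssi_treated = 0.02     # SSI risk for carriers after decolonization (assumed)
p_ssi_noncarrier = 0.01  # baseline SSI risk for non-carriers (assumed)
cost_screen = 25.0       # screening culture cost, USD (assumed)
cost_mupirocin = 50.0    # mupirocin course cost, USD (assumed)
cost_ssi = 60000.0       # cost of treating one SSI, USD (assumed)

def expected_cost(strategy):
    """Expected per-patient cost at the root of the decision tree."""
    if strategy == "none":
        p_ssi = p_carrier * p_ssi_untreated + (1 - p_carrier) * p_ssi_noncarrier
        return p_ssi * cost_ssi
    if strategy == "treat_all":
        p_ssi = p_carrier * p_ssi_treated + (1 - p_carrier) * p_ssi_noncarrier
        return cost_mupirocin + p_ssi * cost_ssi
    if strategy == "screen_and_treat":
        p_ssi = p_carrier * p_ssi_treated + (1 - p_carrier) * p_ssi_noncarrier
        return cost_screen + p_carrier * cost_mupirocin + p_ssi * cost_ssi

for s in ("none", "treat_all", "screen_and_treat"):
    print(f"{s:>16}: ${expected_cost(s):,.2f}")
```

With the high cost of an SSI dominating the cheap interventions, both active strategies beat doing nothing under a wide range of inputs, which matches the direction of the paper's sensitivity analysis.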


2002 ◽  
Vol 16 ◽  
pp. 321-357 ◽  
Author(s):  
N. V. Chawla ◽  
K. W. Bowyer ◽  
L. O. Hall ◽  
W. P. Kegelmeyer

An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
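The heart of the method, generating synthetic minority examples by interpolating between a minority point and one of its k nearest minority neighbors, fits in a short function. This is a from-scratch sketch of that interpolation step, not the authors' code:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_synthetic, k=5, seed=0):
    """Generate synthetic minority examples by interpolating between each
    minority point and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: self is a neighbor
    _, neighbors = nn.kneighbors(X_min)
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(len(X_min))               # pick a random minority point
        nb = X_min[rng.choice(neighbors[j][1:])]   # pick one of its k neighbors
        gap = rng.random()                         # interpolation factor in [0, 1)
        synthetic[i] = X_min[j] + gap * (nb - X_min[j])
    return synthetic

X_min = np.random.default_rng(0).normal(size=(30, 4))   # toy minority sample
new_points = smote(X_min, n_synthetic=60)
```

Because the new points lie on line segments between existing minority examples, the minority region is broadened rather than simply re-weighted, which is what distinguishes this over-sampling from replication.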


Author(s):  
Yilin Yan ◽  
Min Chen ◽  
Saad Sadiq ◽  
Mei-Ling Shyu

The classification of imbalanced datasets has recently attracted significant attention due to its implications in several real-world use cases. Classifiers developed on datasets with skewed distributions tend to favor the majority classes and are biased against the minority class. Despite extensive research interest, imbalanced data classification remains a challenge in data mining research, especially for multimedia data. Our attempt to overcome this hurdle is a convolutional neural network (CNN) based deep learning solution integrated with a bootstrapping technique. Because convolutional neural networks are computationally expensive to train on big datasets, we propose to extract features from pre-trained convolutional neural network models and feed those features to a separate fully connected neural network. A Spark implementation shows the promising performance of our model in handling big datasets with respect to feasibility and scalability.
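The extract-then-classify split can be sketched with any pre-trained backbone. The snippet below uses torchvision's ResNet-18 purely as an example; it does not reproduce the paper's architecture, bootstrapping step, or Spark pipeline.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained CNN used as a frozen feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()            # drop the classification head
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False            # freeze: only extract features

# Separate, small fully connected network trained on the extracted features.
classifier = nn.Sequential(
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 2),                 # e.g. a binary imbalanced task
)

images = torch.randn(8, 3, 224, 224)   # stand-in minibatch of images
with torch.no_grad():
    feats = backbone(images)           # (8, 512) feature vectors
logits = classifier(feats)
```

Freezing the backbone means only the small classifier is trained, which is what makes the approach cheap enough to combine with repeated bootstrap resampling of the skewed training data.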


2017 ◽  
Vol 17 (1) ◽  
pp. 45-62 ◽  
Author(s):  
Lincy Meera Mathews ◽  
Hari Seetha

Mining imbalanced data is a challenging task due to its complex inherent characteristics. Conventional classifiers such as the nearest neighbor are severely biased toward the majority class, as minority class data are under-represented and outnumbered. This paper focuses on building an improved nearest neighbor classifier for two-class imbalanced data. Three oversampling techniques are presented that generate artificial instances for the minority class to balance the distribution among the classes. Experimental results showed that the proposed methods outperformed the conventional classifier.
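The overall recipe, oversampling the minority class and then fitting a nearest neighbor classifier on the balanced data, can be illustrated with the simplest possible oversampler, random duplication; the paper's three generation techniques are not reproduced here.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def random_oversample(X, y, seed=0):
    """Duplicate random minority examples until both classes are equal-sized
    (purely illustrative; the paper generates artificial instances instead)."""
    rng = np.random.default_rng(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    extra = rng.choice(np.where(y == minority)[0], size=deficit, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_bal, y_bal = random_oversample(X, y)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_bal, y_bal)
```

Balancing matters particularly for nearest neighbor methods because their vote is purely local: when minority points are scarce, even a correct minority neighborhood is easily outvoted by surrounding majority points.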

