Classification of multiclass imbalanced data using cost-sensitive decision tree C5.0

Author(s):  
M. Aldiki Febriantono ◽  
Sholeh Hadi Pramono ◽  
Rahmadwati Rahmadwati ◽  
Golshah Naghdy

The multiclass imbalanced data problem is currently an interesting topic in data mining because it influences the classification process in machine learning. In many cases, the minority class in a dataset carries more important information than the majority class, so misclassifying minority examples harms both accuracy and overall classifier performance. In this research, a cost-sensitive decision tree based on C5.0 was used to solve multiclass imbalanced data problems. In the first stage, a decision tree model is built with the C5.0 algorithm; cost-sensitive learning then applies the MetaCost method to obtain the minimum-cost model. In testing, the C5.0 algorithm performed better than the C4.5 and ID3 algorithms, with performance percentages of 40.91%, 40.24%, and 19.23%, respectively.
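The MetaCost step can be sketched compactly: estimate class probabilities with a bootstrap ensemble, relabel each training example with the class that minimizes its expected cost, and retrain a single tree on the relabeled data. A minimal sketch follows, using scikit-learn's DecisionTreeClassifier as a stand-in for C5.0 (which has no standard Python implementation); the cost matrix values are illustrative, not the paper's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def metacost_relabel(X, y, cost, n_boot=10, seed=0):
    """Relabel each example with the class that minimizes its expected cost.

    cost[i, j] = cost of predicting class i when the true class is j.
    Class probabilities are estimated by a bootstrap ensemble of trees.
    """
    rng = np.random.default_rng(seed)
    n, n_classes = len(y), cost.shape[0]
    votes = np.zeros((n, n_classes))
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # bootstrap sample of the training set
        tree = DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])
        votes[np.arange(n), tree.predict(X)] += 1
    proba = votes / n_boot                   # rough P(j | x) estimates
    expected_cost = proba @ cost.T           # entry [x, i] = E[cost of predicting i]
    return expected_cost.argmin(axis=1)      # minimum-expected-cost label

# Illustrative 3-class imbalanced data; misclassifying the rare class 2 costs most.
X, y = make_classification(n_samples=600, n_classes=3, n_informative=6,
                           weights=[0.7, 0.25, 0.05], random_state=0)
cost = np.array([[0, 1, 5],
                 [1, 0, 5],
                 [1, 1, 0]], dtype=float)
y_relabelled = metacost_relabel(X, y, cost)
final_tree = DecisionTreeClassifier(random_state=0).fit(X, y_relabelled)
```

The relabeling makes the final tree cost-sensitive without modifying the tree-induction algorithm itself, which is the appeal of MetaCost as a wrapper method.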

2020 ◽  
Vol 31 (2) ◽  
pp. 25
Author(s):  
Liqaa M. Shoohi ◽  
Jamila H. Saud

Classification of imbalanced data is an important issue. Many algorithms have been developed for classification, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, and have been used repeatedly in many fields. These algorithms suffer from the imbalanced data problem, where some classes have far more instances than others. Imbalanced data leads to poor performance and a bias toward one class at the expense of the others. In this paper, we propose three techniques based on over-sampling (O.S.) for processing an imbalanced dataset, redistributing it, and converting it into a balanced dataset. These techniques are Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalance Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. Each technique generates synthetic samples for the minority class to achieve balance between the minority and majority classes and then calculates the IR between them. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a higher balance between the minority and majority classes.
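All three baseline samplers named here (classic SMOTE, Borderline-SMOTE, and ADASYN) ship with the imbalanced-learn library, so the class counts and imbalance ratio before and after resampling can be compared directly. The snippet below is an illustrative comparison on synthetic data; it does not reimplement the paper's Improved SMOTE.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN

def imbalance_ratio(y):
    """IR = size of the largest class divided by the size of the smallest."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y), "IR =", imbalance_ratio(y))

for sampler in (SMOTE(random_state=0),
                BorderlineSMOTE(random_state=0),
                ADASYN(random_state=0)):
    X_res, y_res = sampler.fit_resample(X, y)
    print(type(sampler).__name__, Counter(y_res), "IR =", imbalance_ratio(y_res))
```

Note that ADASYN adapts the number of synthetic samples per minority point to local difficulty, so its output is not always exactly balanced; printing the IR after resampling makes that visible.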


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Huaping Guo ◽  
Xiaoyu Diao ◽  
Hongbing Liu

Rotation Forest is an ensemble learning approach that achieves better performance than Bagging and Boosting by building accurate and diverse classifiers in rotated feature spaces. However, like other conventional classifiers, Rotation Forest does not work well on imbalanced data, which is characterized by having far fewer examples of one class (the minority class) than the other (the majority class), and the cost of misclassifying minority class examples is often much higher than that of the reverse error. This paper proposes a novel method called Embedding Undersampling Rotation Forest (EURF) to handle this problem by (1) sampling subsets from the majority class and learning a projection matrix from each subset and (2) obtaining training sets by projecting re-undersampled subsets of the original dataset into the new spaces defined by these matrices and constructing an individual classifier from each training set. In the first step, undersampling forces the rotation matrix to better capture the features of the minority class without harming the diversity between individual classifiers. In the second step, the undersampling technique aims to improve the performance of each individual classifier on the minority class. The experimental results show that EURF achieves significantly better performance than other state-of-the-art methods.
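A rough sketch of the two-step construction follows, with PCA standing in for the rotation and several simplifications relative to the paper (binary 0/1 integer labels, equal-sized undersampled subsets):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def undersample(X, y, minority, rng):
    """Keep all minority examples plus an equal-size random majority subset."""
    min_idx = np.where(y == minority)[0]
    maj_idx = rng.choice(np.where(y != minority)[0], size=len(min_idx), replace=False)
    idx = np.concatenate([min_idx, maj_idx])
    return X[idx], y[idx]

class SimplifiedEURF:
    """Sketch of Embedding Undersampling Rotation Forest: each member learns a
    rotation (here, PCA) on one undersampled subset, then trains a tree on a
    second, independently undersampled subset projected through that rotation."""

    def __init__(self, n_members=10, seed=0):
        self.n_members, self.seed = n_members, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.minority_ = np.bincount(y).argmin()      # assumes 0/1 integer labels
        self.members_ = []
        for _ in range(self.n_members):
            Xr, _ = undersample(X, y, self.minority_, rng)   # rotation subset
            rot = PCA().fit(Xr)
            Xt, yt = undersample(X, y, self.minority_, rng)  # training subset
            tree = DecisionTreeClassifier().fit(rot.transform(Xt), yt)
            self.members_.append((rot, tree))
        return self

    def predict(self, X):
        # Average the members' class-probability estimates, then take the argmax.
        votes = np.mean([t.predict_proba(r.transform(X)) for r, t in self.members_],
                        axis=0)
        return votes.argmax(axis=1)

X, y = make_classification(n_samples=400, weights=[0.85, 0.15], random_state=1)
pred = SimplifiedEURF().fit(X, y).predict(X)
```

Using two independent undersampled subsets per member mirrors the paper's motivation: one subset shapes the rotation around minority structure, the other trains the classifier, and the randomness of both keeps the ensemble diverse.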


2012 ◽  
Vol 33 (2) ◽  
pp. 152-159 ◽  
Author(s):  
Xan F. Courville ◽  
Ivan M. Tomek ◽  
Kathryn B. Kirkland ◽  
Marian Birhle ◽  
Stephen R. Kantor ◽  
...  

Objective. To perform a cost-effectiveness analysis to evaluate preoperative use of mupirocin in patients with total joint arthroplasty (TJA).

Design. Simple decision tree model.

Setting. Outpatient TJA clinical setting.

Participants. Hypothetical cohort of patients with TJA.

Interventions. A simple decision tree model compared 3 strategies in a hypothetical cohort of patients with TJA: (1) obtaining preoperative screening cultures for all patients, followed by administration of mupirocin to patients with cultures positive for Staphylococcus aureus; (2) providing empirical preoperative treatment with mupirocin for all patients without screening; and (3) providing no preoperative treatment or screening. We assessed the costs and benefits over a 1-year period. Data inputs were obtained from a literature review and from our institution's internal data. Utilities were measured in quality-adjusted life-years, and costs were measured in 2005 US dollars.

Main Outcome Measure. Incremental cost-effectiveness ratio.

Results. The treat-all and screen-and-treat strategies both had lower costs and greater benefits compared with the no-treatment strategy. Sensitivity analysis revealed that this result is stable even if the cost of mupirocin was over $100 and the cost of SSI ranged between $26,000 and $250,000. Treating all patients remains the best strategy when the prevalence of S. aureus carriers and surgical site infection is varied across plausible values, as well as when the prevalence of mupirocin-resistant strains is high.

Conclusions. Empirical treatment with mupirocin ointment or use of a screen-and-treat strategy before TJA is performed is a simple, safe, and cost-effective intervention that can reduce the risk of SSI. S. aureus decolonization with nasal mupirocin for patients undergoing TJA should be considered.

Level of Evidence. Level II, economic and decision analysis.
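The decision tree here reduces to an expected-cost comparison across the three strategies. The sketch below shows the shape of that arithmetic only; every parameter value is a hypothetical placeholder rather than one of the paper's actual inputs, and the quality-adjusted life-year side of the analysis is omitted.

```python
# Illustrative expected-cost comparison for the three strategies.
# All parameter values are hypothetical placeholders, NOT the paper's inputs.
p_carrier = 0.25         # prevalence of S. aureus nasal carriage (assumed)
p_ssi_untreated = 0.04   # SSI risk for untreated carriers (assumed)
p_ssi_treated = 0.02     # SSI risk for carriers after decolonization (assumed)
p_ssi_noncarrier = 0.01  # baseline SSI risk for non-carriers (assumed)
cost_screen = 25.0       # screening culture cost, USD (assumed)
cost_mupirocin = 50.0    # mupirocin course cost, USD (assumed)
cost_ssi = 60000.0       # cost of treating one SSI, USD (assumed)

def expected_cost(strategy):
    """Expected per-patient cost at the root of the decision tree."""
    if strategy == "none":
        p_ssi = p_carrier * p_ssi_untreated + (1 - p_carrier) * p_ssi_noncarrier
        return p_ssi * cost_ssi
    if strategy == "treat_all":
        p_ssi = p_carrier * p_ssi_treated + (1 - p_carrier) * p_ssi_noncarrier
        return cost_mupirocin + p_ssi * cost_ssi
    if strategy == "screen_and_treat":
        p_ssi = p_carrier * p_ssi_treated + (1 - p_carrier) * p_ssi_noncarrier
        return cost_screen + p_carrier * cost_mupirocin + p_ssi * cost_ssi

for s in ("none", "treat_all", "screen_and_treat"):
    print(f"{s:>16}: ${expected_cost(s):,.2f}")
```

With the high cost of an SSI dominating the cheap interventions, both active strategies beat doing nothing under a wide range of inputs, which matches the direction of the paper's sensitivity analysis.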


2002 ◽  
Vol 16 ◽  
pp. 321-357 ◽  
Author(s):  
N. V. Chawla ◽  
K. W. Bowyer ◽  
L. O. Hall ◽  
W. P. Kegelmeyer

An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
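The heart of the method, generating synthetic minority examples by interpolating between a minority point and one of its k nearest minority neighbors, fits in a short function. This is a from-scratch sketch of that interpolation step, not the authors' code:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_synthetic, k=5, seed=0):
    """Generate synthetic minority examples by interpolating between each
    minority point and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: self is a neighbor
    _, neighbors = nn.kneighbors(X_min)
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        j = rng.integers(len(X_min))               # pick a random minority point
        nb = X_min[rng.choice(neighbors[j][1:])]   # pick one of its k neighbors
        gap = rng.random()                         # interpolation factor in [0, 1)
        synthetic[i] = X_min[j] + gap * (nb - X_min[j])
    return synthetic

X_min = np.random.default_rng(0).normal(size=(30, 4))   # toy minority sample
new_points = smote(X_min, n_synthetic=60)
```

Because the new points lie on line segments between existing minority examples, the minority region is broadened rather than simply re-weighted, which is what distinguishes this over-sampling from replication.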


Author(s):  
Yilin Yan ◽  
Min Chen ◽  
Saad Sadiq ◽  
Mei-Ling Shyu

The classification of imbalanced datasets has recently attracted significant attention due to its implications in several real-world use cases. Classifiers developed on datasets with skewed distributions tend to favor the majority classes and are biased against the minority class. Despite extensive research interest, imbalanced data classification remains a challenge in data mining research, especially for multimedia data. Our attempt to overcome this hurdle is a convolutional neural network (CNN) based deep learning solution integrated with a bootstrapping technique. Because convolutional neural networks are computationally expensive to train on big datasets, we propose to extract features from pre-trained convolutional neural network models and feed those features to a separate fully connected neural network. A Spark implementation shows the promising performance of our model in handling big datasets with respect to feasibility and scalability.
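The extract-then-classify split can be sketched with any pre-trained backbone. The snippet below uses torchvision's ResNet-18 purely as an example; it does not reproduce the paper's architecture, bootstrapping step, or Spark pipeline.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained CNN used as a frozen feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()            # drop the classification head
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False            # freeze: only extract features

# Separate, small fully connected network trained on the extracted features.
classifier = nn.Sequential(
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 2),                 # e.g. a binary imbalanced task
)

images = torch.randn(8, 3, 224, 224)   # stand-in minibatch of images
with torch.no_grad():
    feats = backbone(images)           # (8, 512) feature vectors
logits = classifier(feats)
```

Freezing the backbone means only the small classifier is trained, which is what makes the approach cheap enough to combine with repeated bootstrap resampling of the skewed training data.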


2017 ◽  
Vol 17 (1) ◽  
pp. 45-62 ◽  
Author(s):  
Lincy Meera Mathews ◽  
Hari Seetha

Mining imbalanced data is a challenging task due to its complex inherent characteristics. Conventional classifiers such as the nearest neighbor are severely biased toward the majority class, as minority class data are under-represented and outnumbered. This paper focuses on building an improved nearest neighbor classifier for two-class imbalanced data. Three oversampling techniques are presented that generate artificial instances for the minority class to balance the distribution among the classes. Experimental results showed that the proposed methods outperformed the conventional classifier.
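The overall recipe, oversampling the minority class and then fitting a nearest neighbor classifier on the balanced data, can be illustrated with the simplest possible oversampler, random duplication; the paper's three generation techniques are not reproduced here.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def random_oversample(X, y, seed=0):
    """Duplicate random minority examples until both classes are equal-sized
    (purely illustrative; the paper generates artificial instances instead)."""
    rng = np.random.default_rng(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    extra = rng.choice(np.where(y == minority)[0], size=deficit, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_bal, y_bal = random_oversample(X, y)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_bal, y_bal)
```

Balancing matters particularly for nearest neighbor methods because their vote is purely local: when minority points are scarce, even a correct minority neighborhood is easily outvoted by surrounding majority points.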

