ImbTree: Minority Class Sensitive Weighted Decision Tree for Classification of Unbalanced Data

For extremely unbalanced data, traditional classification algorithms produce heavily skewed results: most samples are assigned to the majority class, which lowers the accuracy on the minority class. In this paper, we propose a classification algorithm for unbalanced data based on the random subspace method (RSM) and binomial undersampling. With RSM, each classifier is trained on a random subset of the features rather than on all of them, which reduces the dimensionality each classifier sees; this dimension reduction indirectly increases the number of minority class samples relative to the number of features. Exploiting this property of RSM addresses the scarcity of minority class samples in unbalanced classification, and it also identifies the important attributes (variables), which makes the model interpretable. Experiments show that our algorithm achieves high classification accuracy and good interpretability when classifying unbalanced data.
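
As a minimal sketch of how random subspaces and binomial undersampling can be combined, assume binary labels with 1 as the minority class and one-level decision stumps as base classifiers (the abstract does not name the base learner). Each ensemble member keeps every majority sample independently with probability p = n_minority / n_majority, so the number kept follows a binomial distribution, and trains on a random subset of the features. The helper names (`fit_rsm_ensemble`, `fit_stump`, `feature_usage`) and the synthetic data are hypothetical; counting which features the stumps split on is only a rough importance signal, not the authors' interpretation method.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Pick the (feature, threshold, polarity) among `features` with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted(set(row[f] for row in X)):
            for polarity in (0, 1):
                # The stump predicts `polarity` when row[f] <= t, otherwise the other label.
                errors = sum(
                    (polarity if row[f] <= t else 1 - polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    _, f, t, polarity = best
    return f, t, polarity


def predict_stump(stump, row):
    f, t, polarity = stump
    return polarity if row[f] <= t else 1 - polarity


def fit_rsm_ensemble(X, y, n_members=25, n_sub_features=2, seed=0):
    """Train stumps on binomially undersampled data and random feature subsets.

    Labels are assumed to be 0 (majority) and 1 (minority).
    """
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(y) if lab == 1]
    majority = [i for i, lab in enumerate(y) if lab == 0]
    p_keep = len(minority) / len(majority)
    n_features = len(X[0])
    members = []
    for _ in range(n_members):
        # Binomial undersampling: keep each majority sample with probability p_keep,
        # so the number kept is Binomial(len(majority), p_keep).
        idx = minority + [i for i in majority if rng.random() < p_keep]
        # Random subspace: this member only sees a random subset of the features.
        features = rng.sample(range(n_features), n_sub_features)
        members.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return members


def predict(members, row):
    """Majority vote over the ensemble."""
    return Counter(predict_stump(s, row) for s in members).most_common(1)[0][0]


def feature_usage(members):
    """How often each feature was chosen as a split: a rough importance signal."""
    return Counter(f for f, _, _ in members)


# Hypothetical data: features 0 and 1 separate the classes, features 2 and 3 are noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(4)] for _ in range(200)]  # majority, label 0
X += [[2 + data_rng.random(), 2 + data_rng.random(), data_rng.random(), data_rng.random()]
      for _ in range(20)]  # minority, label 1
y = [0] * 200 + [1] * 20

members = fit_rsm_ensemble(X, y)
print(predict(members, [2.5, 2.5, 0.5, 0.5]), predict(members, [0.5, 0.5, 0.5, 0.5]))
print(feature_usage(members))
```

Each member sees roughly as many majority as minority samples and only two of the four features, so most members land on an informative feature and the vote follows them.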

Classification of multiclass imbalanced data using cost-sensitive decision tree C5.0

IAES International Journal of Artificial Intelligence (IJ-AI), 2020, Vol 9 (1), pp. 65. DOI: 10.11591/ijai.v9.i1.pp65-72

Author(s): M. Aldiki Febriantono, Sholeh Hadi Pramono, Rahmadwati Rahmadwati, Golshah Naghdy

Keyword(s): Decision Tree, Cost Model, Imbalanced Data, Minimum Cost, Information Value, Tree Model, Minority Class, Classifier Performance, The Cost

Multiclass imbalanced data is currently an active topic of study in data mining. The problem affects the classification process in machine learning: in some cases, the minority class in a dataset carries more important information than the majority class, and misclassifying minority class samples degrades accuracy and classifier performance. In this research, a cost-sensitive C5.0 decision tree was used to address multiclass imbalanced data. In the first stage, a decision tree model is built with the C5.0 algorithm; cost-sensitive learning with the MetaCost method is then applied to obtain the minimum-cost model. In testing, the C5.0 algorithm performed better than the C4.5 and ID3 algorithms, with performance of 40.91%, 40.24% and 19.23% for C5.0, C4.5 and ID3, respectively.
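
The relabeling step at the heart of MetaCost can be sketched as follows, assuming class probabilities are estimated from the votes of bagged models and that `cost[i][j]` is the cost of predicting class i when the true class is j. Each training example is relabeled with the class of minimum expected cost; MetaCost then retrains the final classifier (C5.0 in this abstract) on the relabeled data, which is not shown. The cost matrix, vote lists and function names are hypothetical, with class 2 playing the minority class.

```python
def vote_probabilities(votes, n_classes):
    """Estimate P(j | x) as the fraction of bagged models that voted for class j."""
    return [votes.count(j) / len(votes) for j in range(n_classes)]


def min_expected_cost_label(probs, cost):
    """Return the class i that minimizes sum_j P(j | x) * cost[i][j]."""
    n = len(cost)
    expected = [sum(probs[j] * cost[i][j] for j in range(n)) for i in range(n)]
    return min(range(n), key=lambda i: expected[i])


def metacost_relabel(votes_per_example, cost):
    """Relabel every training example with its minimum-expected-cost class."""
    n_classes = len(cost)
    return [
        min_expected_cost_label(vote_probabilities(v, n_classes), cost)
        for v in votes_per_example
    ]


# Hypothetical 3-class cost matrix: rows = predicted class, columns = true class.
# Missing the minority class 2 costs 10; every other error costs 1.
cost = [
    [0, 1, 10],
    [1, 0, 10],
    [1, 1, 0],
]

# Hypothetical votes from bagged models for three training examples.
votes_per_example = [
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 2],  # P = [0.6, 0.3, 0.1]
    [0] * 18 + [1, 2],              # P = [0.9, 0.05, 0.05]
    [0] * 4 + [1] * 15 + [2],       # P = [0.2, 0.75, 0.05]
]

print(metacost_relabel(votes_per_example, cost))  # → [2, 0, 1]
```

The first example is relabeled to the minority class even though only 10% of the votes favor it, because the high cost of missing class 2 dominates its expected cost (0.9 versus 1.3 and 1.6).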

Multi-Label Classification with PSO based Synthetic Minority Over-Sampling Technique (Psosmote) for Imbalanced Samples

International Journal of Recent Technology and Engineering - 2, 2019, Vol 8 (4), pp. 4039-4042. DOI: 10.35940/ijrte.d8437.118419

Keyword(s): Data Mining, Sampling Rate, Sampling Technique, Unbalanced Data, Optimal Sampling, Minority Class, Swarm Optimization, F Measure, Predictive Clustering Trees

Learning from unbalanced data has recently emerged as a predominant problem in several applications, and because multi-label classification is an evolving data mining task, learning from unbalanced multi-label data is now being examined. However, existing SMOTE-based algorithms use the same sampling rate for every instance of the minority class, which leads to sub-optimal performance. To deal with this problem, a new Particle Swarm Optimization based SMOTE (PSOSMOTE) algorithm is proposed. PSOSMOTE employs different sampling rates for different minority class instances and combines the optimal sampling rates to handle the classification of unbalanced datasets. A Bayesian technique is then combined with Random Forest for multi-label classification (BARF-MLC) to address the inherent label dependencies among samples, and it is compared with the ML-FOREST classifier, Predictive Clustering Trees (PCT) and the Hierarchy of Multi-Label Classifiers (HOMER) using precision, recall, F-measure, accuracy and error rate.
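
The core SMOTE step with per-instance sampling rates can be sketched as below: each synthetic sample lies on the segment between a minority sample and one of its k nearest minority neighbors. In PSOSMOTE the rates would be chosen by particle swarm optimization; here `rates` is a hypothetical fixed list, the PSO search and the multi-label handling are omitted, and the function names are illustrative. `math.dist` requires Python 3.8 or later.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean distance), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote_per_instance(minority, rates, k=3, seed=0):
    """Generate rates[i] synthetic samples for minority sample i by SMOTE interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        neighbours = k_nearest(minority, i, k)
        for _ in range(rates[i]):
            nn = minority[rng.choice(neighbours)]
            gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Hypothetical minority samples and per-instance rates (PSO would choose the rates).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7]]
rates = [0, 1, 2, 3, 1]
new_samples = smote_per_instance(minority, rates, k=3, seed=1)
print(len(new_samples))  # → 7
```

With a single shared rate, every entry of `rates` would be equal; letting the entries differ is what allows an optimizer to oversample some regions of the minority class more than others.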

Adaptation Proposed Methods for Handling Imbalanced Datasets based on Over-Sampling Technique

Al-Mustansiriyah Journal of Science, 2020, Vol 31 (2), pp. 25. DOI: 10.23851/mjs.v31i2.740

Author(s): Liqaa M. Shoohi, Jamila H. Saud

Keyword(s): Neural Networks, Decision Tree, Back Propagation, Imbalanced Data, Sampling Technique, Poor Performance, Imbalanced Dataset, Minority Class, Data Result

Classification of imbalanced data is an important issue. Many classification algorithms, such as Back Propagation (BP) neural networks, decision trees and Bayesian networks, have been developed and used repeatedly in many fields. These algorithms are affected by imbalanced data, in which some classes have far more instances than others. Imbalanced data lead to poor performance and a bias toward one class at the expense of the others. In this paper, we propose three techniques based on Over-Sampling (O.S.) that redistribute an imbalanced dataset and convert it into a balanced one: Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalanced Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. Each technique generates synthetic samples for the minority class to balance the minority and majority classes and then calculates the IR between the minority and majority classes. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a better balance between the minority and majority classes.
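
The imbalance ratio and the standard ADASYN allocation rule can be sketched as follows, assuming binary labels with 1 as the minority class and Euclidean distance. ADASYN generates G = (n_majority - n_minority) × β synthetic samples in total and gives each minority sample a share proportional to the fraction of majority samples among its k nearest neighbors, so samples near the class boundary receive more. This is not the authors' ADASYN + IR variant, whose details the abstract does not give; the function names and toy data are hypothetical.

```python
import math


def imbalance_ratio(labels, minority_label=1):
    """IR = number of majority samples / number of minority samples (binary case)."""
    n_min = sum(1 for lab in labels if lab == minority_label)
    return (len(labels) - n_min) / n_min


def adasyn_counts(X, y, k=3, beta=1.0, minority_label=1):
    """Map each minority index to the number of synthetic samples ADASYN assigns it."""
    minority_idx = [i for i, lab in enumerate(y) if lab == minority_label]
    n_min = len(minority_idx)
    n_maj = len(y) - n_min
    G = (n_maj - n_min) * beta  # total number of synthetic samples to generate
    ratios = []
    for i in minority_idx:
        # k nearest neighbours of sample i in the whole dataset (both classes)
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        majority_nn = sum(1 for j in others[:k] if y[j] != minority_label)
        ratios.append(majority_nn / k)
    total = sum(ratios)
    if total == 0:
        # Assumption: if no minority sample has majority neighbours, generate nothing.
        return {i: 0 for i in minority_idx}
    return {i: round(r / total * G) for i, r in zip(minority_idx, ratios)}


# Toy 2-D data: 8 majority points on a line, one minority point next to them,
# and two minority points far away.
X = [[float(i), 0.0] for i in range(8)] + [[7.5, 0.0], [20.0, 0.0], [21.0, 0.0]]
y = [0] * 8 + [1] * 3

print(imbalance_ratio(y))   # 8 / 3
print(adasyn_counts(X, y))  # → {8: 3, 9: 1, 10: 1}
```

The minority point surrounded by majority neighbors (index 8) receives three of the five synthetic samples, while the two isolated minority points receive one each.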

Decision tree classification of hyperspectral remote sensing imagery based on independent component analysis

Journal of Computer Applications, 2013, Vol 32 (2), pp. 524-527. DOI: 10.3724/sp.j.1087.2012.00524

Author(s): Zhi-lei LIN, Lu-ming YAN

Keyword(s): Remote Sensing, Independent Component Analysis, Decision Tree, Component Analysis, Hyperspectral Remote Sensing, Independent Component, Remote Sensing Imagery, Decision Tree Classification

Decision Tree C 4.5 Algorithm for Classification of Poor Family Scholarship Recipients

IOP Conference Series: Materials Science and Engineering, 2021, Vol 1125 (1), pp. 012048. DOI: 10.1088/1757-899x/1125/1/012048

Author(s): Y Kustiyahningsih, B K Khotimah, D R Anamisa, M Yusuf, T Rahayu, ...

Keyword(s): Decision Tree, Poor Family, Scholarship Recipients

174 A comparison of machine learning algorithms in the classification of beef steers finished in feedlot

Journal of Animal Science, 2020, Vol 98 (Supplement_4), pp. 126-127. DOI: 10.1093/jas/skaa278.231

Author(s): Lucas S Lopes, Christine F Baes, Dan Tulpan, Luis Artur Loyola Chardulo, Otavio Machado Neto, ...

Keyword(s): Machine Learning, Decision Tree, Learning Algorithms, Machine Learning Algorithms, Final Decision, Relevant Parameter, Good Prediction, Quality Traits, C4.5 Decision Tree

The aim of this project is to compare several state-of-the-art machine learning algorithms for classifying steers finished in feedlots based on performance, carcass and meat quality traits. Precise classification of animals allows fast, real-time decision making in the animal food industry, such as the culling or retention of herd animals. Beef production shows high variability in its numerous carcass and beef quality traits, and machine learning algorithms and software provide an opportunity to evaluate the interactions between traits to better classify animals. Four different treatment levels of wet distiller’s grain were applied to 97 Angus-Nellore animals and used as features for the classification problem. The C4.5 decision tree, Naïve Bayes (NB), Random Forest (RF) and Multilayer Perceptron (MLP) artificial neural network algorithms were used to predict and classify the animals based on recorded trait measurements, which include initial and final weights, shear force and meat color. The top-performing classifier was the C4.5 decision tree, with a classification accuracy of 96.90%, while the RF, MLP and NB classifiers had accuracies of 55.67%, 39.17% and 29.89%, respectively. The final decision tree model constructed with C4.5 selected only the dry matter intake (DMI) feature as a differentiator. When DMI was removed, no other feature or combination of features was strong enough to provide good prediction accuracy for any of the classifiers. In a follow-up study with a significantly larger sample size, we plan to investigate why DMI is a more relevant parameter than the other measurements.
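
To illustrate how a C4.5 tree can end up relying on a single attribute, the sketch below scores numeric attributes roughly the way C4.5 does: for each attribute, pick the binary threshold with the highest information gain, then compare attributes by gain ratio (information gain divided by split information). It is a simplification (no missing-value handling, no extra correction for continuous thresholds), and the attributes `x1` and `x2` with their values are hypothetical, not the study's measurements.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partitions(values, labels, threshold):
    """Split labels into the non-empty groups value <= threshold and value > threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    return [p for p in (left, right) if p]


def info_gain(values, labels, threshold):
    n = len(labels)
    parts = partitions(values, labels, threshold)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)


def gain_ratio(values, labels, threshold):
    n = len(labels)
    parts = partitions(values, labels, threshold)
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    return info_gain(values, labels, threshold) / split_info if split_info > 0 else 0.0


def score_attribute(values, labels):
    """Return (best threshold by information gain, gain ratio at that threshold)."""
    candidates = sorted(set(values))[:-1]  # the largest value would leave one side empty
    best = max(candidates, key=lambda t: info_gain(values, labels, t))
    return best, gain_ratio(values, labels, best)


# Hypothetical data: two numeric attributes for six animals in two classes.
labels = ["A", "A", "A", "B", "B", "B"]
x1 = [8.1, 8.4, 8.9, 9.6, 10.2, 10.5]  # separates the classes cleanly
x2 = [3.0, 5.0, 4.0, 3.5, 4.5, 5.5]    # mixed between the classes

print(score_attribute(x1, labels))  # → (8.9, 1.0)
print(score_attribute(x2, labels))
```

When one attribute splits the classes this cleanly, it wins at the root and leaves nothing for other attributes to separate, which is consistent with a tree that uses a single differentiator.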

Classification of diabetes disease using decision tree algorithm (C4.5)

Journal of Physics: Conference Series, 2021, Vol 1869 (1), pp. 012082. DOI: 10.1088/1742-6596/1869/1/012082

Author(s): B A C Permana, R Ahmad, H Bahtiar, A Sudianto, I Gunawan

Keyword(s): Decision Tree, Decision Tree Algorithm, Tree Algorithm

Improved differentiation classification of variable precision artificial intelligence higher education management

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-10. DOI: 10.3233/jifs-219036

Author(s): Chao Dong, Yan Guo

Keyword(s): Artificial Intelligence, Higher Education, Data Mining, Decision Tree, Classification Accuracy, Attribute Selection, Higher Education Management, Education Management, Decision Tree Classification

The wide application of artificial intelligence technology in various fields has accelerated the exploration of the information hidden in large amounts of data, and there is growing interest in using data mining methods to research higher education management effectively. As a data analysis method in data mining, the decision tree classification algorithm offers high classification accuracy, intuitive decision results and good generalization ability, which make it well suited to higher education management. Because data processing and decision tree classification are sensitive to noisy data, this paper proposes corresponding improvements: a variable precision rough set attribute selection criterion based on a scale function, which considers both the weighted approximation accuracy of an attribute and its number of attribute values. This improves robustness to noisy data, reduces bias in attribute selection and improves classification accuracy. In addition, a suppression factor threshold, support and confidence are introduced into the tree pre-pruning process, which simplifies the tree structure. Comparative experiments on standard datasets show that the improved algorithm outperforms other decision tree algorithms and can effectively realize differentiated classification for higher education management.
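
The variable precision rough set (VPRS) idea behind this attribute selection criterion can be sketched with the standard β-dependency measure, assuming β is the minimum fraction of an equivalence class that must share one decision value for the class to count toward the β-positive region. The paper's scale function and weighted approximation accuracy are not specified in the abstract and are not reproduced here; the attribute names and records are hypothetical.

```python
from collections import Counter, defaultdict


def vprs_dependency(rows, attribute, decision, beta=0.8):
    """beta-dependency of `decision` on one condition `attribute` (variable precision rough set).

    An equivalence class (rows sharing the attribute value) joins the beta-positive
    region when at least a fraction `beta` of its rows share one decision value.
    """
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[attribute]].append(row[decision])
    positive = 0
    for decisions in blocks.values():
        majority_count = Counter(decisions).most_common(1)[0][1]
        if majority_count / len(decisions) >= beta:
            positive += len(decisions)
    return positive / len(rows)


# Hypothetical student records; the fifth record is a noisy outlier in the "high" block.
rows = [
    {"gpa_band": "high", "club": "yes", "outcome": "pass"},
    {"gpa_band": "high", "club": "no", "outcome": "pass"},
    {"gpa_band": "high", "club": "yes", "outcome": "pass"},
    {"gpa_band": "high", "club": "no", "outcome": "pass"},
    {"gpa_band": "high", "club": "yes", "outcome": "fail"},
    {"gpa_band": "low", "club": "no", "outcome": "fail"},
    {"gpa_band": "low", "club": "yes", "outcome": "fail"},
    {"gpa_band": "low", "club": "no", "outcome": "fail"},
]

print(vprs_dependency(rows, "gpa_band", "outcome", beta=0.8))  # → 1.0
print(vprs_dependency(rows, "gpa_band", "outcome", beta=1.0))  # → 0.375 (classical rough set)
print(vprs_dependency(rows, "club", "outcome", beta=0.8))      # → 0.0
```

With β = 1.0 the measure reduces to the classical rough set dependency, and the single noisy record keeps the whole "high" block out of the positive region; with β = 0.8 that block counts again, which is the kind of noise tolerance the variable precision model is meant to provide.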

PcHD: Personalized classification of heartbeat types using a decision tree

Computers in Biology and Medicine, 2014, Vol 54, pp. 79-88. DOI: 10.1016/j.compbiomed.2014.08.013. Cited by ~18

Author(s): Juyoung Park, Kyungtae Kang

Keyword(s): Decision Tree
