Imbalanced Datasets: Latest Research Papers

Synthetic Minority Over-sampling TEchnique (SMOTE) and Logistic Model Tree (LMT)-Adaptive Boosting algorithms for classifying imbalanced datasets of nutrient and chlorophyll sufficiency levels of oil palm (Elaeis guineensis) using spectroradiometers and unmanned aerial vehicles

Computers and Electronics in Agriculture ◽ 10.1016/j.compag.2021.106646 ◽ 2022 ◽ Vol 193 ◽ pp. 106646

Author(s): Amiratul Diyana Amirruddin ◽ Farrah Melissa Muharam ◽ Mohd Hasmadi Ismail ◽ Ngai Paing Tan ◽ Mohd Firdaus Ismail

Keyword(s): Unmanned Aerial Vehicles ◽ Oil Palm ◽ Logistic Model ◽ Elaeis Guineensis ◽ Sampling Technique ◽ Imbalanced Datasets ◽ Adaptive Boosting ◽ Aerial Vehicles ◽ Boosting Algorithms ◽ Logistic Model Tree

RDPVR: Random Data Partitioning with Voting Rule for Machine Learning from Class-Imbalanced Datasets

Electronics ◽ 10.3390/electronics11020228 ◽ 2022 ◽ Vol 11 (2) ◽ pp. 228

Author(s): Ahmad B. Hassanat ◽ Ahmad S. Tarawneh ◽ Samer Subhi Abed ◽ Ghada Awad Altarawneh ◽ Malek Alrashidi ◽ ...

Keyword(s): Machine Learning ◽ Linear Time ◽ Class Imbalance ◽ Data Partitioning ◽ Majority Voting ◽ Random Data ◽ Imbalanced Datasets ◽ Resampling Methods ◽ Voting Rule ◽ Probability Of Overfitting

Since most classifiers are biased toward the dominant class, class imbalance is a challenging problem in machine learning. The most popular approaches to solving this problem include oversampling minority examples and undersampling majority examples. Oversampling may increase the probability of overfitting, whereas undersampling eliminates examples that may be crucial to the learning process. We present a linear-time resampling method based on random data partitioning and a majority voting rule to address both concerns: an imbalanced dataset is partitioned into a number of small subdatasets, each of which must be class balanced. A separate classifier is then trained on each subdataset, and the final classification result is established by applying the majority voting rule to the results of all of the trained models. We compared the performance of the proposed method to some of the most well-known oversampling and undersampling methods, employing a range of classifiers, on 33 benchmark machine learning class-imbalanced datasets. The classification results obtained by the classifiers trained on the data generated by the proposed method were comparable to those of most of the resampling methods tested, with the exception of SMOTEFUNA, which is an oversampling method that increases the probability of overfitting. The proposed method produced results that were comparable to the Easy Ensemble (EE) undersampling method. As a result, for solving the challenge of machine learning from class-imbalanced datasets, we advocate using either EE or our method.
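The partition-then-vote idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the partitioning rule (every subset reuses all minority examples alongside one random, equally sized chunk of the majority class), the nearest-centroid base classifier, and the names `balanced_partitions`, `fit_centroids`, and `vote_predict` are all assumptions made for illustration, and only binary labels are handled.

```python
import random
from collections import Counter

def balanced_partitions(X, y, seed=0):
    """Split a binary dataset into class-balanced subsets (assumed scheme)."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority, majority = sorted(counts, key=counts.get)  # exactly two labels
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label == majority]
    rng.shuffle(maj_idx)
    size = len(min_idx)
    parts = []
    # Leftover majority examples that do not fill a whole chunk are dropped.
    for start in range(0, len(maj_idx) - size + 1, size):
        idx = min_idx + maj_idx[start:start + size]
        parts.append(([X[i] for i in idx], [y[i] for i in idx]))
    return parts

def fit_centroids(X, y):
    """Stand-in base classifier: one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(model, row):
    """Return the label whose centroid is closest to the row."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda label: dist2(model[label]))

def vote_predict(models, row):
    """Majority vote over the predictions of all per-subset models."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy data: 90 majority points around (0, 0), 10 minority points around (4, 4).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

parts = balanced_partitions(X, y)
models = [fit_centroids(Xs, ys) for Xs, ys in parts]
print(len(parts), vote_predict(models, [4.0, 4.0]))
```

With 90 majority and 10 minority examples, this scheme yields nine subsets of 20 examples each, so no majority example is discarded and no synthetic example is created. An odd number of voters also avoids ties in the binary case.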

Deep learning approach for defective spot welds classification using small and class-imbalanced datasets

Neurocomputing ◽ 10.1016/j.neucom.2022.01.004 ◽ 2022

Author(s): Wei Dai ◽ Dayong Li ◽ Ding Tang ◽ Huamiao Wang ◽ Yinghong Peng

Keyword(s): Deep Learning ◽ Learning Approach ◽ Spot Welds ◽ Imbalanced Datasets

Hybrid Chicken Swarm Optimization (CSO) and Fuzzy Logic (FL) Model for Handling Imbalanced Datasets

International Journal of Intelligent Engineering and Systems ◽ 10.22266/ijies2021.1231.02 ◽ 2021 ◽ Vol 14 (6) ◽ pp. 10-19

Keyword(s): Fuzzy Logic ◽ Imbalanced Datasets ◽ Swarm Optimization ◽ Chicken Swarm Optimization

Weighting Methods for Rare Event Identification From Imbalanced Datasets

Frontiers in Big Data ◽ 10.3389/fdata.2021.715320 ◽ 2021 ◽ Vol 4

Author(s): Jia He ◽ Maggie X. Cheng

Keyword(s): Machine Learning ◽ False Positive Rate ◽ Rare Event ◽ Poor Performance ◽ True Positive Rate ◽ Grid Data ◽ Imbalanced Datasets ◽ Weighting Method ◽ Main Class ◽ Positive Rate

In machine learning, we often face the situation where the event we are interested in has very few data points buried in a massive amount of data. This is typical in network monitoring, where data are streamed continuously from sensing or measuring units but most of the data do not correspond to events. With imbalanced datasets, classifiers tend to be biased in favor of the main class. Rare event detection has received much attention in machine learning, and yet it is still a challenging problem. In this paper, we propose a remedy for this long-standing problem. Weighting and sampling are two fundamental approaches to addressing it; we focus on the weighting method. We first propose a boosting-style algorithm to compute class weights, which is proven to have excellent theoretical properties. Then we propose an adaptive algorithm, which is suitable for real-time applications. The adaptive nature of the two algorithms allows a controlled tradeoff between true positive rate and false positive rate and avoids placing excessive weight on the rare class, which would lead to poor performance on the main class. Experiments on power grid data and some public datasets show that the proposed algorithms outperform the existing weighting and boosting methods, and that their superiority is more noticeable with noisy data.
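The tradeoff between true positive rate and false positive rate mentioned in this abstract can be made concrete with a small Python sketch. It does not reproduce the paper's boosting-style or adaptive algorithms, which the abstract does not specify. It only assumes a generic weighted decision rule (predict the rare class when `w_pos * p >= w_neg * (1 - p)`, where `p` is an estimated rare-class probability), and the scores, labels, and function names are hypothetical.

```python
def weighted_decision(probs, w_pos, w_neg=1.0):
    """Predict 1 (rare class) when the weighted rare-class score wins."""
    return [1 if w_pos * p >= w_neg * (1.0 - p) else 0 for p in probs]

def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Hypothetical scores: 2 rare events among 10 observations.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05, 0.05, 0.3, 0.15]

for w_pos in (1.0, 3.0):
    tpr, fpr = rates(y_true, weighted_decision(probs, w_pos))
    print(f"w_pos={w_pos}: TPR={tpr:.2f}, FPR={fpr:.2f}")
# prints:
# w_pos=1.0: TPR=0.50, FPR=0.00
# w_pos=3.0: TPR=1.00, FPR=0.25
```

Raising the rare-class weight from 1 to 3 recovers the missed rare event but also flags two main-class observations, which is the kind of cost that excessive weight on the rare class incurs.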

HAR: Hardness Aware Reweighting for Imbalanced Datasets

10.1109/bigdata52589.2021.9671807 ◽ 2021

Author(s): Rahul Duggal ◽ Scott Freitas ◽ Sunny Dhamnani ◽ Duen Horng Chau ◽ Jimeng Sun

Keyword(s): Imbalanced Datasets

Prediction of Adult Chronic Kidney Disease with Class-Imbalanced Datasets

10.1109/bigdata52589.2021.9671787 ◽ 2021

Author(s): Zhu-Cuiling ◽ Yuan-Jiucun ◽ Yang-Chengwei

Keyword(s): Chronic Kidney Disease ◽ Kidney Disease ◽ Imbalanced Datasets

A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering

Tehnicki vjesnik - Technical Gazette ◽ 10.17559/tv-20210608123522 ◽ 2021 ◽ Vol 28 (6)

Keyword(s): Imbalanced Datasets ◽ Density Peaks ◽ Density Peaks Clustering

KNNOR: An oversampling technique for imbalanced datasets

Applied Soft Computing ◽ 10.1016/j.asoc.2021.108288 ◽ 2021 ◽ pp. 108288

Author(s): Ashhadul Islam ◽ Samir Brahim Belhaouari ◽ Atiq Ur Rahman ◽ Halima Bensmail

Keyword(s): Imbalanced Datasets

ImbTreeEntropy: An R package for building entropy-based classification trees on imbalanced datasets

SoftwareX ◽ 10.1016/j.softx.2021.100841 ◽ 2021 ◽ Vol 16 ◽ pp. 100841

Author(s): Krzysztof Gajowniczek ◽ Tomasz Ząbkowski

Keyword(s): Classification Trees ◽ R Package ◽ Imbalanced Datasets
