hybrid sampling
Recently Published Documents

TOTAL DOCUMENTS: 100 (FIVE YEARS: 33)
H-INDEX: 10 (FIVE YEARS: 2)

2022 ◽  
Vol 16 (3) ◽  
pp. 1-37
Author(s):  
Robert A. Sowah ◽  
Bernard Kuditchar ◽  
Godfrey A. Mills ◽  
Amevi Acakpovi ◽  
Raphael A. Twum ◽  
...  

The class imbalance problem is prevalent in many real-world domains and has become an active area of research. In binary classification problems, imbalanced learning refers to learning from a dataset that is heavily skewed toward the negative class. This skew causes classification algorithms to perform poorly when predicting the positive class on new examples. Data resampling, which manipulates the training data before applying standard classification techniques, is among the most commonly used ways to deal with the class imbalance problem. This article presents a new hybrid sampling technique that significantly improves the overall performance of classification algorithms on imbalanced data. The proposed method, called the Hybrid Cluster-Based Undersampling Technique (HCBST), combines a cluster-based undersampling technique, which under-samples the majority instances, with an oversampling technique derived from Sigma Nearest Oversampling based on Convex Combination, which oversamples the minority instances, to address the class imbalance problem with a high degree of accuracy and reliability. The performance of the proposed algorithm was tested on 11 datasets with varying degrees of imbalance from the National Aeronautics and Space Administration Metric Data Program data repository and the University of California Irvine Machine Learning data repository. Results were compared across classification algorithms including k-nearest neighbours, support vector machines, decision tree, random forest, neural network, AdaBoost, naïve Bayes, and quadratic discriminant analysis. Test results revealed that, on the same datasets, HCBST performed better, with average scores of 0.73, 0.67, and 0.35 for area under the curve, geometric mean, and Matthews correlation coefficient, respectively, across all classifiers used in this study. HCBST therefore has the potential to improve performance on the class imbalance problem and, by extension, the many applications that rely on solving it.
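To make the general hybrid resampling idea concrete, the sketch below combines KMeans cluster-centre undersampling of the majority class with convex-combination oversampling of the minority class on NumPy arrays X and y. It is a minimal sketch, not the authors' HCBST implementation: the function name hybrid_resample, the "keep the sample nearest each cluster centre" rule, and the simple interpolation used in place of the SNOCC-derived step are assumptions made for illustration only.

```python
# Minimal sketch of hybrid cluster-based resampling (not the authors' HCBST code).
# Assumptions: KMeans cluster-centre undersampling stands in for the paper's cluster
# undersampling step; convex-combination interpolation stands in for the SNOCC-derived step.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def hybrid_resample(X, y, majority_label=0, minority_label=1, n_clusters=50, random_state=0):
    rng = np.random.default_rng(random_state)
    X_maj, X_min = X[y == majority_label], X[y == minority_label]

    # Undersample the majority class: keep only the sample closest to each cluster centre.
    km = KMeans(n_clusters=min(n_clusters, len(X_maj)), n_init=10,
                random_state=random_state).fit(X_maj)
    keep = [int(np.argmin(np.linalg.norm(X_maj - c, axis=1))) for c in km.cluster_centers_]
    X_maj_down = X_maj[keep]

    # Oversample the minority class: convex combinations of a sample and one of its neighbours.
    n_new = max(len(X_maj_down) - len(X_min), 0)
    nn = NearestNeighbors(n_neighbors=min(6, len(X_min))).fit(X_min)
    synthetic = []
    for i in rng.integers(0, len(X_min), size=n_new):
        neigh = nn.kneighbors(X_min[i:i + 1], return_distance=False)[0]
        j = rng.choice(neigh[1:]) if len(neigh) > 1 else neigh[0]
        lam = rng.random()
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    X_min_up = np.vstack([X_min] + synthetic) if synthetic else X_min

    X_bal = np.vstack([X_maj_down, X_min_up])
    y_bal = np.concatenate([np.full(len(X_maj_down), majority_label),
                            np.full(len(X_min_up), minority_label)])
    return X_bal, y_bal
```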


Author(s):  
Miftah Fauzi ◽  
Muhammad Aria Rajasa Pohan

Informed Rapidly-exploring Random Tree* (Informed-RRT*) is an extension of the Rapidly-exploring Random Tree (RRT) algorithm that can produce asymptotically optimal path solutions, but at the cost of longer computation time. Originally, the Informed-RRT* algorithm used random sampling, drawing samples uniformly at random from the search space; this random sampling is what makes the computation time suboptimal. This research aims to design an Informed-RRT* algorithm that uses a hybrid sampling method, i.e. an integration of several sampling methods. In the experiments, the performance of the random sampling method is compared with that of the hybrid sampling method in terms of computation time. The hybrid sampling method was tested on the Informed-RRT* algorithm in simulation, in narrow, clutter, and trap environments. The results show that using hybrid sampling in Informed-RRT* yields an average computation time 26.4 seconds faster than random sampling in the clutter environment, 24.52 seconds faster in the narrow environment, and 5.25 seconds faster in the trap environment. Based on these test results, hybrid sampling can serve as an alternative sampling method for the Informed-RRT* algorithm.
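The abstract does not specify which sampling strategies are integrated, so the sketch below is only one plausible reading: a 2D hybrid sampler for an RRT-style planner that mixes goal-biased sampling, the informed-ellipse sampling used by Informed-RRT* once a first solution of cost c_best exists, and plain uniform sampling. The function name hybrid_sample and the probabilities p_goal and p_informed are illustrative assumptions.

```python
# Hedged sketch of a hybrid sampler for an RRT-style planner. The mix of goal-biased,
# informed-ellipse, and uniform sampling is an assumption made purely for illustration.
import math
import random


def hybrid_sample(bounds, start, goal, c_best=None, p_goal=0.1, p_informed=0.4):
    """Return one 2D sample: the goal with probability p_goal, an informed-ellipse sample
    with probability p_informed once a solution of cost c_best exists, otherwise uniform."""
    r = random.random()
    if r < p_goal:
        return goal                                          # goal-biased sample
    if c_best is not None and r < p_goal + p_informed:
        # Sample inside the ellipse with foci start/goal and major axis c_best (Informed-RRT*).
        c_min = math.dist(start, goal)
        cx, cy = (start[0] + goal[0]) / 2, (start[1] + goal[1]) / 2
        theta = math.atan2(goal[1] - start[1], goal[0] - start[0])
        a = c_best / 2
        b = math.sqrt(max(c_best ** 2 - c_min ** 2, 1e-12)) / 2
        t = 2 * math.pi * random.random()
        u = math.sqrt(random.random())                       # uniform over the ellipse area
        ex, ey = a * u * math.cos(t), b * u * math.sin(t)
        return (cx + ex * math.cos(theta) - ey * math.sin(theta),
                cy + ex * math.sin(theta) + ey * math.cos(theta))
    (xmin, xmax), (ymin, ymax) = bounds
    return (random.uniform(xmin, xmax), random.uniform(ymin, ymax))
```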


2021 ◽  
Author(s):  
Xuanrui Xiong ◽  
Yang Huang ◽  
Yuan Zhang ◽  
Fan Zhang ◽  
Yumei Jia ◽  
...  

Author(s):  
Khyati Ahlawat ◽  
Anuradha Chug ◽  
Amit Prakash Singh

The uneven distribution of classes in a dataset biases any standard classifier toward the majority class. Instances of the important minority class, being few in number, are generally ignored, and their correct classification, which is of paramount interest, is often overlooked when overall accuracy is calculated. Conventional machine learning approaches therefore have to be rigorously refined to address this class imbalance problem. The challenge of imbalanced classes is even more pronounced in big data scenarios because of their high volume. This study presents a sampling solution based on cluster computing for handling class imbalance problems in big data. The newly proposed hybrid sampling algorithm (HSA) is assessed using three popular classification algorithms, namely support vector machine, decision tree, and k-nearest neighbour, on the basis of balanced accuracy and elapsed time. The experimental results are promising, with an efficiency gain of 42% compared to the traditional sampling solution, the synthetic minority oversampling technique (SMOTE). This work demonstrates the effectiveness of the distribution and clustering principle in imbalanced big data scenarios.
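The abstract does not describe HSA itself, only how it was evaluated, so the sketch below reproduces just that evaluation protocol: resample the training split, then compare support vector machine, decision tree, and k-nearest neighbour on balanced accuracy and elapsed time. The naive random-duplication resampler is a placeholder for HSA, and the function names are illustrative assumptions.

```python
# Sketch of the evaluation protocol described in the abstract (not the authors' HSA code).
# The resampler is a naive random-duplication placeholder standing in for HSA.
import time
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier


def naive_oversample(X, y, random_state=0):
    """Duplicate samples of each class at random until all classes match the majority count."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), size=counts.max(), replace=True)
                          for c in classes])
    return X[idx], y[idx]


def evaluate(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    X_bal, y_bal = naive_oversample(X_tr, y_tr)
    for name, clf in [("SVM", SVC()), ("decision tree", DecisionTreeClassifier()),
                      ("k-NN", KNeighborsClassifier())]:
        start = time.perf_counter()
        clf.fit(X_bal, y_bal)
        score = balanced_accuracy_score(y_te, clf.predict(X_te))
        print(f"{name}: balanced accuracy = {score:.3f}, elapsed = {time.perf_counter() - start:.2f} s")
```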


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Liping Chen ◽  
Jiabao Jiang ◽  
Yong Zhang

Classical classifiers are ineffective at classifying imbalanced big datasets. Resampling the dataset to balance the class distribution before training the classifier is one of the most popular ways to address this problem. This paper proposes an effective and simple hybrid sampling method based on data partition (HSDP). First, all data samples are partitioned into different data regions. Then, the samples in the noisy minority region are removed, and the samples in the boundary minority region are selected as oversampling seeds for generating synthetic samples. Finally, a weighted oversampling process is conducted, with synthetic samples generated within the same cluster as the oversampling seed. The weight of each selected minority-class sample is computed as the ratio between the proportion of majority-class samples among that sample's neighbours and the sum of these proportions over all selected samples. Generating synthetic samples within the seed's own cluster guarantees that the new samples lie inside the minority-class area. Experiments on eight datasets show that the proposed method, HSDP, is better than or comparable with typical sampling methods in terms of F-measure and G-mean.
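The weighting and in-cluster generation steps described above can be sketched as follows. This is a hedged illustration, not the authors' HSDP implementation: it assumes k-nearest-neighbour majority proportions over the whole dataset define the weights, that KMeans clusters of the minority class stand in for the paper's data partition, and that seed_idx already holds the indices (into X) of the boundary minority samples selected as oversampling seeds.

```python
# Hedged sketch of HSDP-style weighted, in-cluster oversampling (not the authors' code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def hsdp_style_oversample(X, y, seed_idx, minority_label=1, n_total=100, k=5, random_state=0):
    rng = np.random.default_rng(random_state)

    # Weight of each seed = proportion of majority-class points among its k neighbours,
    # normalised by the sum of these proportions over all seeds.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neigh = nn.kneighbors(X[seed_idx], return_distance=False)[:, 1:]   # drop the seed itself
    maj_prop = (y[neigh] != minority_label).mean(axis=1)
    weights = (maj_prop / maj_prop.sum() if maj_prop.sum() > 0
               else np.full(len(seed_idx), 1 / len(seed_idx)))

    # Cluster the minority class so synthetic points stay inside the minority-class area.
    min_idx = np.flatnonzero(y == minority_label)
    X_min = X[min_idx]
    clusters = KMeans(n_clusters=min(3, len(X_min)), n_init=10,
                      random_state=random_state).fit_predict(X_min)
    pos_in_min = {g: p for p, g in enumerate(min_idx)}                 # global -> minority index

    # Each seed generates a share of n_total synthetic samples proportional to its weight,
    # interpolating only toward minority points from the seed's own cluster.
    synthetic = []
    for s, n_new in zip(seed_idx, np.round(weights * n_total).astype(int)):
        same_cluster = np.flatnonzero(clusters == clusters[pos_in_min[s]])
        for _ in range(n_new):
            j = rng.choice(same_cluster)
            lam = rng.random()
            synthetic.append(X[s] + lam * (X_min[j] - X[s]))
    return np.array(synthetic)
```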


2021 ◽  
Author(s):  
Yanran Ding ◽  
Mengchao Zhang ◽  
Chuanzheng Li ◽  
Hae-Won Park ◽  
Kris Hauser
