Instance Selection Methods and Resampling Techniques for Dissimilarity Representation with Imbalanced Data Sets

Author(s):  
M. Millán-Giraldo ◽  
V. García ◽  
J. S. Sánchez
2013 ◽  
Vol 43 (1) ◽  
pp. 332-346 ◽  
Author(s):  
Nicolas Garcia-Pedrajas ◽  
Javier Pérez-Rodríguez ◽  
Aida de Haro-García

2019 ◽  
Vol 28 (01) ◽  
pp. 1950001 ◽  
Author(s):  
Zeinab Abbasi ◽  
Mohsen Rahmani

As data volumes grow, many methods have been proposed to extract useful data and remove noisy data. Instance selection is one such method: it keeps some instances of a data set and discards the rest. This paper proposes a new instance selection algorithm based on ReliefF, a feature selection algorithm. For each instance, the proposed algorithm finds the nearest instances of each class using the Jaccard index. The weight of each instance is then calculated from its set of nearest neighbors. Finally, only the instances with the highest weights are selected. The algorithm can reduce data at a specified rate and can process instances in parallel. It works on a variety of data sets containing nominal and numeric data with missing values, and is also suitable for imbalanced data sets. The proposed algorithm is tested on three data sets. The results show that it can reduce the volume of data without a significant change in classification accuracy on these data sets.
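The steps named in this abstract (Jaccard-based neighbors per class, a weight per instance, selection at a fixed rate) can be illustrated with a minimal sketch. The abstract does not give the weight formula, so the one used here is an assumption borrowed from the general ReliefF idea: similarity to the nearest same-class instances raises the weight, and similarity to the nearest instances of other classes lowers it. The set-of-tokens representation and the names `jaccard`, `instance_weights`, `select_instances`, `rate`, and `k` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Sequence, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard index |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def instance_weights(
    X: Sequence[Set[Hashable]], y: Sequence[Hashable], k: int = 1
) -> List[float]:
    """Assumed ReliefF-style weight (not the paper's formula): mean similarity
    to the k nearest same-class instances, minus the mean similarity to the
    k nearest instances of every other class."""
    weights = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Similarities from instance i to all other instances, grouped by class.
        by_class: Dict[Hashable, List[float]] = {}
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j != i:
                by_class.setdefault(yj, []).append(jaccard(xi, xj))
        w = 0.0
        for label, sims in by_class.items():
            nearest = sorted(sims, reverse=True)[:k]  # k most similar
            mean_sim = sum(nearest) / len(nearest)
            w += mean_sim if label == yi else -mean_sim
        weights.append(w)
    return weights


def select_instances(
    X: Sequence[Set[Hashable]], y: Sequence[Hashable], rate: float = 0.5, k: int = 1
) -> List[int]:
    """Return the sorted indices of the highest-weight instances,
    keeping roughly `rate` of the data (at least one instance)."""
    weights = instance_weights(X, y, k)
    n_keep = max(1, round(len(X) * rate))
    ranked = sorted(range(len(X)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy data: each instance is a set of attribute tokens; a missing value is
# simply an absent token. Instance 2 sits between the two classes.
X = [{"a", "b"}, {"a", "b", "c"}, {"a", "x"}, {"x", "y"}, {"x", "y", "z"}]
y = [0, 0, 0, 1, 1]
kept = select_instances(X, y, rate=0.8)
```

Because each weight depends only on the instance itself and a read-only view of the data, the outer loop in `instance_weights` could be distributed across workers, which is one way to read the abstract's remark about running in parallel on the instances.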


2013 ◽  
Vol 756-759 ◽  
pp. 3652-3658
Author(s):  
You Li Lu ◽  
Jun Luo

Within the study of kernel methods, this paper puts forward two improved algorithms, called R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster the samples in the space, while I-SVDD improves the performance of the original SVDD through imbalanced sample training. Experiments on two system call data sets show that these two algorithms are more effective and that R-SVM has lower complexity.

