On the Suitability of Combining Feature Selection and Resampling to Manage Data Complexity

Author(s):  
Raúl Martín-Félez ◽  
Ramón A. Mollineda

Feature selection is important because data are generated continuously and at an ever-growing rate; it reduces the excessive dimensionality of many problems. As a pre-processing step for machine learning, feature selection is effective in reducing redundancy, removing irrelevant data, increasing learning accuracy, and improving the comprehensibility of results. This work offers a comprehensive approach to feature selection within the scope of classification problems, explaining its foundations and real-world application issues in the setting of high-dimensional data. First, we focus on the concept of feature selection and review its history and basic principles. We propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic normality of the estimator from a general subsampling algorithm, and then derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resulting estimator. An alternative minimization criterion is also proposed to further reduce the computational cost. The optimal subsampling probabilities depend on the full-data estimate, so we develop a two-step algorithm to approximate the optimal subsampling procedure. This algorithm is computationally efficient and achieves a substantial reduction in computing time compared with the full-data approach. Consistency and asymptotic normality of the estimator from the two-step algorithm are also established. Synthetic and real datasets are used to evaluate the practical performance of the proposed method.
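The two-step subsampling scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: the function names are invented here, and the second-step sampling probabilities are assumed to follow the commonly cited score |y_i − p_i|·‖x_i‖, with inverse-probability weights correcting the bias of the weighted refit.

```python
# Minimal sketch of two-step subsampling for logistic regression.
# Assumptions (not from the source): sampling scores |y - p| * ||x||,
# Newton's method for the weighted fits, toy synthetic data.
import numpy as np

def fit_logistic(X, y, w=None, iters=25):
    """Weighted logistic regression via Newton's method."""
    n, d = X.shape
    w = np.ones(n) if w is None else w
    beta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                     # weighted score
        H = (X * (w * p * (1 - p))[:, None]).T @ X     # weighted Hessian
        beta += np.linalg.solve(H + 1e-8 * np.eye(d), grad)
    return beta

def two_step_subsample_fit(X, y, r0=200, r=800, rng=None):
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Step 1: uniform pilot subsample -> rough estimate.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = fit_logistic(X[idx0], y[idx0])
    # Step 2: probabilities proportional to |y - p| * ||x||.
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Inverse-probability weights make the refit asymptotically unbiased.
    return fit_logistic(X[idx], y[idx], w=1.0 / (n * probs[idx]))

# Toy demonstration: recover coefficients from 100k rows using ~1k draws.
rng = np.random.default_rng(0)
n, d = 100_000, 5
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -0.5, 0.5, 0.0, 0.25])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
beta_hat = two_step_subsample_fit(X, y, rng=1)
print(np.round(beta_hat, 2))
```

The pilot estimate only needs to be rough, which is why a small uniform subsample suffices in step one; all the statistical efficiency comes from the informed probabilities in step two.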


2017 ◽  
Vol 117 ◽  
pp. 27-45 ◽  
Author(s):  
L. Morán-Fernández ◽  
V. Bolón-Canedo ◽  
A. Alonso-Betanzos

2019 ◽  
Author(s):  
Ngan Thi Dong ◽  
Megha Khosla

Abstract: The identification of biomarkers or predictive features that are indicative of a specific biological or disease state is a major research topic in biomedical applications. Several feature selection (FS) methods, ranging from simple univariate methods to recent deep-learning methods, have been proposed to select a minimal set of the most predictive features. However, the question of which method to use when remains open. In this paper, we study the performance of feature selection methods with respect to the underlying datasets' statistics and their data complexity measures. We perform a comparative study of 11 feature selection methods over 27 publicly available datasets, evaluated over a range of numbers of selected features using classification as the downstream task. We take a first step towards understanding FS method performance from the viewpoint of data complexity. Specifically, we show empirically that, with regard to classification, the performance of all studied feature selection methods is highly correlated with the error rate of a nearest-neighbor-based classifier. We also argue that the studied complexity measures are unsuitable for determining the optimal number of relevant features. While examining several other aspects, we also provide recommendations for choosing a particular FS method for a given dataset.
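The nearest-neighbor error rate that the abstract correlates with FS performance is typically measured as the leave-one-out error of a 1-NN classifier (often called N3 in the data-complexity literature). A minimal sketch, with illustrative toy data not taken from the paper:

```python
# Leave-one-out 1-NN error rate: a simple data-complexity measure.
# The datasets below are synthetic and purely illustrative.
import numpy as np

def one_nn_loo_error(X, y):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier."""
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise sq. distances
    np.fill_diagonal(D, np.inf)      # exclude each point from its own query
    nearest = D.argmin(axis=1)       # index of each point's nearest neighbor
    return float(np.mean(y[nearest] != y))

# Two toy datasets: well-separated vs heavily overlapping classes.
rng = np.random.default_rng(0)
X_easy = np.concatenate([rng.normal(-3.0, 1, (200, 2)), rng.normal(3.0, 1, (200, 2))])
X_hard = np.concatenate([rng.normal(-0.3, 1, (200, 2)), rng.normal(0.3, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

print(one_nn_loo_error(X_easy, y))   # near 0: low complexity
print(one_nn_loo_error(X_hard, y))   # much higher: classes overlap
```

A low value indicates locally separable classes, where most FS methods perform similarly; a high value signals class overlap, where method choice matters more.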


Author(s):  
Lindsey M. Kitchell ◽  
Francisco J. Parada ◽  
Brandi L. Emerick ◽  
Tom A. Busey
