On the Suitability of Combining Feature Selection and Resampling to Manage Data Complexity

Author(s):  
Raúl Martín-Félez ◽  
Ramón A. Mollineda

Feature selection is important because data are generated continuously and at an ever-growing rate; it reduces the excessive dimensionality of many problems. As a pre-processing step for machine learning, feature selection is effective in reducing redundancy, removing irrelevant data, increasing learning accuracy, and improving the comprehensibility of results. This work offers a comprehensive approach to feature selection within the scope of classification problems, explaining its foundations and real-world application issues in the setting of high-dimensional data. First, we focus on the concept of feature selection and review its history and basic principles. We propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic normality of the estimator from a general subsampling algorithm, and then derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resulting estimator. An alternative minimization criterion is also proposed to further reduce the computational cost. The optimal subsampling probabilities depend on the full-data estimate, so we develop a two-step algorithm to approximate the optimal subsampling procedure. This algorithm is computationally efficient and achieves a substantial reduction in computing time compared with the full-data approach. Consistency and asymptotic normality of the estimator from the two-step algorithm are also established. Synthetic and real datasets are used to evaluate the practical performance of the proposed method.
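The two-step subsampling scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: the function names are invented here, and the second-step sampling probabilities are assumed to follow the commonly cited score |y_i − p_i|·‖x_i‖, with inverse-probability weights correcting the bias of the weighted refit.

```python
# Minimal sketch of two-step subsampling for logistic regression.
# Assumptions (not from the source): sampling scores |y - p| * ||x||,
# Newton's method for the weighted fits, toy synthetic data.
import numpy as np

def fit_logistic(X, y, w=None, iters=25):
    """Weighted logistic regression via Newton's method."""
    n, d = X.shape
    w = np.ones(n) if w is None else w
    beta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                     # weighted score
        H = (X * (w * p * (1 - p))[:, None]).T @ X     # weighted Hessian
        beta += np.linalg.solve(H + 1e-8 * np.eye(d), grad)
    return beta

def two_step_subsample_fit(X, y, r0=200, r=800, rng=None):
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Step 1: uniform pilot subsample -> rough estimate.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = fit_logistic(X[idx0], y[idx0])
    # Step 2: probabilities proportional to |y - p| * ||x||.
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Inverse-probability weights make the refit asymptotically unbiased.
    return fit_logistic(X[idx], y[idx], w=1.0 / (n * probs[idx]))

# Toy demonstration: recover coefficients from 100k rows using ~1k draws.
rng = np.random.default_rng(0)
n, d = 100_000, 5
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -0.5, 0.5, 0.0, 0.25])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
beta_hat = two_step_subsample_fit(X, y, rng=1)
print(np.round(beta_hat, 2))
```

The pilot estimate only needs to be rough, which is why a small uniform subsample suffices in step one; all the statistical efficiency comes from the informed probabilities in step two.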


2017 ◽  
Vol 117 ◽  
pp. 27-45 ◽  
Author(s):  
L. Morán-Fernández ◽  
V. Bolón-Canedo ◽  
A. Alonso-Betanzos

2019 ◽  
Author(s):  
Ngan Thi Dong ◽  
Megha Khosla

Abstract: The identification of biomarkers or predictive features that are indicative of a specific biological or disease state is a major research topic in biomedical applications. Several feature selection (FS) methods, ranging from simple univariate methods to recent deep-learning methods, have been proposed to select a minimal set of the most predictive features. However, the question of which method to use when remains open. In this paper, we study the performance of feature selection methods with respect to the underlying datasets' statistics and their data complexity measures. We perform a comparative study of 11 feature selection methods over 27 publicly available datasets, evaluated over a range of numbers of selected features using classification as the downstream task. We take a first step towards understanding FS method performance from the viewpoint of data complexity. Specifically, we show empirically that, with regard to classification, the performance of all studied feature selection methods is highly correlated with the error rate of a nearest-neighbor-based classifier. We also argue that the studied complexity measures are unsuitable for determining the optimal number of relevant features. While examining several other aspects, we also provide recommendations for choosing a particular FS method for a given dataset.
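The nearest-neighbor error rate that the abstract correlates with FS performance is typically measured as the leave-one-out error of a 1-NN classifier (often called N3 in the data-complexity literature). A minimal sketch, with illustrative toy data not taken from the paper:

```python
# Leave-one-out 1-NN error rate: a simple data-complexity measure.
# The datasets below are synthetic and purely illustrative.
import numpy as np

def one_nn_loo_error(X, y):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier."""
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise sq. distances
    np.fill_diagonal(D, np.inf)      # exclude each point from its own query
    nearest = D.argmin(axis=1)       # index of each point's nearest neighbor
    return float(np.mean(y[nearest] != y))

# Two toy datasets: well-separated vs heavily overlapping classes.
rng = np.random.default_rng(0)
X_easy = np.concatenate([rng.normal(-3.0, 1, (200, 2)), rng.normal(3.0, 1, (200, 2))])
X_hard = np.concatenate([rng.normal(-0.3, 1, (200, 2)), rng.normal(0.3, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

print(one_nn_loo_error(X_easy, y))   # near 0: low complexity
print(one_nn_loo_error(X_hard, y))   # much higher: classes overlap
```

A low value indicates locally separable classes, where most FS methods perform similarly; a high value signals class overlap, where method choice matters more.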


Author(s):  
Lindsey M. Kitchell ◽  
Francisco J. Parada ◽  
Brandi L. Emerick ◽  
Tom A. Busey
