Scaling Up Feature Selection: A Distributed Filter Approach

Author(s):  
Verónica Bolón-Canedo ◽  
Noelia Sánchez-Maroño ◽  
Joana Cerviño-Rabuñal
Author(s):  
Ch. Sanjeev Kumar Dash ◽  
Ajit Kumar Behera ◽  
Sarat Chandra Nayak

This chapter presents a novel approach for classifying datasets by suitably tuning the parameters of radial basis function networks, with the additional cost of feature selection. Feeding an optimal, relevant set of features to a radial basis function network can greatly enhance its accuracy while reducing its size. The authors use information gain (a filter approach) to reduce the feature set and differential evolution to tune the centers and spreads of the radial basis functions. Emphasis is placed on different feature selection methods, on handling missing values, and on removing inconsistencies, all of which improve the classification accuracy of the proposed model. The approach is validated on several benchmark datasets, both highly skewed and balanced, retrieved from the University of California, Irvine (UCI) repository. The experimental results are encouraging enough to pursue further extensive research on highly skewed data.
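The information-gain filter this chapter uses for feature reduction can be sketched in a few lines of plain Python. The toy feature/label data below is illustrative, not taken from the chapter:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Toy data: feature A perfectly predicts the class, feature B is noise.
A = [0, 0, 1, 1]
B = [0, 1, 0, 1]
y = ['no', 'no', 'yes', 'yes']

print(information_gain(A, y))  # 1.0 -> keep
print(information_gain(B, y))  # 0.0 -> discard
```

A filter ranks every feature this way and keeps only the highest-scoring ones before the network (here, the RBF network) ever sees the data.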


Author(s):  
Barak Chizi ◽  
Lior Rokach ◽  
Oded Maimon

Dimensionality (i.e., the number of attributes, or groups of attributes, in a data set) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000), chiefly because these algorithms are computationally intensive. This obstacle is sometimes known as the "curse of dimensionality" (Bellman, 1961). The objective of feature selection is to identify the important features in a data set and discard the rest as irrelevant or redundant. Because feature selection reduces the dimensionality of the data, it allows data mining algorithms to operate faster and more effectively. In some cases classification performance also improves, mainly because of the more compact, easily interpreted representation of the target concept. There are three main approaches to feature selection: filter, wrapper, and embedded. The filter approach (Kohavi, 1995; Kohavi and John, 1996) operates independently of the data mining method employed subsequently: undesirable features are filtered out of the data before learning begins. Filter algorithms use heuristics based on general characteristics of the data to evaluate the merit of feature subsets. A sub-category of filter methods, referred to here as rankers, employs some criterion to score each feature and produce a ranking; from this ordering, several feature subsets can be chosen by manually setting a cut-off point. The wrapper approach (Kohavi, 1995; Kohavi and John, 1996) uses an inducer as a black box, together with a statistical re-sampling technique such as cross-validation, to select the best feature subset according to some predictive measure. The embedded approach (see, for instance, Guyon and Elisseeff, 2003) resembles the wrapper approach in that the features are selected for a specific inducer, but the selection takes place during the learning process itself.
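A ranker of the kind described above reduces to two steps: score each feature, then cut the ranking at a manually chosen point. A minimal sketch, with hypothetical merit scores (e.g. from information gain) and a hypothetical cut-off `k`:

```python
def rank_features(scores):
    """Ranker-style filter: order feature indices by a per-feature score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def select_top_k(scores, k):
    """Choose a subset by cutting the ranking at a manually set threshold k."""
    return rank_features(scores)[:k]

# Hypothetical per-feature merit scores for four features.
scores = [0.02, 0.91, 0.40, 0.77]
print(rank_features(scores))    # [1, 3, 2, 0]
print(select_top_k(scores, 2))  # [1, 3]
```

The scoring criterion is what distinguishes one ranker from another; the selection mechanism itself is just this threshold on the ordering.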


Author(s):  
Mekour Norreddine

Feature selection is one of the key problems in analyzing gene expression data: the process of choosing which features are important for prediction. There are two general approaches to feature selection: the filter approach and the wrapper approach. In this chapter, the authors combine a filter approach, using information-gain ranking, with a wrapper approach that uses a genetic algorithm as its search method. They evaluate the approach on two gene expression data sets, Leukemia and Central Nervous System, using the decision tree classifier C4.5 to measure the improvement in classification performance.
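The wrapper step described above searches the space of feature subsets with a genetic algorithm, scoring each candidate subset by classifier accuracy (C4.5 in the chapter). The sketch below is a toy version under stated assumptions: the `fitness` function is a stand-in for real cross-validated classifier accuracy, and the "true" relevant-feature mask is invented for illustration:

```python
import random

def ga_feature_search(n_features, fitness, pop_size=8, generations=20, seed=0):
    """Toy genetic algorithm over feature-subset bitmasks (the wrapper's search).
    `fitness` stands in for classifier accuracy on the candidate subset."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)       # two distinct parents
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]             # one-point crossover
            if rng.random() < 0.1:                # occasional bit-flip mutation
                j = rng.randrange(n_features)
                child[j] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Stand-in fitness: reward masks close to a hypothetical "true" relevant set.
target = [1, 0, 1, 1, 0, 0]
fitness = lambda mask: sum(m == t for m, t in zip(mask, target))
print(ga_feature_search(6, fitness))
```

In the hybrid design, the information-gain filter first prunes the thousands of genes down to a tractable shortlist, and only then does this (much more expensive) wrapper search run over the shortlist.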


2017 ◽  
Vol 72 ◽  
pp. 314-326 ◽  
Author(s):  
Saúl Solorio-Fernández ◽  
José Fco. Martínez-Trinidad ◽  
J. Ariel Carrasco-Ochoa

RSC Advances ◽  
2016 ◽  
Vol 6 (102) ◽  
pp. 99676-99684 ◽  
Author(s):  
Davor Antanasijević ◽  
Jelena Antanasijević ◽  
Viktor Pocajt ◽  
Gordana Ušćumlić

A QSPR study on the transition temperatures of five-ring bent-core liquid crystals (LCs) was performed using GMDH-type neural networks. A novel multi-filter approach, combining chi-square ranking, v-WSH, and the GMDH algorithm, was used to select the descriptors.

