A Master-Slave Parallel Genetic Algorithm for Feature Selection in High Dimensional Datasets

Feature selection in high-dimensional datasets is a combinatorial problem: for N-dimensional data there are 2^N possible subsets from which an optimal one must be selected. Genetic Algorithms are generally a good choice for feature selection in large datasets, though for some high-dimensional problems they may take a widely varying amount of time: a few seconds, a few hours, or even a few days. It is therefore important to use Genetic Algorithms that can give quality results within a reasonably acceptable time limit, which makes an efficient implementation necessary. In this paper, a Master-Slave Parallel Genetic Algorithm is implemented as a feature selection procedure to reduce the time complexity of the sequential genetic algorithm. The paper describes the speed gains of the parallel Master-Slave Genetic Algorithm and also presents a theoretical analysis of the optimal number of slaves required for an efficient master-slave implementation. The experiments are performed on three high-dimensional gene expression datasets. Because a Genetic Algorithm is a wrapper technique and is expensive for assessing the importance of individual features, Information Gain is applied first as a pre-processing step to remove irrelevant features.
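The paper's implementation is not reproduced here, but the master-slave pattern it describes is easy to sketch: one master process runs selection, crossover, and mutation, while a pool of slave processes evaluates fitness in parallel, after an information-gain filter has shrunk the search space. In the minimal Python sketch below, the synthetic data (make_classification), the wrapper classifier (LogisticRegression), the use of mutual_info_classif as the information-gain filter, and all sizes and rates are illustrative assumptions, not the authors' settings.

```python
import random
from multiprocessing import Pool

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy stand-in for a high-dimensional gene expression dataset.
X, y = make_classification(n_samples=100, n_features=500, n_informative=20,
                           random_state=0)

# Pre-processing: keep the top-k features by information gain so the
# GA wrapper searches a much smaller space.
TOP_K = 50
gain = mutual_info_classif(X, y, random_state=0)
X = X[:, np.argsort(gain)[-TOP_K:]]

def fitness(mask):
    """Wrapper fitness: cross-validated accuracy of the encoded subset."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    clf = LogisticRegression(max_iter=500)
    return cross_val_score(clf, X[:, idx], y, cv=3).mean()

def evolve(pop, scores, rng):
    """Master-side step: tournament selection, uniform crossover, mutation."""
    def pick():
        a, b = rng.sample(range(len(pop)), 2)
        return pop[a] if scores[a] >= scores[b] else pop[b]
    children = []
    while len(children) < len(pop):
        p1, p2 = pick(), pick()
        child = np.where(np.random.rand(TOP_K) < 0.5, p1, p2)           # crossover
        child = np.logical_xor(child, np.random.rand(TOP_K) < 0.02)     # mutation
        children.append(child)
    return children

if __name__ == "__main__":
    rng = random.Random(0)
    pop = [np.random.rand(TOP_K) < 0.5 for _ in range(40)]
    with Pool(processes=4) as slaves:                # the slave workers
        for _ in range(10):
            scores = slaves.map(fitness, pop)        # the only parallel step
            pop = evolve(pop, scores, rng)
        scores = slaves.map(fitness, pop)            # score the final generation
    best_score, best = max(zip(scores, pop), key=lambda t: t[0])
    print(f"best accuracy {best_score:.3f} using {int(best.sum())} features")
```

Because only fitness evaluation is distributed, adding slaves pays off only while evaluation time dominates communication; Cantú-Paz's classical analysis of master-slave GAs puts the optimal number of slaves near sqrt(n*Tf/Tc), where n is the population size, Tf the time to evaluate one individual, and Tc the time to send one individual to a slave.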

Author(s):  
Iwan Syarif

Classification problems, especially for high-dimensional datasets, have attracted many researchers looking for efficient approaches to address them. The classification problem becomes very complicated when the number of possible combinations of variables is large. In this research, we evaluate the performance of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms applied to high-dimensional datasets. Our experiments show that, in terms of dimensionality reduction, PSO is much better than GA: PSO reduced the attributes of 8 datasets to 13.47% of the original on average, while GA reduced them only to 31.36% on average. In terms of classification performance, GA is slightly better than PSO: GA-reduced datasets outperform their original versions on 5 of the 8 datasets, while PSO-reduced datasets do so on only 3 of the 8.

Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
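The abstract does not say which PSO variant was used; a common choice for feature selection is the binary PSO of Kennedy and Eberhart, in which velocities pass through a sigmoid to give per-bit selection probabilities. The sketch below is a minimal, self-contained version of that idea; the inertia and acceleration constants, the velocity clamp, and the toy fitness at the end are illustrative assumptions.

```python
import numpy as np

def binary_pso(fitness, n_features, n_particles=20, iters=30, seed=0):
    """Minimal binary PSO: each particle is a 0/1 feature mask."""
    rng = np.random.default_rng(seed)
    X = (rng.random((n_particles, n_features)) < 0.5).astype(float)  # positions
    V = rng.uniform(-1.0, 1.0, (n_particles, n_features))            # velocities
    pbest = X.copy()
    pbest_f = np.array([fitness(x) for x in X])
    gbest = pbest[pbest_f.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(V.shape), rng.random(V.shape)
        # Inertia plus attraction to personal and global bests.
        V = 0.7 * V + 1.5 * r1 * (pbest - X) + 1.5 * r2 * (gbest - X)
        V = np.clip(V, -6.0, 6.0)  # velocity clamp (Vmax)
        # Sigmoid of the velocity is the probability that a bit is set.
        X = (rng.random(V.shape) < 1.0 / (1.0 + np.exp(-V))).astype(float)
        f = np.array([fitness(x) for x in X])
        better = f > pbest_f
        pbest[better], pbest_f[better] = X[better], f[better]
        gbest = pbest[pbest_f.argmax()].copy()
    return gbest, pbest_f.max()

# Toy check: the best mask is the one matching a hypothetical 5-feature target.
target = np.zeros(50)
target[:5] = 1.0
mask, score = binary_pso(lambda x: -np.abs(x - target).sum(), n_features=50)
print(int(mask.sum()), "features selected, fitness", score)
```

In a real feature selection run, the fitness would combine a classifier's cross-validated accuracy with a penalty on the number of selected bits, which is what drives the dimensionality reduction the abstract reports.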


2020, Vol. 43(1), pp. 103-125
Author(s):  
Yi Zhong, Jianghua He, Prabhakar Chalise

With the advent of high-throughput technologies, high-dimensional datasets are increasingly available. This has not only opened up new insight into biological systems but also posed analytical challenges. One important problem is the selection of an informative feature subset and the prediction of future outcomes. It is crucial that models are not overfitted and give accurate results with new data. In addition, reliable identification of informative features with high predictive power (feature selection) is of interest in clinical settings. We propose a two-step framework for feature selection and classification model construction, which utilizes a nested and repeated cross-validation method. We evaluated our approach using both simulated data and two publicly available gene expression datasets. The proposed method showed comparatively better predictive accuracy for new cases than the standard cross-validation method.
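The authors' exact framework is not reproduced here, but the key idea, keeping feature selection inside every training fold and wrapping the tuning loop in an outer repeated cross-validation, can be sketched with scikit-learn. The data generator, the SelectKBest filter, the logistic regression classifier, and the grid of k values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     cross_val_score)
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=120, n_features=300, n_informative=15,
                           random_state=0)

# Feature selection lives inside the pipeline, so it is refit on every
# training fold: no information from held-out data leaks into selection.
pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=500)),
])

# The inner loop tunes the number of selected features ...
inner = GridSearchCV(pipe, {"select__k": [10, 25, 50]}, cv=3)

# ... while the outer, repeated loop estimates accuracy on unseen folds.
outer = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(inner, X, y, cv=outer)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because SelectKBest sits inside the pipeline, it is refit on each inner training split, so no held-out fold ever influences which features are chosen; that separation is what keeps the outer accuracy estimate honest for new cases.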


IEEE Access, 2020, Vol. 8, pp. 139512-139528
Author(s):  
Shuangjie Li, Kaixiang Zhang, Qianru Chen, Shuqin Wang, Shaoqiang Zhang

Author(s):  
Hisao Ishibuchi, Tomoharu Nakashima

This paper proposes a genetic-algorithm-based approach for finding a compact reference set in nearest neighbor classification. The reference set is designed by selecting a small number of reference patterns from a large number of training patterns using a genetic algorithm. The genetic algorithm also removes unnecessary features. The reference set in our nearest neighbor classification consists of selected patterns with selected features. A binary string is used for representing the inclusion (or exclusion) of each pattern and feature in the reference set. Our goal is to minimize the number of selected patterns, to minimize the number of selected features, and to maximize the classification performance of the reference set. Computer simulations on commonly used data sets examine the effectiveness of our approach.
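The encoding described in the abstract is concrete enough to sketch: one binary string whose first block switches training patterns in or out of the reference set and whose second block does the same for features, scored on the three stated objectives. The Python below is a minimal illustration of that chromosome and a scalarized fitness; the L1 distance, the weights, and the rule excluding a pattern from voting for itself are assumptions for the sketch, and the GA loop that evolves such strings is omitted.

```python
import numpy as np

def decode(chrom, n_patterns):
    """Split one binary string into a pattern mask and a feature mask."""
    return chrom[:n_patterns], chrom[n_patterns:]

def nn_accuracy(X, y, pat_mask, feat_mask):
    """1-NN accuracy of all training patterns against the reference set."""
    ref = np.flatnonzero(pat_mask)
    feats = np.flatnonzero(feat_mask)
    if ref.size == 0 or feats.size == 0:
        return 0.0
    correct = 0
    for i in range(len(X)):
        d = np.abs(X[ref][:, feats] - X[i, feats]).sum(axis=1)  # L1 distance
        d[ref == i] = np.inf  # a reference pattern must not vote for itself
        correct += y[ref[d.argmin()]] == y[i]
    return correct / len(X)

def fitness(chrom, X, y, w_acc=10.0, w_pat=0.1, w_feat=0.1):
    """Weighted sum of the three objectives: maximize accuracy while
    minimizing the numbers of selected patterns and selected features."""
    pat, feat = decode(chrom, len(X))
    return (w_acc * nn_accuracy(X, y, pat, feat)
            - w_pat * pat.sum() - w_feat * feat.sum())

# Example: random data and a random chromosome of length n_patterns + n_features.
rng = np.random.default_rng(0)
X = rng.random((30, 8))
y = rng.integers(0, 2, 30)
chrom = rng.random(30 + 8) < 0.5
print(fitness(chrom, X, y))
```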


Author(s):  
Tarik Eltaeib, Julius Dichter

This paper examines the correlation between the number of computer cores and solution quality in parallel genetic algorithms. The objective is to determine a linear polynomial complementary equation that represents the relation between the degree of parallelism and the optimality of the solutions. This relation is modeled as an optimization function f(x) that is able to produce many simulation results, and f(x) outperforms the genetic algorithm alone. A comparison of the results of the genetic algorithm and the optimization function is carried out, and the optimization function also yields a model for speeding up the genetic algorithm. The optimization function is a complementary transformation that maps a given TSP instance to a linear form without changing the roots of the polynomials.
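The paper's f(x) is not specified in enough detail to reproduce, but the core-count/runtime relationship it studies is easy to measure empirically. The sketch below is a hypothetical harness rather than anything from the paper: it times the same batch of expensive fitness evaluations under different worker counts.

```python
import time
from multiprocessing import Pool

def expensive_fitness(x):
    """Stand-in for a costly evaluation such as scoring a TSP tour."""
    s = 0.0
    for _ in range(200_000):
        s = (s + x) * 0.9999
    return s

if __name__ == "__main__":
    population = list(range(64))
    for cores in (1, 2, 4, 8):
        start = time.perf_counter()
        with Pool(processes=cores) as pool:
            pool.map(expensive_fitness, population)
        print(f"{cores} core(s): {time.perf_counter() - start:.2f}s")
```

On a typical machine the wall time falls roughly linearly until the worker count approaches the number of physical cores, after which scheduling and communication overhead flatten the curve, which is the kind of relation the paper seeks to model.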

