A Master-Slave Parallel Genetic Algorithm for Feature Selection in High Dimensional Datasets

Feature selection in high-dimensional datasets is a combinatorial problem: for N-dimensional data there are 2^N possible subsets from which an optimal one must be selected. Genetic Algorithms are generally a good choice for feature selection in large datasets, though for some high-dimensional problems they may take a widely varying amount of time: a few seconds, a few hours, or even a few days. It is therefore important to use Genetic Algorithms that can give quality results within a reasonably acceptable time limit, which makes an efficient implementation necessary. In this paper, a Master-Slave Parallel Genetic Algorithm is implemented as a feature selection procedure to reduce the time complexity of the sequential genetic algorithm. The paper describes the speed gains of the parallel Master-Slave Genetic Algorithm and also presents a theoretical analysis of the optimal number of slaves required for an efficient master-slave implementation. The experiments are performed on three high-dimensional gene expression datasets. Because a Genetic Algorithm is a wrapper technique and is expensive for assessing the importance of individual features, Information Gain is applied first as a pre-processing step to remove irrelevant features.
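The paper's implementation is not reproduced here, but the master-slave pattern it describes is easy to sketch: one master process runs selection, crossover, and mutation, while a pool of slave processes evaluates fitness in parallel, after an information-gain filter has shrunk the search space. In the minimal Python sketch below, the synthetic data (make_classification), the wrapper classifier (LogisticRegression), the use of mutual_info_classif as the information-gain filter, and all sizes and rates are illustrative assumptions, not the authors' settings.

```python
import random
from multiprocessing import Pool

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy stand-in for a high-dimensional gene expression dataset.
X, y = make_classification(n_samples=100, n_features=500, n_informative=20,
                           random_state=0)

# Pre-processing: keep the top-k features by information gain so the
# GA wrapper searches a much smaller space.
TOP_K = 50
gain = mutual_info_classif(X, y, random_state=0)
X = X[:, np.argsort(gain)[-TOP_K:]]

def fitness(mask):
    """Wrapper fitness: cross-validated accuracy of the encoded subset."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    clf = LogisticRegression(max_iter=500)
    return cross_val_score(clf, X[:, idx], y, cv=3).mean()

def evolve(pop, scores, rng):
    """Master-side step: tournament selection, uniform crossover, mutation."""
    def pick():
        a, b = rng.sample(range(len(pop)), 2)
        return pop[a] if scores[a] >= scores[b] else pop[b]
    children = []
    while len(children) < len(pop):
        p1, p2 = pick(), pick()
        child = np.where(np.random.rand(TOP_K) < 0.5, p1, p2)           # crossover
        child = np.logical_xor(child, np.random.rand(TOP_K) < 0.02)     # mutation
        children.append(child)
    return children

if __name__ == "__main__":
    rng = random.Random(0)
    pop = [np.random.rand(TOP_K) < 0.5 for _ in range(40)]
    with Pool(processes=4) as slaves:                # the slave workers
        for _ in range(10):
            scores = slaves.map(fitness, pop)        # the only parallel step
            pop = evolve(pop, scores, rng)
        scores = slaves.map(fitness, pop)            # score the final generation
    best_score, best = max(zip(scores, pop), key=lambda t: t[0])
    print(f"best accuracy {best_score:.3f} using {int(best.sum())} features")
```

Because only fitness evaluation is distributed, adding slaves pays off only while evaluation time dominates communication; Cantú-Paz's classical analysis of master-slave GAs puts the optimal number of slaves near sqrt(n*Tf/Tc), where n is the population size, Tf the time to evaluate one individual, and Tc the time to send one individual to a slave.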

Author(s):  
Iwan Syarif

Classification problems, especially for high-dimensional datasets, have attracted many researchers looking for efficient approaches to address them. The classification problem becomes very complicated when the number of possible combinations of variables is large. In this research, we evaluate the performance of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms applied to high-dimensional datasets. Our experiments show that, in terms of dimensionality reduction, PSO is much better than GA: PSO reduced the attributes of 8 datasets to 13.47% of the original on average, while GA reduced them only to 31.36% on average. In terms of classification performance, GA is slightly better than PSO: GA-reduced datasets outperform their original versions on 5 of the 8 datasets, while PSO-reduced datasets do so on only 3 of the 8.

Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
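The abstract does not say which PSO variant was used; a common choice for feature selection is the binary PSO of Kennedy and Eberhart, in which velocities pass through a sigmoid to give per-bit selection probabilities. The sketch below is a minimal, self-contained version of that idea; the inertia and acceleration constants, the velocity clamp, and the toy fitness at the end are illustrative assumptions.

```python
import numpy as np

def binary_pso(fitness, n_features, n_particles=20, iters=30, seed=0):
    """Minimal binary PSO: each particle is a 0/1 feature mask."""
    rng = np.random.default_rng(seed)
    X = (rng.random((n_particles, n_features)) < 0.5).astype(float)  # positions
    V = rng.uniform(-1.0, 1.0, (n_particles, n_features))            # velocities
    pbest = X.copy()
    pbest_f = np.array([fitness(x) for x in X])
    gbest = pbest[pbest_f.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(V.shape), rng.random(V.shape)
        # Inertia plus attraction to personal and global bests.
        V = 0.7 * V + 1.5 * r1 * (pbest - X) + 1.5 * r2 * (gbest - X)
        V = np.clip(V, -6.0, 6.0)  # velocity clamp (Vmax)
        # Sigmoid of the velocity is the probability that a bit is set.
        X = (rng.random(V.shape) < 1.0 / (1.0 + np.exp(-V))).astype(float)
        f = np.array([fitness(x) for x in X])
        better = f > pbest_f
        pbest[better], pbest_f[better] = X[better], f[better]
        gbest = pbest[pbest_f.argmax()].copy()
    return gbest, pbest_f.max()

# Toy check: the best mask is the one matching a hypothetical 5-feature target.
target = np.zeros(50)
target[:5] = 1.0
mask, score = binary_pso(lambda x: -np.abs(x - target).sum(), n_features=50)
print(int(mask.sum()), "features selected, fitness", score)
```

In a real feature selection run, the fitness would combine a classifier's cross-validated accuracy with a penalty on the number of selected bits, which is what drives the dimensionality reduction the abstract reports.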


2020, Vol. 43(1), pp. 103-125
Author(s):  
Yi Zhong, Jianghua He, Prabhakar Chalise

With the advent of high-throughput technologies, high-dimensional datasets are increasingly available. This has not only opened up new insight into biological systems but also posed analytical challenges. One important problem is the selection of an informative feature subset and the prediction of future outcomes. It is crucial that models are not overfitted and give accurate results with new data. In addition, reliable identification of informative features with high predictive power (feature selection) is of interest in clinical settings. We propose a two-step framework for feature selection and classification model construction, which utilizes a nested and repeated cross-validation method. We evaluated our approach using both simulated data and two publicly available gene expression datasets. The proposed method showed comparatively better predictive accuracy for new cases than the standard cross-validation method.
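The authors' exact framework is not reproduced here, but the key idea, keeping feature selection inside every training fold and wrapping the tuning loop in an outer repeated cross-validation, can be sketched with scikit-learn. The data generator, the SelectKBest filter, the logistic regression classifier, and the grid of k values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     cross_val_score)
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=120, n_features=300, n_informative=15,
                           random_state=0)

# Feature selection lives inside the pipeline, so it is refit on every
# training fold: no information from held-out data leaks into selection.
pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=500)),
])

# The inner loop tunes the number of selected features ...
inner = GridSearchCV(pipe, {"select__k": [10, 25, 50]}, cv=3)

# ... while the outer, repeated loop estimates accuracy on unseen folds.
outer = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(inner, X, y, cv=outer)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because SelectKBest sits inside the pipeline, it is refit on each inner training split, so no held-out fold ever influences which features are chosen; that separation is what keeps the outer accuracy estimate honest for new cases.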


IEEE Access, 2020, Vol. 8, pp. 139512-139528
Author(s):  
Shuangjie Li, Kaixiang Zhang, Qianru Chen, Shuqin Wang, Shaoqiang Zhang

Author(s):  
Hisao Ishibuchi, Tomoharu Nakashima

This paper proposes a genetic-algorithm-based approach for finding a compact reference set in nearest neighbor classification. The reference set is designed by selecting a small number of reference patterns from a large number of training patterns using a genetic algorithm. The genetic algorithm also removes unnecessary features. The reference set in our nearest neighbor classification consists of selected patterns with selected features. A binary string is used for representing the inclusion (or exclusion) of each pattern and feature in the reference set. Our goal is to minimize the number of selected patterns, to minimize the number of selected features, and to maximize the classification performance of the reference set. Computer simulations on commonly used data sets examine the effectiveness of our approach.
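The encoding described in the abstract is concrete enough to sketch: one binary string whose first block switches training patterns in or out of the reference set and whose second block does the same for features, scored on the three stated objectives. The Python below is a minimal illustration of that chromosome and a scalarized fitness; the L1 distance, the weights, and the rule excluding a pattern from voting for itself are assumptions for the sketch, and the GA loop that evolves such strings is omitted.

```python
import numpy as np

def decode(chrom, n_patterns):
    """Split one binary string into a pattern mask and a feature mask."""
    return chrom[:n_patterns], chrom[n_patterns:]

def nn_accuracy(X, y, pat_mask, feat_mask):
    """1-NN accuracy of all training patterns against the reference set."""
    ref = np.flatnonzero(pat_mask)
    feats = np.flatnonzero(feat_mask)
    if ref.size == 0 or feats.size == 0:
        return 0.0
    correct = 0
    for i in range(len(X)):
        d = np.abs(X[ref][:, feats] - X[i, feats]).sum(axis=1)  # L1 distance
        d[ref == i] = np.inf  # a reference pattern must not vote for itself
        correct += y[ref[d.argmin()]] == y[i]
    return correct / len(X)

def fitness(chrom, X, y, w_acc=10.0, w_pat=0.1, w_feat=0.1):
    """Weighted sum of the three objectives: maximize accuracy while
    minimizing the numbers of selected patterns and selected features."""
    pat, feat = decode(chrom, len(X))
    return (w_acc * nn_accuracy(X, y, pat, feat)
            - w_pat * pat.sum() - w_feat * feat.sum())

# Example: random data and a random chromosome of length n_patterns + n_features.
rng = np.random.default_rng(0)
X = rng.random((30, 8))
y = rng.integers(0, 2, 30)
chrom = rng.random(30 + 8) < 0.5
print(fitness(chrom, X, y))
```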


Author(s):  
Tarik Eltaeib, Julius Dichter

This paper examines the correlation between the number of computer cores and solution quality in parallel genetic algorithms. The objective is to determine a linear polynomial complementary equation that represents the relation between the degree of parallelism and the optimality of the solutions. This relation is modeled as an optimization function f(x) that is able to produce many simulation results, and f(x) outperforms the genetic algorithm alone. A comparison of the results of the genetic algorithm and the optimization function is carried out, and the optimization function also yields a model for speeding up the genetic algorithm. The optimization function is a complementary transformation that maps a given TSP instance to a linear form without changing the roots of the polynomials.
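The paper's f(x) is not specified in enough detail to reproduce, but the core-count/runtime relationship it studies is easy to measure empirically. The sketch below is a hypothetical harness rather than anything from the paper: it times the same batch of expensive fitness evaluations under different worker counts.

```python
import time
from multiprocessing import Pool

def expensive_fitness(x):
    """Stand-in for a costly evaluation such as scoring a TSP tour."""
    s = 0.0
    for _ in range(200_000):
        s = (s + x) * 0.9999
    return s

if __name__ == "__main__":
    population = list(range(64))
    for cores in (1, 2, 4, 8):
        start = time.perf_counter()
        with Pool(processes=cores) as pool:
            pool.map(expensive_fitness, population)
        print(f"{cores} core(s): {time.perf_counter() - start:.2f}s")
```

On a typical machine the wall time falls roughly linearly until the worker count approaches the number of physical cores, after which scheduling and communication overhead flatten the curve, which is the kind of relation the paper seeks to model.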

