Feature Selection in High Dimensional Data by a Filter-Based Genetic Algorithm

Author(s):  
Claudio De Stefano ◽  
Francesco Fontanella ◽  
Alessandra Scotto di Freca
IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 139512-139528
Author(s):  
Shuangjie Li ◽  
Kaixiang Zhang ◽  
Qianru Chen ◽  
Shuqin Wang ◽  
Shaoqiang Zhang

2012 ◽  
Vol 468-471 ◽  
pp. 1762-1766 ◽  
Author(s):  
Dong Yan ◽  
Shao Wei Liu ◽  
Jian Tang

Feature selection is very important when modeling high-dimensional data such as near-infrared (NIR) spectra. A novel modeling approach that combines an adaptive genetic algorithm with kernel partial least squares (AGA-KPLS) is proposed. The KPLS algorithm is used to construct nonlinear models with the popular kernel-based modeling technique. The AGA is used to select the optimal feature subset from the original high-dimensional data and, simultaneously, the kernel parameters of the KPLS algorithm. Experimental results on a vibration frequency spectrum show that the proposed approach achieves better prediction performance than the standard GA-PLS method.
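The joint search over features and kernel parameters can be pictured as a single chromosome whose leading bits form a feature mask and whose trailing bits encode a kernel width. The sketch below is only an illustration under stated assumptions, not the authors' encoding: the abstract does not name the kernel or the bit layout, so the RBF kernel, the log-scale width range, the constants, and the helper names `decode` and `rbf_kernel` are all hypothetical.

```python
import math

N_FEATURES = 6                 # hypothetical number of spectral variables
N_WIDTH_BITS = 4               # hypothetical number of bits for the kernel width
LOG_MIN, LOG_MAX = -2.0, 2.0   # assumed search range: sigma in [1e-2, 1e2]

def decode(chromosome):
    """Split one bit list into (selected feature indices, kernel width sigma)."""
    mask = chromosome[:N_FEATURES]
    width_bits = chromosome[N_FEATURES:N_FEATURES + N_WIDTH_BITS]
    selected = [i for i, bit in enumerate(mask) if bit == 1]
    # Read the width bits as an unsigned integer, then map it onto a log scale.
    level = int("".join(str(b) for b in width_bits), 2)
    frac = level / (2 ** N_WIDTH_BITS - 1)
    sigma = 10 ** (LOG_MIN + frac * (LOG_MAX - LOG_MIN))
    return selected, sigma

def rbf_kernel(x, y, selected, sigma):
    """Gaussian (RBF) kernel evaluated only on the selected features."""
    sq_dist = sum((x[i] - y[i]) ** 2 for i in selected)
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Six mask bits followed by four width bits.
chromosome = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]
selected, sigma = decode(chromosome)

# The two samples differ only on features that the mask switches off.
x = [0.1, 0.9, 0.3, 0.5, 0.7, 0.2]
y = [0.1, 0.0, 0.3, 0.0, 0.0, 0.2]
k = rbf_kernel(x, y, selected, sigma)
```

Because one chromosome carries both parts, a single fitness evaluation (for example, the prediction error of a model fitted with the decoded subset and width) scores the feature subset and the kernel parameter together.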


2018 ◽  
Vol 7 (2.11) ◽  
pp. 27 ◽  
Author(s):  
Kahkashan Kouser ◽  
Amrita Priyam

Clustering high-dimensional data is one of the open problems of modern data mining. This paper proposes a new technique called GA-HDClustering, which works in two steps. First, a GA-based feature selection algorithm determines the optimal feature subset, which consists of the important features of the entire data set. Next, a K-means algorithm is applied to the optimal feature subset to find the clusters. For comparison, the traditional K-means algorithm is applied to the full-dimensional feature space. Finally, the result of GA-HDClustering is compared with that of the traditional clustering algorithm using several validity metrics: sum of squared error (SSE), within-group average distance (WGAD), between-group distance (BGD), and the Davies-Bouldin index (DBI). GA-HDClustering uses a genetic algorithm to search for an effective feature subspace within the large feature space made up of all dimensions of the data set. Experiments on a standard data set show that GA-HDClustering is superior to the traditional clustering algorithm.
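The two-step idea (GA search over feature masks, then K-means on the chosen subspace) can be sketched in plain Python. This is a minimal illustration, not the paper's algorithm: the abstract does not state the GA fitness function or operators, so using the Davies-Bouldin index as fitness, the tournament selection, the uniform crossover, the synthetic data, and every function name here are assumptions.

```python
import math
import random

def project(points, mask):
    """Keep only the coordinates whose mask bit is 1."""
    return [[v for v, bit in zip(p, mask) if bit] for p in points]

def kmeans(points, k, iters=15, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index: lower means tighter, better separated clusters."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            return math.inf
        scatter.append(sum(math.dist(p, centroids[c]) for p in members) / len(members))
    total = 0.0
    for i in range(k):
        ratios = []
        for j in range(k):
            if j != i:
                gap = math.dist(centroids[i], centroids[j])
                ratios.append(math.inf if gap == 0 else (scatter[i] + scatter[j]) / gap)
        total += max(ratios)
    return total / k

def fitness(mask, points, k):
    """Assumed fitness: DBI of K-means run on the masked subspace."""
    if not any(mask):
        return math.inf  # an empty feature subset cannot be clustered
    sub = project(points, mask)
    labels, centroids = kmeans(sub, k)
    return davies_bouldin(sub, labels, centroids)

def ga_select(points, k, pop_size=10, generations=10, p_mut=0.1, seed=1):
    """Step 1: search for a feature mask with a small elitist GA."""
    rng = random.Random(seed)
    n = len(points[0])
    # Seed the population with the all-ones mask (the full feature space).
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, points, k))
        nxt = [ranked[0][:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            # Binary tournament on the ranked list: the lower index wins.
            a = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            b = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            nxt.append([bit ^ 1 if rng.random() < p_mut else bit for bit in child])
        pop = nxt
    return min(pop, key=lambda m: fitness(m, points, k))

# Synthetic data: two informative features plus four pure-noise features.
data_rng = random.Random(42)
data = []
for centre in (0.0, 5.0):
    for _ in range(30):
        informative = [data_rng.gauss(centre, 0.5), data_rng.gauss(centre, 0.5)]
        noise = [data_rng.uniform(-5, 5) for _ in range(4)]
        data.append(informative + noise)

best_mask = ga_select(data, k=2)
# Step 2: K-means on the selected subspace, compared against the full space.
best_score = fitness(best_mask, data, 2)
full_score = fitness([1] * 6, data, 2)
```

Because the all-ones mask is in the initial population and the best individual is always carried forward, `best_score` can never be worse than `full_score` in this sketch. Comparing an index such as DBI across subspaces of different dimension is itself a simplification, so the choice of fitness deserves care in practice.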


Processes ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 259
Author(s):  
Qilan Ran ◽  
Yedong Song ◽  
Wenli Du ◽  
Wei Du ◽  
Xin Peng

To reduce pollutant emissions from diesel vehicles, complex after-treatment technologies have been proposed, which makes fault detection in diesel engines increasingly difficult. This paper therefore proposes a canonical correlation analysis (CCA) detection method based on fault-relevant variables selected by an elitist genetic algorithm, in order to realize high-dimensional, data-driven fault detection for diesel engines. The proposed method builds the fault detection model from actual operating data, overcoming the limitations of traditional methods that rely solely on benchmarks. Canonical correlation analysis is used to extract the strong correlations between variables, from which a residual vector is constructed to detect faults in the diesel engine air and after-treatment system. In particular, the elitist genetic algorithm optimizes the set of fault-relevant variables to reduce detection redundancy, eliminate additional noise interference, and improve the detection rate for specific faults. Experiments on practical state data from a diesel engine demonstrate the feasibility and efficiency of the proposed approach.
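The residual idea can be illustrated in the simplest possible setting, where each of the two variable blocks contains a single variable. In that special case the canonical variates are just the standardized variables and the canonical correlation reduces to the Pearson correlation, so no matrix decomposition is needed. The sketch below rests on those assumptions and is not the paper's multivariate method; the synthetic data, the chi-square threshold, and the names `fit` and `t2_statistic` are hypothetical.

```python
import math
import random

def fit(x, y):
    """Estimate means, standard deviations, and correlation from fault-free data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    rho = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
    return mx, my, sx, sy, rho

def t2_statistic(model, a, b):
    """Squared residual scaled by its variance under normal operation."""
    mx, my, sx, sy, rho = model
    u, v = (a - mx) / sx, (b - my) / sy
    residual = u - rho * v  # has variance 1 - rho**2 when no fault is present
    return residual ** 2 / (1.0 - rho ** 2)

# 95% quantile of the chi-square distribution with one degree of freedom,
# used here as an assumed detection threshold (Gaussian residuals assumed).
CHI2_95_1DOF = 3.841

# Fault-free training data: y follows x closely, so the correlation is strong.
rng = random.Random(0)
x_train = [rng.gauss(0.0, 1.0) for _ in range(500)]
y_train = [2.0 * a + rng.gauss(0.0, 0.3) for a in x_train]
model = fit(x_train, y_train)

normal = t2_statistic(model, 1.0, 2.0)   # consistent with y = 2x
faulty = t2_statistic(model, 1.0, 6.0)   # breaks the learned relationship

normal_alarm = normal > CHI2_95_1DOF
faulty_alarm = faulty > CHI2_95_1DOF
```

A sample that respects the learned correlation leaves a small residual, while a sample that breaks it produces a large statistic and raises an alarm. Restricting the monitored variables to fault-relevant ones, as the abstract describes, would aim to keep this residual from being diluted by unrelated variables.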

