How to Optimally Combine Univariate and Multivariate Feature Selection with Data Sampling for Classifying Noisy, High Dimensional and Class Imbalanced DNA Microarray Data

2020 ◽  
pp. 33-61
Author(s):  
Ahmad Abu Shanab ◽  
Taghi M Khoshgoftaar
Author(s):  
Veronica Bolon-Canedo ◽  
Konstantinos Sechidis ◽  
Noelia Sanchez-Marono ◽  
Amparo Alonso-Betanzos ◽  
Gavin Brown

Gene expression data contain important information about the biological reactions that organisms carry out in response to their surroundings. To understand these processes better, the hidden expression patterns in high-dimensional, non-linear data must be processed effectively. The intricacy of biological data makes processing high-dimensional, non-linear gene data more challenging. To handle these complications, clustering techniques are used to identify the patterns. Clustering also helps clarify gene functions, regulation, inherent expression, and categories from noisy input data. In this paper, an Integrated Model for Optimization and Clustering (IMOC) is developed for efficient gene data processing in cancer detection. Two algorithms are integrated in this work: Ant Colony Optimization, which determines the features of the gene data, and Density-based Clustering (DENCLUE), which clusters the DNA data based on the determined features. The proposed model is evaluated on benchmark datasets such as DNA microarray data for leukemia and DNA microarray data for colon cancer. The results show that the proposed model outperforms existing models in accuracy and efficiency.

