Classification Models Based-on Incremental Learning Algorithm and Feature Selection on Gene Expression Data

Author(s):  
Phayung Meesad ◽  
Sageemas Na Wichian ◽  
Unger Herwig ◽  
Patharawut Saengsiri

Gene expression data show the levels at which the genes encoded in DNA are expressed as proteins in cells such as muscle or brain cells. However, some abnormal cells may evolve from unnatural expression levels. Therefore, finding a subset of informative genes would be beneficial to biologists because it can identify discriminative genes. Unfortunately, the number of genes grows rapidly into the tens of thousands, which makes classification difficult because of the curse of dimensionality and misclassification problems. This paper proposes classification models based on an incremental learning algorithm and feature selection on gene expression data. Three feature selection methods, Correlation-based Feature Selection (Cfs), Gain Ratio (GR), and Information Gain (Info), are combined with an Incremental Learning Algorithm based on Mahalanobis Distance (ILM). The experimental results show that the proposed models CfsILM, GRILM, and InfoILM not only reduce the number of dimensions from 2001, 7130, and 4026 to 26, 135, and 135, which saves time and resources, but also improve the accuracy rate from 64.52%, 34.29%, and 8.33% to 90%, 97.14%, and 83.33%, respectively. In particular, CfsILM outperforms the other models on three public gene expression datasets.
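To make two of the filter criteria named in this abstract concrete, the following is a minimal sketch of Information Gain and Gain Ratio for a single feature. It is not the authors' implementation: it assumes expression levels have already been discretised, and the function names and the toy genes `gene_a` and `gene_b` are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return sum((n / total) * math.log2(total / n) for n in Counter(values).values())

def information_gain(feature, labels):
    """Reduction in class entropy after splitting the samples on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature's own values."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: expression levels already discretised to "low"/"high".
labels = ["tumor", "tumor", "normal", "normal"]
genes = {
    "gene_a": ["high", "high", "low", "low"],   # separates the classes perfectly
    "gene_b": ["high", "low", "high", "low"],   # carries no class information
}

# Rank genes from most to least informative.
ranked = sorted(genes, key=lambda g: information_gain(genes[g], labels), reverse=True)
```

On this toy data `gene_a` ranks first with an information gain of 1 bit, while `gene_b` scores 0. A filter method would keep only the top-ranked genes before handing them to a classifier.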

Author(s):  
Mekour Norreddine

Feature selection is one of the problems that arise when working with gene expression data. It is an important process for choosing which features matter for prediction. There are two general approaches to feature selection: the filter approach and the wrapper approach. In this chapter, the authors combine a filter approach that ranks features by information gain with a wrapper approach that uses a genetic algorithm as its search method. The authors evaluate their approach on two gene expression data sets: Leukemia and Central Nervous System. A decision tree classifier (C4.5) is used to improve classification performance.


Microarray technology has been developed as a powerful tool that has attracted many researchers to analyze gene expression levels for a given organism. Gene expression data have a very large number of features (thousands) and a small number of samples (hundreds). This characteristic makes gene expression data difficult to analyze, so an efficient feature selection technique must be applied before any kind of analysis. Feature selection plays a vital role in the classification of gene expression data. Several feature selection techniques have been introduced in this field, but Support Vector Machine with Recursive Feature Elimination (SVM-RFE) has proven to be one of the most promising. SVM-RFE ranks the genes (features) by training an SVM classification model, and key genes are selected in combination with the RFE method. Its main issue is that it is very time-consuming. We introduce an efficient implementation of linear SVM to overcome this problem and improve RFE with a variable step size. The combined method is then used to select informative genes. An effective resampling method is proposed to preprocess the datasets; it balances the distribution of samples, which gives more reliable classification results. In this paper, we also study the applicability of common classifiers. Detailed experiments are conducted on four commonly used microarray gene expression datasets. The results show that the proposed method achieves comparable classification performance.
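The idea of recursive feature elimination with a variable step size can be sketched as follows. The abstract does not give the authors' actual schedule, so this example assumes one common choice: drop a fixed fraction of the surviving features while many remain, then drop one per round. The function name, the `threshold` and `fraction` parameters, and the `rank_fn` callback (a stand-in for retraining a linear SVM and reading off the absolute weights) are all hypothetical.

```python
def rfe_variable_step(features, rank_fn, target, threshold=100, fraction=0.5):
    """Recursive feature elimination with a variable step size.

    features : list of feature names
    rank_fn  : callable taking the surviving features and returning a
               {feature: importance} dict (stand-in for a retrained model)
    target   : number of features to keep
    While more than `threshold` features remain, `fraction` of them are
    dropped per round; after that, one feature is dropped per round.
    """
    remaining = list(features)
    while len(remaining) > target:
        scores = rank_fn(remaining)          # re-rank on the surviving features
        if len(remaining) > threshold:
            step = int(len(remaining) * fraction)
        else:
            step = 1
        # Never drop fewer than one feature or overshoot the target.
        step = max(1, min(step, len(remaining) - target))
        remaining.sort(key=lambda f: scores[f], reverse=True)
        remaining = remaining[:len(remaining) - step]
    return remaining

# Hypothetical toy ranking: each gene has a fixed importance equal to its index.
weights = {f"g{i}": i for i in range(1000)}
selected = rfe_variable_step(
    list(weights),
    lambda feats: {f: weights[f] for f in feats},
    target=10,
)
```

With 1000 features, the coarse phase needs only four rounds to get below the threshold, whereas removing one feature per round from the start would need hundreds of retraining rounds. That difference is the time saving a variable step size is meant to provide.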


2019 ◽  
Vol 21 (9) ◽  
pp. 631-645 ◽  
Author(s):  
Saeed Ahmed ◽  
Muhammad Kabir ◽  
Zakir Ali ◽  
Muhammad Arif ◽  
Farman Ali ◽  
...  

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosing this deadly disease at an early stage is an exceptionally new clinical application of microarray data. In DNA microarray technology, gene expression data have a high dimension with a small sample size. Therefore, it is indispensable to develop efficient and robust feature selection methods that identify a small set of genes to achieve better classification performance. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and Multi-Objective Evolutionary Algorithm (MOEA) approaches to select highly informative genes. The hybrid model with a radial basis function neural network (RBFNN) classifier was evaluated on 11 benchmark gene expression datasets using a 10-fold cross-validation test. Results: The experimental results are compared with seven conventional feature selection methods and other methods in the literature. The extensive comparison shows that our approach has clear merits in classification accuracy and in the number of genes selected. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy for six out of eleven datasets with a minimal sized predictive gene subset.
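For readers unfamiliar with correlation-based feature selection, the subset score usually associated with CFS can be sketched as below. This is the standard merit heuristic, k * r_cf / sqrt(k + k(k-1) * r_ff), and not necessarily the exact objective used in the study above. The function and parameter names are hypothetical.

```python
import math

def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a subset of k features.

    mean_feature_class_corr   : average correlation between each feature and the class
    mean_feature_feature_corr : average pairwise correlation among the features
    The score rewards features that predict the class and penalises redundancy.
    """
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator

# Four equally predictive genes: uncorrelated with each other vs. fully redundant.
independent = cfs_merit(4, 0.5, 0.0)
redundant = cfs_merit(4, 0.5, 1.0)
```

Here the uncorrelated subset scores 1.0 and the fully redundant subset scores 0.5, the same as a single gene with that class correlation. This is why the criterion tends to favour small, non-redundant gene subsets.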

