A Simulated Annealing and Resampling Method for Training Perceptrons to Classify Gene-Expression Data

Author(s): Andreas A. Albrecht, Staal A. Vinterbo, C. K. Wong, Lucila Ohno-Machado

DNA microarray technology produces a gene expression matrix that inevitably contains missing entries due to poor experimental procedures. Predicting the missing values in the matrix is essential, since most algorithms that analyse gene expression data require a matrix without missing values. To address this issue, the present study proposes a biclustering-based Genetic Simulated Annealing (Genetic SA) algorithm to predict the entries that are missing in the gene expression data. The study uses a biclustering method, which is considered essential for clustering gene expression data. The performance evaluation shows that the proposed Genetic SA predicts the missing entries more accurately than the existing methods.


Microarray technology has become a powerful tool that has attracted many researchers to analyze gene expression levels for a given organism. Gene expression data typically have a very large number of features (in the thousands) and a small number of samples (in the hundreds). This characteristic makes gene expression data difficult to analyze, so an efficient feature selection technique must be applied before any further analysis. Feature selection plays a vital role in the classification of gene expression data. Several feature selection techniques have been introduced in this field, but Support Vector Machine with Recursive Feature Elimination (SVM-RFE) has proven to be a promising method among them. SVM-RFE ranks the genes (features) by training an SVM classification model, and key genes are then selected in combination with the RFE method. The main drawback of SVM-RFE is its high time consumption. We introduce an efficient implementation of linear SVM to overcome this problem and improve RFE with a variable step size. The combined method is then used to select informative genes. An effective resampling method is proposed to preprocess the datasets; it balances the distribution of samples, which gives more reliable classification results. We also study the applicability of common classifiers. Detailed experiments are conducted on four commonly used microarray gene expression datasets. The results show that the proposed method achieves comparable classification performance
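The SVM-RFE idea described in the abstract above (train a linear SVM, rank genes by weight, eliminate the lowest-ranked ones, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the Pegasos-style trainer, the squared-weight ranking criterion, the rule of dropping a fixed fraction of the remaining features per round (one possible reading of "variable step size"), and all function and parameter names are assumptions made for illustration.

```python
import random


def train_linear_svm(X, y, epochs=50, lam=0.01, seed=0):
    """Pegasos-style subgradient descent for a bias-free linear SVM.

    Illustrative stand-in for an "efficient linear SVM"; labels must be -1 or +1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying learning rate
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularisation shrink, applied at every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step, applied only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep, drop_fraction=0.5):
    """Recursive feature elimination with a step that shrinks each round.

    Assumption: each round drops `drop_fraction` of the remaining features
    (at least one), never going below `n_keep`. Returns the original column
    indices of the surviving features.
    """
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_sub, y)
        # Rank positions by squared weight, least important first.
        ranked = sorted(range(len(active)), key=lambda k: w[k] ** 2)
        step = max(1, int(len(active) * drop_fraction))
        step = min(step, len(active) - n_keep)
        dropped = set(ranked[:step])
        active = [f for k, f in enumerate(active) if k not in dropped]
    return active
```

Calling `svm_rfe(X, y, n_keep=20)` on a samples-by-genes list of lists `X` with labels `y` in {-1, +1} returns the column indices of the retained genes. Dropping a fraction rather than a single feature per round keeps the number of SVM retrainings roughly logarithmic in the number of genes, which is the usual motivation for a non-unit step.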

