Learning Misclassification Costs for Imbalanced Datasets, Application in Gene Expression Data Classification

Author(s):  
Huijuan Lu ◽  
Yige Xu ◽  
Minchao Ye ◽  
Ke Yan ◽  
Qun Jin ◽  
...  
Author(s):  
WEIXIANG LIU ◽  
KEHONG YUAN ◽  
JIAN WU ◽  
DATIAN YE ◽  
ZHEN JI ◽  
...  

Classification of gene expression samples is a core task in microarray data analysis. Two key issues are how to reduce the thousands of genes to a manageable set of features and how to select a suitable classifier. This paper introduces a framework that combines feature extraction and classification. Given the non-negativity, high dimensionality and small sample size of the data, we apply a discriminative mixture model designed for non-negative gene expression data classification, with non-negative matrix factorization (NMF) used for dimension reduction. To enhance the sparseness of the training data for fast learning of the mixture model, a generalized NMF is also adopted. Experimental results on several real gene expression datasets show that the generalized method significantly improves classification accuracy, stability and decision quality, and that the proposed method outperforms some previously reported results on the same datasets.
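
The dimension-reduction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard Lee-Seung multiplicative updates for the Frobenius objective, not the generalized NMF or the discriminative mixture model of the paper, whose details the abstract does not give. The toy matrix `V` and the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`) are hypothetical and chosen only for illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (genes x samples) as W H.

    Uses plain Lee-Seung multiplicative updates; eps guards against
    division by zero. Each column of H is a rank-dimensional
    representation of one sample.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Toy non-negative "expression" matrix: 6 genes (rows) x 4 samples (columns).
V = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [6, 5, 0, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 6],
    [0, 0, 5, 5],
]

W, H = nmf(V, rank=2)
print("reconstruction error:", round(frobenius_error(V, W, H), 3))
for j, col in enumerate(transpose(H)):
    print("sample", j, "->", [round(x, 3) for x in col])
```

In a pipeline of the kind the abstract describes, the columns of `H` would replace the original gene-level measurements as classifier inputs, so each sample is described by `rank` features rather than thousands of genes.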


2008 ◽  
Author(s):  
Bruno F. de Souza ◽  
Andre de Carvalho ◽  
Carlos Soares
