On the Number of Close-to-Optimal Feature Sets

2006 ◽  
Vol 2 ◽  
pp. 117693510600200 ◽  
Author(s):  
Edward R. Dougherty ◽  
Marcel Brun

The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a given size, chosen from a large collection of potential features, can be so close to optimal that they are statistically indistinguishable from one another. Feature-set optimality is inherently related to sample size because it arises only from the tendency of classifier accuracy to diminish as the number of features grows too large for satisfactory design from the sample data. The paper considers optimal feature sets in the framework of a model in which the features are grouped such that intra-group correlation is substantial whereas inter-group correlation is minimal, the intent being to model the situation in which there are groups of highly correlated co-regulated genes and little correlation between the co-regulated groups. This is accomplished with a block model for the covariance matrix that reflects these conditions. Focusing on linear discriminant analysis, we demonstrate how these assumptions can lead to very large numbers of close-to-optimal feature sets.
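As a rough illustration of the block-covariance argument (not code from the paper), the following Python sketch builds a block-diagonal covariance matrix with strong intra-group and zero inter-group correlation and shows that every feature subset picking one feature per group attains the same exact LDA error, so the number of equally good subsets grows combinatorially. The group count, group size, correlation rho, and mean shift are illustrative assumptions.

# Hedged sketch: many feature subsets with identical LDA (Bayes) error under a
# block covariance model. Parameter values are illustrative, not from the paper.
import itertools
import numpy as np
from scipy.stats import norm

n_groups, group_size, rho, shift = 5, 4, 0.8, 1.0
p = n_groups * group_size

# Block-diagonal covariance: correlation rho inside a group, 0 across groups.
block = (1 - rho) * np.eye(group_size) + rho * np.ones((group_size, group_size))
sigma = np.kron(np.eye(n_groups), block)

# Class means differ by the same shift on every feature (equal-covariance model).
delta_mu = np.full(p, shift)

def lda_bayes_error(features):
    """Exact error of the optimal linear discriminant restricted to `features`."""
    d = delta_mu[list(features)]
    s = sigma[np.ix_(list(features), list(features))]
    mahalanobis = np.sqrt(d @ np.linalg.solve(s, d))
    return norm.cdf(-mahalanobis / 2)

# Every subset that picks exactly one feature per group has the same error,
# so the number of (near-)optimal subsets grows like group_size ** n_groups.
subsets = itertools.product(*[range(g * group_size, (g + 1) * group_size)
                              for g in range(n_groups)])
errors = [lda_bayes_error(s) for s in itertools.islice(subsets, 200)]
print(f"{len(errors)} subsets, error spread = {max(errors) - min(errors):.2e}")

With finite training samples, the empirical errors of these subsets scatter around the same value, which is the sense in which they become statistically indistinguishable.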

2019 ◽  
Vol 3 (2) ◽  
pp. 72
Author(s):  
Widi Astuti ◽  
Adiwijaya Adiwijaya

Cancer is one of the leading causes of death globally, and early detection allows better treatment for patients. One method of detecting cancer is microarray data classification. However, microarray data is high-dimensional, which complicates classification. Linear Discriminant Analysis is a classification technique that is easy to implement and achieves good accuracy, but it has difficulty handling high-dimensional data. Therefore, Principal Component Analysis, a feature extraction technique, is used to optimize Linear Discriminant Analysis performance. Based on the results of the study, using Principal Component Analysis increased accuracy by up to 29.04% and the F1 score by 64.28% for the colon cancer data.
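A minimal sketch of the pipeline the abstract describes, assuming scikit-learn: PCA reduces the dimension before LDA is fit, so the LDA covariance estimate stays well conditioned. The component count, synthetic stand-in data, and split are illustrative assumptions, not the study's setup.

# Hedged sketch: PCA -> LDA pipeline for high-dimensional (microarray-like) data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in for microarray data: few samples, many features.
X, y = make_classification(n_samples=100, n_features=2000, n_informative=40,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# PCA first, so LDA only has to estimate covariance in a low-dimensional space.
model = make_pipeline(PCA(n_components=20), LinearDiscriminantAnalysis())
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"accuracy={accuracy_score(y_te, pred):.3f}  F1={f1_score(y_te, pred):.3f}")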


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Guoqi Li ◽  
Changyun Wen ◽  
Wei Wei ◽  
Yi Xu ◽  
Jie Ding ◽  
...  

A generalized linear discriminant analysis based on the trace ratio criterion (GLDA-TRA) is derived to extract features for classification. With the proposed GLDA-TRA, a set of orthogonal features can be extracted in succession. Each newly extracted feature is the optimal feature that maximizes the trace ratio criterion in the subspace orthogonal to the space spanned by the previously extracted features.
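A minimal sketch of this kind of greedy scheme, not the authors' GLDA-TRA implementation: when one direction is extracted per step, the trace ratio of between-class to within-class scatter reduces to a generalized Rayleigh quotient, which is solved here in the subspace orthogonal to the directions already chosen. The regularization term and the scatter definitions are standard LDA assumptions added for numerical stability.

# Hedged sketch: greedy extraction of orthogonal discriminant directions.
import numpy as np
from scipy.linalg import eigh, null_space

def scatter_matrices(X, y):
    """Between-class (Sb) and within-class (Sw) scatter matrices."""
    mean = X.mean(axis=0)
    p = X.shape[1]
    Sb, Sw = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - mean)[:, None]
        Sb += len(Xc) * diff @ diff.T
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    return Sb, Sw

def greedy_orthogonal_features(X, y, n_features, reg=1e-6):
    Sb, Sw = scatter_matrices(X, y)
    p = X.shape[1]
    W = np.zeros((p, 0))
    for _ in range(n_features):
        # Orthonormal basis of the complement of the span of previous features.
        B = null_space(W.T) if W.shape[1] else np.eye(p)
        # Generalized Rayleigh quotient in the projected subspace.
        vals, vecs = eigh(B.T @ Sb @ B, B.T @ Sw @ B + reg * np.eye(B.shape[1]))
        w = B @ vecs[:, -1]                 # direction with the largest ratio
        W = np.hstack([W, (w / np.linalg.norm(w))[:, None]])
    return W                                # columns are mutually orthogonal

Because each new direction is confined to the null space of the previously selected ones, the extracted features are orthogonal by construction, matching the successive-extraction structure the abstract describes.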


2014 ◽  
Vol 577 ◽  
pp. 1245-1251
Author(s):  
Zhi Ru Chen ◽  
Wen Xue Hong ◽  
Pei Pei Zhao

Imbalanced miRNA target sample data lowers the prediction accuracy of the SVM (Support Vector Machine). This paper proposes an SVM algorithm for predicting target genes based on the biased discriminant idea. An optimal feature set is selected as the input data, and a kernel optimization objective function is constructed from the biased discriminant analysis criterion in the empirical feature space. A conformal transformation of the kernel is used to gradually optimize the kernel matrix. Comparative analysis of experimental results on human, mouse, and rat data shows that the imbalanced SVM with the biased discriminant achieves higher specificity, sensitivity, and prediction accuracy, indicating stronger generalization ability and better robustness.
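A minimal sketch of the general idea, not the paper's method: a conformal transformation rescales a base kernel as K'(x, z) = c(x) c(z) K(x, z), and the transformed Gram matrix is fed to a class-weighted SVM. The magnification factor c(.) below (larger near minority-class samples) and the toy data are illustrative assumptions; the paper instead derives the factor from its biased discriminant criterion.

# Hedged sketch: conformal transformation of an RBF kernel + class-weighted SVM.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def conformal_factor(X, X_minority, gamma=0.5):
    """Positive scaling factor c(x), larger close to minority-class samples."""
    d2 = ((X[:, None, :] - X_minority[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 + np.exp(-gamma * d2).sum(axis=1)

def conformal_kernel(XA, XB, X_minority, gamma_rbf=0.1):
    cA = conformal_factor(XA, X_minority)
    cB = conformal_factor(XB, X_minority)
    return cA[:, None] * rbf_kernel(XA, XB, gamma=gamma_rbf) * cB[None, :]

# Toy imbalanced data standing in for miRNA target features.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(20, 10))    # minority class (true targets)
X_neg = rng.normal(0.0, 1.0, size=(200, 10))   # majority class (non-targets)
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [0] * 200)

K = conformal_kernel(X, X, X_pos)
clf = SVC(kernel="precomputed", class_weight="balanced").fit(K, y)
print("training accuracy:", clf.score(K, y))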

