On the Number of Close-to-Optimal Feature Sets

2006 ◽  
Vol 2 ◽  
pp. 117693510600200 ◽  
Author(s):  
Edward R. Dougherty ◽  
Marcel Brun

The issue of wide feature-set variability has recently been raised in the context of expression-based classification using microarray data. This paper addresses this concern by demonstrating the natural manner in which many feature sets of a given size, chosen from a large collection of potential features, can be so close to optimal that they are statistically indistinguishable from one another. Feature-set optimality is inherently related to sample size because it arises only from the tendency of classifier accuracy to diminish as the number of features grows too large for satisfactory design from the sample data. The paper considers optimal feature sets in the framework of a model in which the features are grouped such that intra-group correlation is substantial whereas inter-group correlation is minimal, the intent being to model the situation in which there are groups of highly correlated co-regulated genes and little correlation between the co-regulated groups. This is accomplished with a block model for the covariance matrix that reflects these conditions. Focusing on linear discriminant analysis, we demonstrate how these assumptions can lead to very large numbers of close-to-optimal feature sets.
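As a rough illustration of the block-covariance argument (not code from the paper), the following Python sketch builds a block-diagonal covariance matrix with strong intra-group and zero inter-group correlation and shows that every feature subset picking one feature per group attains the same exact LDA error, so the number of equally good subsets grows combinatorially. The group count, group size, correlation rho, and mean shift are illustrative assumptions.

# Hedged sketch: many feature subsets with identical LDA (Bayes) error under a
# block covariance model. Parameter values are illustrative, not from the paper.
import itertools
import numpy as np
from scipy.stats import norm

n_groups, group_size, rho, shift = 5, 4, 0.8, 1.0
p = n_groups * group_size

# Block-diagonal covariance: correlation rho inside a group, 0 across groups.
block = (1 - rho) * np.eye(group_size) + rho * np.ones((group_size, group_size))
sigma = np.kron(np.eye(n_groups), block)

# Class means differ by the same shift on every feature (equal-covariance model).
delta_mu = np.full(p, shift)

def lda_bayes_error(features):
    """Exact error of the optimal linear discriminant restricted to `features`."""
    d = delta_mu[list(features)]
    s = sigma[np.ix_(list(features), list(features))]
    mahalanobis = np.sqrt(d @ np.linalg.solve(s, d))
    return norm.cdf(-mahalanobis / 2)

# Every subset that picks exactly one feature per group has the same error,
# so the number of (near-)optimal subsets grows like group_size ** n_groups.
subsets = itertools.product(*[range(g * group_size, (g + 1) * group_size)
                              for g in range(n_groups)])
errors = [lda_bayes_error(s) for s in itertools.islice(subsets, 200)]
print(f"{len(errors)} subsets, error spread = {max(errors) - min(errors):.2e}")

With finite training samples, the empirical errors of these subsets scatter around the same value, which is the sense in which they become statistically indistinguishable.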

2019 ◽  
Vol 3 (2) ◽  
pp. 72
Author(s):  
Widi Astuti ◽  
Adiwijaya Adiwijaya

Cancer is one of the leading causes of death globally, and early detection allows better treatment for patients. One method of detecting cancer is microarray data classification. However, microarray data is high-dimensional, which complicates classification. Linear Discriminant Analysis is a classification technique that is easy to implement and achieves good accuracy, but it has difficulty handling high-dimensional data. Therefore, Principal Component Analysis, a feature extraction technique, is used to optimize Linear Discriminant Analysis performance. Based on the results of the study, using Principal Component Analysis increased accuracy by up to 29.04% and the F1 score by 64.28% for the colon cancer data.
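A minimal sketch of the pipeline the abstract describes, assuming scikit-learn: PCA reduces the dimension before LDA is fit, so the LDA covariance estimate stays well conditioned. The component count, synthetic stand-in data, and split are illustrative assumptions, not the study's setup.

# Hedged sketch: PCA -> LDA pipeline for high-dimensional (microarray-like) data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in for microarray data: few samples, many features.
X, y = make_classification(n_samples=100, n_features=2000, n_informative=40,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# PCA first, so LDA only has to estimate covariance in a low-dimensional space.
model = make_pipeline(PCA(n_components=20), LinearDiscriminantAnalysis())
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"accuracy={accuracy_score(y_te, pred):.3f}  F1={f1_score(y_te, pred):.3f}")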


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Guoqi Li ◽  
Changyun Wen ◽  
Wei Wei ◽  
Yi Xu ◽  
Jie Ding ◽  
...  

A generalized linear discriminant analysis based on the trace ratio criterion (GLDA-TRA) is derived to extract features for classification. With the proposed GLDA-TRA, a set of orthogonal features can be extracted in succession. Each newly extracted feature is the optimal feature that maximizes the trace ratio criterion in the subspace orthogonal to the space spanned by the previously extracted features.
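A minimal sketch of this kind of greedy scheme, not the authors' GLDA-TRA implementation: when one direction is extracted per step, the trace ratio of between-class to within-class scatter reduces to a generalized Rayleigh quotient, which is solved here in the subspace orthogonal to the directions already chosen. The regularization term and the scatter definitions are standard LDA assumptions added for numerical stability.

# Hedged sketch: greedy extraction of orthogonal discriminant directions.
import numpy as np
from scipy.linalg import eigh, null_space

def scatter_matrices(X, y):
    """Between-class (Sb) and within-class (Sw) scatter matrices."""
    mean = X.mean(axis=0)
    p = X.shape[1]
    Sb, Sw = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - mean)[:, None]
        Sb += len(Xc) * diff @ diff.T
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    return Sb, Sw

def greedy_orthogonal_features(X, y, n_features, reg=1e-6):
    Sb, Sw = scatter_matrices(X, y)
    p = X.shape[1]
    W = np.zeros((p, 0))
    for _ in range(n_features):
        # Orthonormal basis of the complement of the span of previous features.
        B = null_space(W.T) if W.shape[1] else np.eye(p)
        # Generalized Rayleigh quotient in the projected subspace.
        vals, vecs = eigh(B.T @ Sb @ B, B.T @ Sw @ B + reg * np.eye(B.shape[1]))
        w = B @ vecs[:, -1]                 # direction with the largest ratio
        W = np.hstack([W, (w / np.linalg.norm(w))[:, None]])
    return W                                # columns are mutually orthogonal

Because each new direction is confined to the null space of the previously selected ones, the extracted features are orthogonal by construction, matching the successive-extraction structure the abstract describes.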


2014 ◽  
Vol 577 ◽  
pp. 1245-1251
Author(s):  
Zhi Ru Chen ◽  
Wen Xue Hong ◽  
Pei Pei Zhao

Imbalanced miRNA target sample data lowers the prediction accuracy of the SVM (Support Vector Machine). This paper proposes an SVM algorithm for predicting target genes based on the biased discriminant idea. An optimal feature set is selected as the input data, and a kernel optimization objective function is constructed from the biased discriminant analysis criterion in the empirical feature space. A conformal transformation of the kernel is used to gradually optimize the kernel matrix. Comparative analysis of experimental results on human, mouse, and rat data shows that the imbalanced SVM with the biased discriminant achieves higher specificity, sensitivity, and prediction accuracy, indicating stronger generalization ability and better robustness.
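A minimal sketch of the general idea, not the paper's method: a conformal transformation rescales a base kernel as K'(x, z) = c(x) c(z) K(x, z), and the transformed Gram matrix is fed to a class-weighted SVM. The magnification factor c(.) below (larger near minority-class samples) and the toy data are illustrative assumptions; the paper instead derives the factor from its biased discriminant criterion.

# Hedged sketch: conformal transformation of an RBF kernel + class-weighted SVM.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def conformal_factor(X, X_minority, gamma=0.5):
    """Positive scaling factor c(x), larger close to minority-class samples."""
    d2 = ((X[:, None, :] - X_minority[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 + np.exp(-gamma * d2).sum(axis=1)

def conformal_kernel(XA, XB, X_minority, gamma_rbf=0.1):
    cA = conformal_factor(XA, X_minority)
    cB = conformal_factor(XB, X_minority)
    return cA[:, None] * rbf_kernel(XA, XB, gamma=gamma_rbf) * cB[None, :]

# Toy imbalanced data standing in for miRNA target features.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(20, 10))    # minority class (true targets)
X_neg = rng.normal(0.0, 1.0, size=(200, 10))   # majority class (non-targets)
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [0] * 200)

K = conformal_kernel(X, X, X_pos)
clf = SVC(kernel="precomputed", class_weight="balanced").fit(K, y)
print("training accuracy:", clf.score(K, y))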

