Classifying Cancer Types Based on Microarray Gene Expressions using Conformal Prediction

2014 ◽

pp. 131-145

Author(s):

Natthakan Iam-On ◽

Tossapon Boongoen

Keyword(s):

Microarray Data ◽

Dimensional Space ◽

Subspace Clustering ◽

Research Direction ◽

Microarray Data Analysis ◽

Consensus Clustering ◽

Gene Expressions ◽

Tumor Subtypes ◽

Microarray Gene ◽

New Research

A need has long been identified for a more effective methodology to understand, prevent, and cure cancer. Microarray technology provides a basis for achieving this goal, with cluster analysis of gene expression data leading to the discrimination of patients, identification of possible tumor subtypes, and individualized treatment. Recently, soft subspace clustering was introduced as an accurate alternative to conventional techniques. This practice has proven effective for high-dimensional data, especially for microarray gene expressions. In this review, the basis of weighted dimensional space and different approaches to soft subspace clustering are described. Since most of the models are parameterized, the application of consensus clustering has been identified as a new research direction, one capable of turning the difficulty of parameter selection into an advantage by increasing diversity within an ensemble.
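The consensus idea in the abstract above can be illustrated with a co-association matrix, one common way of combining an ensemble of clusterings. This is a minimal sketch and not the authors' method: the function name `co_association` and the toy label vectors are hypothetical, and each label vector is assumed to come from the same base algorithm run with a different parameter setting.

```python
def co_association(partitions):
    """Fraction of base clusterings in which samples i and j share a cluster.

    partitions: list of label lists, one per base clustering, all of equal
    length (one label per sample). Label values need not match across runs.
    """
    n_runs = len(partitions)
    n_samples = len(partitions[0])
    return [
        [sum(p[i] == p[j] for p in partitions) / n_runs for j in range(n_samples)]
        for i in range(n_samples)
    ]


# Hypothetical ensemble: three runs over four samples (assumed data).
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first run, labels swapped
    [0, 1, 1, 1],  # a run that disagrees about sample 1
]
C = co_association(ensemble)
for row in C:
    print([round(v, 2) for v in row])
```

Entries near 1 mark pairs that most runs place together, so disagreement between parameter settings becomes information rather than noise. A final partition could then be obtained by clustering this matrix, a step left out of the sketch.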


SPECTRAL CLUSTERING ON GENE EXPRESSION PROFILE TO IDENTIFY CANCER TYPES OR SUBTYPES

Jurnal Teknologi ◽

10.11113/jt.v76.4036 ◽

2015 ◽

Vol 76 (1) ◽

Author(s):

Ang Jun Chin ◽

Andri Mirzal ◽

Habibollah Haron

Keyword(s):

Gene Expression ◽

Gene Expression Profile ◽

Expression Profile ◽

Microarray Data ◽

Spectral Clustering ◽

Data Sets ◽

Clustering Methods ◽

Microarray Gene Expression ◽

Cancer Types ◽

Microarray Gene

Gene expression profiling is well known for its broad applications and achievements in disease discovery and analysis, especially in cancer research. Spectral clustering is robust to irrelevant features, which makes it appropriate for gene expression analysis. However, previous works offer only limited performance comparisons with other clustering methods, and each study analyzed only a few microarray data sets. In this study, we demonstrate the use of spectral clustering in identifying cancer types or subtypes from microarray gene expression profiling. Spectral clustering was applied to eleven microarray data sets and its clustering performance was compared with the results in the literature. Overall, spectral clustering slightly outperformed the corresponding results in the literature. It also offers more stable clustering performance, as indicated by its smaller standard deviation. Moreover, spectral clustering outperformed the corresponding methods in the literature on six of the eleven data sets. We therefore regard spectral clustering as a promising method for identifying cancer types or subtypes in microarray gene expression data sets.
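For readers unfamiliar with the technique, the following is a minimal sketch of spectral clustering restricted to two clusters. It is not the configuration used in the study: the Gaussian affinity, the `sigma` value, the function name `spectral_bipartition`, and the toy data are all assumptions, and splitting on the sign of the second eigenvector stands in for the k-means step that a general k-cluster version would use.

```python
import numpy as np


def spectral_bipartition(X, sigma=1.0):
    """Split samples (rows of X) into two groups via the normalized Laplacian.

    X: array of shape (n_samples, n_features), e.g. samples x genes.
    sigma: width of the Gaussian affinity (an assumed, untuned parameter).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between samples.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # Gaussian affinity matrix with no self-loops.
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)


# Toy data (assumed): two well-separated groups of three samples each.
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [3.0, 3.0], [3.2, 3.0], [3.0, 3.2]])
labels = spectral_bipartition(X, sigma=2.0)
print(labels)
```

Which group receives label 0 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.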


Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data

Computational and Mathematical Methods in Medicine ◽

10.1155/2013/798189 ◽

2013 ◽

Vol 2013 ◽

pp. 1-14

Author(s):

Osamu Komori ◽

Mari Pritchard ◽

Shinto Eguchi

Keyword(s):

Gene Expression ◽

High Performance ◽

Published Data ◽

Learning Approaches ◽

Statistical Machine Learning ◽

Mutual Coherence ◽

Gene Expressions ◽

Microarray Gene Expression ◽

Analysis Methods ◽

Microarray Gene

This paper discusses mathematical and statistical aspects of analysis methods applied to microarray gene expressions. We focus on pattern recognition to extract informative features embedded in the data for prediction of phenotypes. It has been pointed out that severe difficulties arise from the imbalance between the number of observed genes and the number of observed subjects. We reanalyze published microarray gene expression data and detect many other gene sets with almost the same performance. We conclude that, at the current stage, it is not possible to extract only the informative genes with high performance from among all observed genes. We investigate why this difficulty persists even though analysis methods and learning algorithms are being actively proposed in statistical machine learning. We focus on the mutual coherence, or the absolute value of the Pearson correlation between two genes, and describe the distributions of the correlation for the selected set of genes and the total set. We show that the problem of finding informative genes in high-dimensional data is ill-posed and that the difficulty is closely related to the mutual coherence.
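The correlation-based quantity mentioned in the abstract can be computed directly. The sketch below takes mutual coherence to be the largest absolute Pearson correlation over all gene pairs, which is an assumption on our part since the abstract only speaks of the absolute correlation between two genes. The function names and the toy expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mutual_coherence(genes):
    """Largest |Pearson correlation| over all pairs of genes.

    genes: list of expression vectors, one per gene, measured on the
    same subjects in the same order.
    """
    return max(abs(pearson(g, h)) for g, h in combinations(genes, 2))


# Toy expression values (assumed): three genes measured on three subjects.
genes = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],  # perfectly anti-correlated with the first gene
    [1.0, 3.0, 2.0],
]
print(mutual_coherence(genes))
```

A value close to 1 means at least one pair of genes is nearly interchangeable as a predictor, which is one way to read the abstract's point that many different gene sets can reach almost the same performance.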
