Comparison of supervised models in hepatocellular carcinoma tumor classification based on expression data using principal component analysis (PCA)

One important feature of gene expression data is that the number of genes, M, far exceeds the number of samples, N. Standard statistical methods do not work well when N < M, so the analysis of microarray data calls for new methodologies or modifications of existing ones. In this paper, we propose a novel procedure for classifying gene expression data. The procedure first reduces dimensionality with kernel principal component analysis (KPCA) and then classifies the samples with logistic regression (discrimination). KPCA is a nonlinear generalization of principal component analysis. The proposed algorithm was applied to five different gene expression datasets involving human tumor samples. Comparison with other popular classification methods, such as support vector machines and neural networks, shows that our algorithm is very promising for classifying gene expression data.
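
The two-stage procedure described above (kernel PCA for dimension reduction, then logistic regression on the reduced features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian (RBF) kernel, the hypothetical class name `KPCALogistic`, and every hyperparameter (`n_components`, `gamma`, `lr`, `n_iter`, `l2`) are assumptions, the logistic regression is fitted by plain gradient descent, and the usage data are synthetic.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def center_kernel(K_new, K_train):
    """Center kernel rows in feature space using the training-set statistics."""
    return (K_new - K_new.mean(axis=1, keepdims=True)
            - K_train.mean(axis=0)[None, :] + K_train.mean())


class KPCALogistic:
    """Hypothetical sketch: RBF kernel PCA followed by L2-penalized logistic regression."""

    def __init__(self, n_components=3, gamma=1e-3, lr=0.1, n_iter=2000, l2=1e-3):
        self.n_components, self.gamma = n_components, gamma
        self.lr, self.n_iter, self.l2 = lr, n_iter, l2

    @staticmethod
    def _sigmoid(t):
        return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

    def _project(self, X):
        # Kernel PC scores of new samples, scaled like the training scores.
        K_new = rbf_kernel(X, self.X_train, self.gamma)
        return center_kernel(K_new, self.K_train) @ self.alphas / self.z_scale

    def fit(self, X, y):
        # Stage 1: kernel PCA on the N x N kernel matrix (N = number of samples).
        self.X_train = np.asarray(X, dtype=float)
        self.K_train = rbf_kernel(self.X_train, self.X_train, self.gamma)
        Kc = center_kernel(self.K_train, self.K_train)
        vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
        top = np.argsort(vals)[::-1][: self.n_components]
        vals, vecs = vals[top], vecs[:, top]
        if np.any(vals <= 1e-12):
            raise ValueError("n_components exceeds the rank of the centered kernel")
        self.alphas = vecs / np.sqrt(vals)           # so that Kc @ alphas = PC scores
        Z = Kc @ self.alphas
        self.z_scale = Z.std(axis=0)                 # equalize component scales
        Z = Z / self.z_scale
        # Stage 2: logistic regression on the reduced features (gradient descent).
        y = np.asarray(y, dtype=float)
        self.w, self.b = np.zeros(Z.shape[1]), 0.0
        for _ in range(self.n_iter):
            p = self._sigmoid(Z @ self.w + self.b)
            self.w -= self.lr * (Z.T @ (p - y) / len(y) + self.l2 * self.w)
            self.b -= self.lr * (p - y).mean()
        return self

    def predict_proba(self, X):
        Z = self._project(np.asarray(X, dtype=float))
        return self._sigmoid(Z @ self.w + self.b)

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)


# Usage on synthetic data (hypothetical: 500 "genes", 40 samples, 30 informative genes).
rng = np.random.default_rng(0)


def make_data(n_per_class, n_genes=500, n_informative=30, shift=3.0):
    X = rng.normal(size=(2 * n_per_class, n_genes))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, :n_informative] += shift  # class 1 is shifted on the first genes
    return X, y


X_train, y_train = make_data(20)
X_test, y_test = make_data(20)
model = KPCALogistic().fit(X_train, y_train)
print("held-out accuracy:", (model.predict(X_test) == y_test).mean())
```

Working with the kernel matrix keeps the eigendecomposition at N × N, so its cost is governed by the number of samples rather than the number of genes; this is one reason a kernel formulation is convenient in the N < M setting.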

Principal component analysis of gene expression data: The case of atrial fibrillation

2010 3rd International Symposium on Applied Sciences in Biomedical and Communication Technologies (ISABEL 2010)

DOI: 10.1109/isabel.2010.5702916

2010

Author(s): Federica Censi, Giovanni Calcagnini, Pietro Bartolini, Alessandro Giuliani

Keyword(s): Gene Expression, Atrial Fibrillation, Principal Component Analysis, Gene Expression Data, Principal Component, Component Analysis, Expression Data

GO-PCA: An Unsupervised Method to Explore Biological Heterogeneity Based on Gene Expression and Prior Knowledge

DOI: 10.1101/018705

2015

Author(s): Florian Wagner

Keyword(s): Principal Component Analysis, Prior Knowledge, Large Scale, CpG Island, Expression Profiles, Principal Component, Component Analysis, Expression Data, Unsupervised Method, Biological Heterogeneity

Genome-wide expression profiling is a cost-efficient and widely used method to characterize heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such datasets typically relies on generic unsupervised methods, e.g., principal component analysis or hierarchical clustering. However, generic methods fail to exploit the significant amount of knowledge that exists about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that incorporates prior knowledge about gene functions in the form of gene ontology (GO) annotations. GO-PCA aims to discover and represent biological heterogeneity along all major axes of variation in a given dataset, while suppressing heterogeneity due to technical biases. To this end, GO-PCA combines principal component analysis (PCA) with nonparametric GO enrichment analysis, and uses the results to generate expression signatures based on small sets of functionally related genes. I first applied GO-PCA to expression data from diverse lineages of the human hematopoietic system, and obtained a small set of signatures that captured known cell characteristics for most lineages. I then applied the method to expression profiles of glioblastoma (GBM) tumor biopsies, and obtained signatures that were strongly associated with multiple previously described GBM subtypes. Surprisingly, GO-PCA discovered a cell cycle-related signature that exhibited significant differences between the Proneural and the prognostically favorable GBM CpG Island Methylator (G-CIMP) subtypes, suggesting that the G-CIMP subtype is characterized in part by lower mitotic activity. Previous expression-based classifications have failed to separate these subtypes, demonstrating that GO-PCA can detect heterogeneity that is missed by other methods.
My results show that GO-PCA is a powerful and versatile expression-based method that facilitates exploration of large-scale expression data, without requiring additional types of experimental data. The low-dimensional representation generated by GO-PCA lends itself to interpretation, hypothesis generation, and further analysis.
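
The core loop described in this abstract (PCA, then GO enrichment along each principal component, then signatures built from small sets of functionally related genes) can be sketched roughly as below. This is a simplified, hypothetical illustration, not the published GO-PCA implementation: the function names, the `go_terms` mapping, the fixed `top_l` cutoff, and the `alpha` threshold are assumptions, and a one-sided hypergeometric test on the top-ranked genes is swapped in as a simple stand-in for the enrichment test GO-PCA actually uses. No multiple-testing correction or signature refinement is attempted.

```python
import math

import numpy as np


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)


def go_pca_sketch(E, genes, go_terms, n_pcs=2, top_l=50, alpha=1e-3, min_genes=3):
    """Hypothetical GO-PCA-style loop.

    E: genes x samples expression matrix; genes: row names;
    go_terms: assumed mapping {term name: set of gene names}.
    Returns (pc, term, p_value, signature) tuples.
    """
    E = np.asarray(E, dtype=float)
    Ec = E - E.mean(axis=1, keepdims=True)            # center each gene across samples
    U, _, _ = np.linalg.svd(Ec, full_matrices=False)  # columns of U = gene loadings
    row = {g: i for i, g in enumerate(genes)}
    results = []
    for pc in range(n_pcs):
        for sign in (1, -1):  # PCs are sign-ambiguous, so test both ends of the ranking
            top = {int(i) for i in np.argsort(-sign * U[:, pc])[:top_l]}
            for term, members in go_terms.items():
                rows = {row[g] for g in members if g in row}
                hits = sorted(rows & top)
                if len(hits) < min_genes:
                    continue
                p = hypergeom_sf(len(hits), len(genes), len(rows), top_l)
                if p < alpha:
                    # Signature: mean z-scored expression of the enriched genes.
                    Z = Ec[hits]
                    sd = Z.std(axis=1, keepdims=True)
                    Z = Z / np.where(sd > 0, sd, 1.0)
                    results.append((pc, term, p, Z.mean(axis=0)))
    return results


# Usage on synthetic data (hypothetical genes and GO terms).
rng = np.random.default_rng(0)
genes = [f"g{i}" for i in range(200)]
factor = rng.normal(size=20)        # hidden sample-level program
E = rng.normal(size=(200, 20))
E[:20] += 4.0 * factor              # genes g0..g19 follow the program
go_terms = {"TERM_A": set(genes[:20]), "TERM_B": set(genes[100:120])}
for pc, term, p, sig in go_pca_sketch(E, genes, go_terms, top_l=20):
    print(pc, term, f"{p:.2e}")
```

Testing both ends of each component's loading ranking matters because a principal component is only defined up to sign, so genes with strongly negative loadings can be as informative as those with strongly positive ones.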
