Integrating Colon Cancer Microarray Data: Associating Locus-Specific Methylation Groups to Gene Expression-Based Classifications

Gene-expression microarray datasets often consist of a limited number of samples with a large number of gene-expression measurements, usually on the order of thousands. Therefore, dimensionality reduction is critical prior to any classification task. In this work, the iterative feature perturbation method (IFP), an embedded gene selector, is introduced and applied to four microarray cancer datasets: colon cancer, leukemia, Moffitt colon cancer, and lung cancer. We compare the results obtained by IFP to those of support vector machine-recursive feature elimination (SVM-RFE) and of the t-test used as a feature filter, with a linear support vector machine as the base classifier in each case. We also analyzed the intersection of the gene sets selected by the three methods across the four datasets. Additional experiments included an initial pre-selection of the top 200 genes based on their p-values; IFP and SVM-RFE were then applied to the reduced feature sets. These results showed up to 3.32% average performance improvement for IFP across the four datasets. A statistical analysis (using the Friedman/Holm test) of both scenarios showed that the highest accuracies came from the t-test as a filter in the experiments without gene pre-selection. IFP and SVM-RFE had greater classification accuracy after gene pre-selection. The analysis showed that the t-test is a good gene selector for microarray data, and that IFP and SVM-RFE improved in performance on datasets first reduced by the t-test. The IFP approach resulted in comparable or superior average class accuracy when compared to SVM-RFE on three of the four datasets. The same or similar accuracies can be obtained with different sets of genes.
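
As a rough illustration of the t-test filter described in this abstract, the sketch below ranks genes by the absolute two-sample t statistic and keeps the top k. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`t_statistic`, `top_k_genes`), the pooled-variance form of the test, and the toy data are all hypothetical. With a pooled-variance test the degrees of freedom are the same for every gene, so ranking by |t| gives the same order as ranking by p-value.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample Student t statistic with pooled variance (assumed test form)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:
        # No within-class spread: perfectly separating if the means differ.
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / sqrt(pooled * (1 / na + 1 / nb))


def top_k_genes(samples, labels, k):
    """Return the indices of the k genes with the largest |t| between classes.

    samples: list of per-sample expression lists (one value per gene)
    labels:  list of 0/1 class labels, one per sample
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        scores.append(abs(t_statistic(a, b)))
    # Sort gene indices by descending score and keep the first k.
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


# Toy data (hypothetical): 6 samples, 3 genes; only gene 0 separates the classes.
samples = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 2.5],
    [0.9, 4.9, 1.5],
    [3.0, 5.0, 2.1],
    [3.1, 5.2, 2.4],
    [2.9, 4.8, 1.6],
]
labels = [0, 0, 0, 1, 1, 1]
selected = top_k_genes(samples, labels, k=1)
```

In the pre-selection scenario from the abstract, k would be 200, and the selected columns would then be passed to a wrapper or embedded selector such as SVM-RFE or IFP.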

The use of logic relationships to model colon cancer gene expression networks with mRNA microarray data

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2007.11.006 ◽

2008 ◽

Vol 41 (4) ◽

pp. 530-543 ◽

Cited By ~ 8

Author(s):

Xiaogang Ruan ◽

Jinlian Wang ◽

Hui Li ◽

Rhoda E. Perozzi ◽

Edmund F. Perozzi

Keyword(s):

Gene Expression ◽

Colon Cancer ◽

Microarray Data ◽

Cancer Gene ◽

mRNA Microarray

Mining Gene Expression Profile with Missing Values: An Integration of Kernel PCA and Robust Singular Values Decomposition

Current Bioinformatics ◽

10.2174/1574893613666180413151654 ◽

2018 ◽

Vol 14 (1) ◽

pp. 78-89 ◽

Cited By ~ 2

Author(s):

Md. Saimul Islam ◽

Md. Aminul Hoque ◽

Md. Sahidul Islam ◽

Mohammad Ali ◽

Md. Bipul Hossen ◽

...

Keyword(s):

Gene Expression ◽

Colon Cancer ◽

Gene Expression Profiling ◽

Microarray Data ◽

Expression Profiling ◽

Missing Values ◽

Cellular Systems ◽

Differentially Expressed ◽

Data Matrix ◽

Imputation Methods

Background: Gene expression profiling and transcriptomics provide valuable information about the role of genes that are differentially expressed between two or more samples. It is always important and challenging to analyse high-throughput DNA microarray data with a number of missing values under various experimental conditions. Objectives: Graphical data visualizations of the expression of all genes in a particular cell provide holistic views of gene expression patterns, which improve our understanding of cellular systems under normal and pathological conditions. However, current visualization methods are sensitive to missing values, which are frequently observed in microarray-based gene expression profiling and can affect the subsequent statistical analyses. Methods: In this study, we addressed the problem of missing values with respect to different imputation methods using the gene expression biplot (GE biplot), one of the most popular gene visualization techniques. The effects of missing values on mining differentially expressed genes in gene expression data were evaluated using four well-known imputation methods: Robust Singular Value Decomposition (Robust SVD), Column Average (CA), Column Median (CM), and K-nearest Neighbors (KNN). The Frobenius norm and absolute distances were used to measure the accuracy of the methods. Results: Three numerical experiments were performed, using (i) simulated data, (ii) publicly available colon cancer data, and (iii) leukemia data, to analyze the performance of each method. The results showed that CM and KNN performed better than Robust SVD and CA for identifying the index gene profile in the biplot visualization, both in the simulation study and in the colon cancer and leukemia microarray datasets. Conclusion: The impact of missing values on the GE biplot was smaller when the data matrix was imputed by KNN than by CM. This study concluded that KNN performed satisfactorily in generating a GE biplot in the presence of missing values in microarray data.
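
To make the evaluation idea in this abstract concrete, the sketch below imputes missing entries with the column median (the CM method named above) and scores the result by the Frobenius norm of its difference from the complete matrix. It is only a sketch under stated assumptions: the function names, the use of `None` to mark missing values, and the toy matrix are hypothetical, and the abstract does not say whether columns are genes or samples.

```python
from math import sqrt
from statistics import median


def impute_column_median(matrix):
    """Replace None entries with the median of the observed values in that column.

    Assumes every column has at least one observed value; otherwise
    statistics.median raises StatisticsError.
    """
    n_cols = len(matrix[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if value is None else value for j, value in enumerate(row)]
        for row in matrix
    ]


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference between two matrices."""
    return sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )


# Toy example (hypothetical values): hide two entries, impute, measure the error.
complete = [[1.0, 4.0], [2.0, 5.0], [3.0, 9.0]]
with_missing = [[1.0, 4.0], [None, 5.0], [3.0, None]]

imputed = impute_column_median(with_missing)
error = frobenius_distance(complete, imputed)
```

A smaller distance means the imputed matrix is closer to the complete one; the same scoring function could be applied to any other imputation method for comparison.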

Identifying molecular subtypes in human colon cancer using gene expression and DNA methylation microarray data

International Journal of Oncology ◽

10.3892/ijo.2015.3263 ◽

2015 ◽

Vol 48 (2) ◽

pp. 690-702 ◽

Cited By ~ 13

Author(s):

ZHONGLU REN ◽

WENHUI WANG ◽

JINMING LI

Keyword(s):

Gene Expression ◽

DNA Methylation ◽

Colon Cancer ◽

Microarray Data ◽

Molecular Subtypes ◽

Human Colon ◽

Human Colon Cancer

Gene Expression Profiling of Nonneoplastic Mucosa May Predict Clinical Outcome of Colon Cancer Patients

Yearbook of Surgery ◽

10.1016/s0090-3671(08)70228-3 ◽

2007 ◽

Vol 2007 ◽

pp. 306-307

Author(s):

T.J. Eberlein

Keyword(s):

Gene Expression ◽

Colon Cancer ◽

Clinical Outcome ◽

Gene Expression Profiling ◽

Cancer Patients ◽

Expression Profiling

Classical and Bayesian mixed model analysis of microarray data for detecting gene expression and DNA differences

10.31274/etd-180810-559 ◽

2009 ◽

Author(s):

Cumhur Yusuf Demirkale

Keyword(s):

Gene Expression ◽

Microarray Data ◽

Mixed Model ◽

Model Analysis ◽

Mixed Model Analysis

Differential control of growth, apoptotic activity and gene expression in human colon cancer cells by extracts derived from medicinal herbs, Rhazya stricta and Zingiber officinale and their combination

World Journal of Gastroenterology ◽

10.3748/wjg.v20.i41.15275 ◽

2014 ◽

Vol 20 (41) ◽

pp. 15275 ◽

Cited By ~ 14

Author(s):

Ayman I Elkady

Keyword(s):

Gene Expression ◽

Colon Cancer ◽

Cancer Cells ◽

Human Colon ◽

Medicinal Herbs ◽

Colon Cancer Cells ◽

Human Colon Cancer ◽

Apoptotic Activity ◽

Differential Control ◽

Human Colon Cancer Cells

The Analysis of Gene Expression Data, Statistical Analysis of Gene Expression Microarray Data

Technometrics ◽

10.1198/tech.2003.s188 ◽

2003 ◽

Vol 45 (4) ◽

pp. 375-375

Keyword(s):

Gene Expression ◽

Statistical Analysis ◽

Gene Expression Data ◽

Microarray Data ◽

Expression Data ◽

Gene Expression Microarray ◽

Expression Microarray ◽

Gene Expression Microarray Data

The fundamental role of pattern recognition for gene-expression/microarray data in bioinformatics

Pattern Recognition ◽

10.1016/j.patcog.2005.03.008 ◽

2005 ◽

Vol 38 (12) ◽

pp. 2226-2228 ◽

Cited By ~ 11

Author(s):

Edward R. Dougherty

Keyword(s):

Gene Expression ◽

Pattern Recognition ◽

Microarray Data ◽

Gene Expression Microarray ◽

Expression Microarray ◽

Gene Expression Microarray Data

Computational Strategies for Analyzing Data in Gene Expression Microarray Experiments

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720003000319 ◽

2003 ◽

Vol 01 (03) ◽

pp. 541-586 ◽

Cited By ~ 33

Author(s):

Tero Aittokallio ◽

Markus Kurki ◽

Olli Nevalainen ◽

Tuomas Nikula ◽

Anne West ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Microarray Data ◽

Microarray Data Analysis ◽

Biological Research ◽

Microarray Experiments ◽

DNA Microarray Data ◽

Open Questions ◽

Analysis Technique ◽

Wide Range

Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of approaches is available for computational analysis, but no general consensus exists on a standard protocol for microarray data analysis. Consequently, the choice of data analysis technique is a crucial element that depends both on the data and on the goals of the experiment. Therefore, a basic understanding of bioinformatics is required for optimal experimental design and meaningful interpretation of the results. This review summarizes some of the common themes in DNA microarray data analysis, including data normalization and detection of differential expression. Algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies are reviewed along with their relative merits, and potential areas for additional research are discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to the implementation of more efficient computational protocols for data obtained through microarray experiments.
