SEMIPARAMETRIC CLUSTERING METHOD FOR MICROARRAY DATA ANALYSIS

Clustering is a major tool for microarray gene expression data analysis. Existing clustering methods fall mainly into two categories: parametric and nonparametric. Parametric methods generally assume a mixture of parametric subdistributions. They perform well when the mixture distribution approximately fits the true data-generating mechanism, but poorly when there is a nonnegligible deviation between the two. Nonparametric methods, on the other hand, usually make no distributional assumptions; they are robust, but at the cost of efficiency. To exploit the known mixture form for efficiency while relaxing assumptions about the unknown subdistributions for robustness, we propose a semiparametric clustering method. The proposed approach retains the form of a parametric mixture but makes no assumptions about the subdistributions, which are estimated nonparametrically with constraints imposed only on their modes. An expectation-maximization (EM) algorithm with an added classification step is used to cluster the data, and a modified Bayesian information criterion (BIC) guides the choice of the optimal number of clusters. Simulation studies are conducted to assess the performance and robustness of the proposed method, and the results show that it yields reasonable partitions of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
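The alternation described above (estimate the mixture weights and subdistributions from the current partition, then reassign each point to the component with the largest weighted density) can be illustrated with a minimal one-dimensional sketch. It is not the authors' method. It assumes that each subdistribution is estimated with a Gaussian kernel density estimate using Silverman's rule-of-thumb bandwidth, and that the initial partition is a quantile split. It also omits both the mode constraints and the modified BIC. The function names `kde`, `silverman_bandwidth` and `classification_em` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: Gaussian-kernel KDE per cluster stands in for the paper's
# constrained nonparametric estimator; mode constraints and BIC are omitted.


def kde(x: float, points: Sequence[float], h: float) -> float:
    """Gaussian kernel density estimate at x built from one cluster's points."""
    norm = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)


def silverman_bandwidth(points: Sequence[float]) -> float:
    """Silverman's rule of thumb, floored so tiny clusters still get h > 0."""
    n = len(points)
    mean = sum(points) / n
    var = sum((p - mean) ** 2 for p in points) / max(n - 1, 1)
    return max(1.06 * math.sqrt(var) * n ** (-0.2), 1e-3)


def classification_em(data: Sequence[float], k: int, max_iter: int = 100) -> List[int]:
    """Hard-assignment (classification) EM with nonparametric component densities."""
    n = len(data)
    # Initial partition: split the sorted values into k equal-sized groups.
    order = sorted(range(n), key=lambda i: data[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n

    for _ in range(max_iter):
        clusters = [[data[i] for i in range(n) if labels[i] == c] for c in range(k)]
        if any(not c for c in clusters):
            break  # a cluster emptied out; stop rather than divide by zero
        # M-step: mixing weights and per-cluster density estimates.
        weights = [len(c) / n for c in clusters]
        bandwidths = [silverman_bandwidth(c) for c in clusters]
        # E- and classification steps: assign each point to the component
        # with the largest weighted density, i.e. the largest posterior.
        new_labels = [
            max(range(k), key=lambda c: weights[c] * kde(x, clusters[c], bandwidths[c]))
            for x in data
        ]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
    return labels
```

For example, `classification_em([0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3], k=2)` keeps the two well-separated groups apart. Each point's own kernel contributes to its cluster's density, so this sketch mildly favors existing assignments. A leave-one-out estimate is one possible alternative.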

Algorithmic and Complexity Issues of Three Clustering Methods in Microarray Data Analysis

Lecture Notes in Computer Science - Computing and Combinatorics

10.1007/11533719_10

2005

pp. 74-83

Author(s):

Jinsong Tan

Kok Seng Chua

Louxin Zhang

Keyword(s):

Data Analysis

Microarray Data

Microarray Data Analysis

Clustering Methods

Algorithmic and Complexity Issues of Three Clustering Methods in Microarray Data Analysis

Algorithmica

10.1007/s00453-007-0040-4

2007

Vol 48 (2)

pp. 203-219

Cited By ~ 5

Author(s):

Jinsong Tan

Kok Seng Chua

Louxin Zhang

Song Zhu

Keyword(s):

Data Analysis

Microarray Data

Microarray Data Analysis

Clustering Methods

Enhancing Interdisciplinary Mathematics and Biology Education: A Microarray Data Analysis Course Bridging These Disciplines

CBE—Life Sciences Education

10.1187/cbe.09-09-0067

2010

Vol 9 (3)

pp. 217-226

Cited By ~ 11

Author(s):

Yolande V. Tra

Irene M. Evans

Keyword(s):

Data Analysis

Microarray Data

Biology Education

Microarray Experiment

Educational Background

Microarray Data Analysis

Data Set

Set Up

Interdisciplinary Course

The Impact

BIO2010 put forth the goal of improving the mathematical educational background of biology students. The analysis and interpretation of high-dimensional microarray data can be very challenging and is best done by a statistician and a biologist working and teaching collaboratively. We set up such a collaboration and designed a course on microarray data analysis. We started with Genome Consortium for Active Teaching (GCAT) materials and the Microarray Genome and Clustering Tool software, and then added the R statistical software along with Bioconductor packages. In response to student feedback, one microarray data set was fully analyzed in class with R and Bioconductor, from preprocessing through gene discovery to pathway analysis. For the class project, students conducted a similar analysis on either their own data or data from a published journal paper. This exercise showed the impact that filtering, preprocessing, and different normalization methods had on which genes were included in the final data set. We conclude that this course achieved its goal of equipping students with the skills to analyze data from a microarray experiment. We offer our insights into collaborative teaching and into how other faculty might design and implement a similar interdisciplinary course.

Impact of DNA microarray data transformation on gene expression analysis - comparison of two normalization methods.

Acta Biochimica Polonica

10.18388/abp.2011_2227

2011

Vol 58 (4)

Cited By ~ 8

Author(s):

Marcin T Schmidt

Luiza Handschuh

Joanna Zyprych

Alicja Szabelska

Agnieszka K Olejnik-Schmidt

...

Keyword(s):

Gene Expression

Data Analysis

DNA Microarray

Microarray Data

Normalization Method

Differentially Expressed

Microarray Data Analysis

Data Set

Normalization Methods

The Impact

Two-color DNA microarrays are commonly used to analyze global gene expression. They provide information on the relative abundance of thousands of mRNAs. However, the generated data must be normalized to minimize systematic variation so that biologically significant differences can be identified more easily. A large number of normalization procedures have been proposed, and many software packages for microarray data analysis are available. Here, we applied two normalization methods (median and loess) from two microarray data analysis software packages and examined them using a sample data set. We found that the number of genes identified as differentially expressed varied significantly depending on the method applied. The resulting lists of differentially expressed genes were consistent only when median normalization was used. Loess normalization as implemented in the two software packages produced less coherent, and for some probes even contradictory, results. In general, our results provide further evidence that the normalization method can profoundly influence the final results of DNA microarray-based analysis. The impact of the normalization method depends greatly on the algorithm employed. Consequently, the normalization procedure must be carefully considered and optimized for each individual data set.
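For two-color arrays, global median normalization is commonly applied to the log-ratios M = log2(R/G). The median of M is subtracted so that most genes are centered at zero. This is a minimal sketch that assumes background-corrected, strictly positive red (R) and green (G) intensities. The function name `median_normalize` is hypothetical, and the code is not the implementation used by either package compared in the study. Loess normalization is not shown. It instead removes an intensity-dependent trend in M plotted against the average log-intensity A = (1/2)·log2(R·G).

```python
import math
import statistics
from typing import List, Sequence, Tuple


def median_normalize(red: Sequence[float], green: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Global median normalization of two-color log-ratios.

    Returns (M, A): M is the median-centered log2 ratio per spot, and A is the
    average log2 intensity (useful for MA plots or a later loess step).
    Assumes background-corrected, strictly positive intensities.
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    m = [math.log2(r / g) for r, g in zip(red, green)]      # log-ratio per spot
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]  # mean log-intensity
    shift = statistics.median(m)
    return [mi - shift for mi in m], a
```

For example, red intensities `[200, 400, 800]` against green `[100, 100, 100]` give raw log-ratios `[1, 2, 3]`. After the median (2) is subtracted, the normalized values are `[-1, 0, 1]`.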

Descriptive and Systematic Comparison of Clustering Methods in Microarray Data Analysis

Korean Journal of Applied Statistics

10.5351/kjas.2009.22.1.089

2009

Vol 22 (1)

pp. 89-106

Author(s):

Seo-Young Kim

Keyword(s):

Data Analysis

Microarray Data

Microarray Data Analysis

Clustering Methods

Systematic Comparison

Faculty Opinions recommendation of Resampling-based multiple testing for microarray data analysis.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature

10.3410/f.726695013.793522568

2016

Author(s):

Tian Zheng

Keyword(s):

Data Analysis

Microarray Data

Multiple Testing

Microarray Data Analysis

Methods of Microarray Data Analysis III

Journal of Microbiological Methods

10.1016/j.mimet.2004.02.001

2004

Vol 57 (2)

pp. 293

Author(s):

Mareike Viebahn

Keyword(s):

Data Analysis

Microarray Data

Microarray Data Analysis

Multiclass Decision Forest—A Novel Pattern Recognition Method for Multiclass Classification in Microarray Data Analysis

DNA and Cell Biology

10.1089/dna.2004.23.685

2004

Vol 23 (10)

pp. 685-694

Cited By ~ 30

Author(s):

Huixiao Hong

Weida Tong

Roger Perkins

Hong Fang

Qian Xie

...

Keyword(s):

Pattern Recognition

Data Analysis

Microarray Data

Multiclass Classification

Microarray Data Analysis

Pattern Recognition Method

Recognition Method

Decision Forest

Computational Strategies for Analyzing Data in Gene Expression Microarray Experiments

Journal of Bioinformatics and Computational Biology

10.1142/s0219720003000319

2003

Vol 01 (03)

pp. 541-586

Cited By ~ 33

Author(s):

Tero Aittokallio

Markus Kurki

Olli Nevalainen

Tuomas Nikula

Anne West

...

Keyword(s):

Gene Expression

Data Analysis

Microarray Data

Microarray Data Analysis

Biological Research

Microarray Experiments

DNA Microarray Data

Open Questions

Analysis Technique

Wide Range

Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of computational approaches is available, but there is no general consensus on a standard protocol for microarray data analysis. Consequently, the choice of analysis technique is a crucial decision that depends on both the data and the goals of the experiment. A basic understanding of bioinformatics is therefore required for optimal experimental design and meaningful interpretation of the results. This review summarizes some common themes in DNA microarray data analysis, including data normalization and the detection of differential expression. The algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies are reviewed along with their relative merits, and potential areas for further research are discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to the implementation of more efficient computational protocols for the data obtained from microarray experiments.
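Differential expression is often introduced through a per-gene two-sample t-statistic. This minimal sketch computes Welch's t for each gene from two groups of replicate log-expression values and ranks genes by its absolute value. The function names `welch_t` and `rank_genes` and the data layout (a dict mapping each gene to its list of replicates, one dict per group) are assumptions for illustration. The sketch is not the review's own procedure, and it omits the p-values and multiple-testing correction that a real analysis would need.

```python
import math
import statistics
from typing import Dict, List, Sequence


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's two-sample t-statistic (unequal variances).

    Needs at least 2 replicates per group; raises ZeroDivisionError if both
    groups have zero variance.
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se


def rank_genes(expr_a: Dict[str, Sequence[float]],
               expr_b: Dict[str, Sequence[float]]) -> List[str]:
    """Order genes by |t|, largest (most evidence of a difference) first."""
    scores = {gene: welch_t(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
```

For example, replicates `[1, 2, 3]` against `[4, 5, 6]` give t = -3/sqrt(2/3), approximately -3.674. A gene with equal group means scores 0 and so ranks below it.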