Statistical Design and Data Analysis for Microarray Experiments

2003 ◽  
Vol 01 (03) ◽  
pp. 541-586 ◽  
Author(s):  
Tero Aittokallio ◽  
Markus Kurki ◽  
Olli Nevalainen ◽  
Tuomas Nikula ◽  
Anne West ◽  
...  

Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of approaches is available for computational analysis, but no general consensus exists on a standard protocol for microarray data analysis. Consequently, the choice of data analysis technique is a crucial decision that depends both on the data and on the goals of the experiment. A basic understanding of bioinformatics is therefore required for optimal experimental design and meaningful interpretation of the results. This review summarizes some of the common themes in DNA microarray data analysis, including data normalization and detection of differential expression. Algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies, along with their relative merits, are reviewed, and potential areas for further research are discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to the implementation of more efficient computational protocols for data obtained through microarray experiments.
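One of the "common themes" the review names is detection of differential expression. A minimal sketch of a standard first-pass approach, a per-gene two-sample t-test, is shown below; the expression matrix here is synthetic and purely illustrative (the review analyzes real cDNA data from T helper cells, which is not reproduced here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical log-expression matrix: 100 genes x 5 control arrays
# and 100 genes x 5 treated arrays. Real cDNA data would be loaded
# from the scanned and normalized array files.
control = rng.normal(0.0, 1.0, size=(100, 5))
treated = rng.normal(0.0, 1.0, size=(100, 5))
treated[:5] += 4.0  # spike in 5 truly differential genes

# Per-gene two-sample t-test across the two conditions.
t, p = stats.ttest_ind(treated, control, axis=1)

# An illustrative (unadjusted) significance cut-off; in practice a
# multiple-testing correction such as Bonferroni or FDR is applied.
candidates = np.flatnonzero(p < 0.01)
print(len(candidates))
```

Gene-wise testing like this is only one of the strategies the review compares; the point of the sketch is that the same data matrix feeds whichever detection method is chosen.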


1991 ◽  
Vol 34 (4) ◽  
pp. 493-526 ◽  
Author(s):  
Thomas Ferguson

This paper is a response to Webber's (1991) critique of Thomas Ferguson's (1983, 1984, 1986) essays on the New Deal and his “investment theory” of political parties. It argues that Webber's evidence is invalid and that his statistical design is conceptually flawed. The sample is defective: it includes many people it should not and it excludes others who should have been reckoned in, notably many Texas oilmen. His procedure for ascertaining corporate partisanship is inadequate, since, among other problems, it excludes large payments made to the 1936 Democratic campaign by firms such as Standard Oil of New Jersey and General Electric. The campaign finance data he relies upon are also far less complete than he implies. An entirely new data analysis is presented, incorporating not only Webber's data, but much new material from archives. The results confirm Ferguson's central thesis about the 1936 election: contributions to the Democrats in 1936 do indeed come from firms that are more internationally-oriented and capital-intensive than those contributing to the Republicans.


2002 ◽  
Vol 77 (9) ◽  
pp. 927-940 ◽  
Author(s):  
Ayalew Tefferi ◽  
Mark E. Bolander ◽  
Stephen M. Ansell ◽  
Eric D. Wieben ◽  
Thomas C. Spelsberg

2010 ◽  
Vol 56 (3) ◽  
pp. 281-286 ◽  
Author(s):  
Lech Raczynski ◽  
Krzysztof Wozniak ◽  
Tymon Rubel ◽  
Krzysztof Zaremba

Application of Density Based Clustering to Microarray Data Analysis

In just a few years, gene expression microarrays have rapidly become a standard experimental tool in biological and medical research. Microarray experiments are increasingly carried out to address a wide range of problems, including cluster analysis. Estimating the number of clusters in a dataset is one of the main difficulties in clustering microarrays. As a supplement to existing methods, we suggest the use of the density-based clustering technique DBSCAN, which determines the number of clusters automatically. DBSCAN and other existing methods were compared on microarray data from two datasets used for the diagnosis of leukemia and lung cancer.
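The key property the abstract claims for DBSCAN, that the number of clusters falls out of the density structure rather than being fixed in advance, can be sketched as follows. The data here are synthetic blobs standing in for sample expression profiles (the actual leukemia and lung-cancer datasets are not reproduced):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in for a sample-by-feature expression matrix:
# 60 samples drawn from 3 well-separated groups (e.g. disease subtypes).
X, _ = make_blobs(n_samples=60, centers=[[0, 0], [6, 6], [0, 6]],
                  cluster_std=0.3, random_state=0)

# DBSCAN takes no cluster count; only a neighbourhood radius (eps)
# and a minimum neighbourhood size (min_samples).
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# Label -1 marks noise points; the remaining labels are cluster ids.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)
```

Note that `eps` and `min_samples` still have to be tuned; DBSCAN removes the need to specify the cluster count, not all parameters.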


1992 ◽  
Vol 28 (3) ◽  
pp. 245-253 ◽  
Author(s):  
S. C. Pearce

Summary

An experiment has its origin in the need to find answers to stated questions. From the start great care is given to its correct conduct (e.g. the application of treatments and the method of recording) as well as to statistical design, always with the original questions in mind. The analysis of its data is the climax of a long process and the questions to be answered must dominate all else. It is not enough to feed data into a computer package in the hope that it will provide an automated path to a true interpretation.

Where the treatments have been chosen with care to answer specific questions, the statistical way of designating purpose is to declare ‘contrasts of interest’, each corresponding to a degree of freedom between treatments. They derive solely from the reasoning behind the selection of treatments. If possible the questions posed should be equal in number to the degrees of freedom and should admit of separate study because no one can give a single answer to several diverse questions.

This paper shows how to define a contrast of interest and how to isolate it in the analysis of variance. Attention is given both to its contribution to the treatment sum of squares and to its variance (i.e. the precision with which it is estimated). Independence of estimation is also considered. Algebraic formulae are given for a restricted though important range of designs, which includes those that are completely randomized, in randomized blocks or in Latin squares, all treatments having the same replication. The methods can, however, be generalized to cover all designs. With these formulae it is possible both to test the existence of an interesting effect and to set confidence limits round its estimated value.
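For the equal-replication case the abstract covers, a contrast of interest can be computed directly from the textbook formulae: for coefficients c summing to zero, the estimate is L = Σ cᵢȳᵢ, its contribution to the treatment sum of squares is rL²/Σcᵢ², and its variance is σ²Σcᵢ²/r. A minimal sketch with hypothetical data for a completely randomized design:

```python
import numpy as np

# Hypothetical completely randomized design: 3 treatments
# (control plus two dose levels), r = 4 replicates each.
y = np.array([
    [20.1, 19.8, 21.0, 20.5],   # control
    [23.2, 24.0, 22.8, 23.5],   # low dose
    [23.9, 24.5, 23.7, 24.2],   # high dose
])
means = y.mean(axis=1)
r = y.shape[1]

# Contrast of interest: control vs the average of the two doses.
# One degree of freedom; coefficients sum to zero.
c = np.array([-2.0, 1.0, 1.0])
estimate = c @ means                      # L = sum(c_i * ybar_i)

# Contribution to the treatment sum of squares: r * L^2 / sum(c_i^2).
ss_contrast = r * estimate**2 / (c @ c)

# Variance of the estimate: sigma^2 * sum(c_i^2) / r, with sigma^2
# estimated by the residual mean square.
resid_ms = ((y - means[:, None])**2).sum() / (y.size - len(means))
var_estimate = resid_ms * (c @ c) / r
```

Dividing `ss_contrast` by the residual mean square gives the F statistic (1 degree of freedom) for testing the contrast, and `var_estimate` supports the confidence limits the paper describes.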


2021 ◽  
Author(s):  
Shirin Manafi

In this research, we introduce an approach to improve the reliability of genetic data analysis. The consistency of results obtained from microarray data analysis strongly relies on the elimination of non-biological variation during the data normalization process. Instability in Housekeeping Gene (HKG) expression after common normalization methods are applied may indicate inefficiency that can result in sampling bias in differential expression analysis. This research aims to reduce sampling bias in microarray experiments by proposing a two-stage normalization algorithm. The proposed approach consists of non-linear quantile normalization at the first stage and linear HKG-based normalization at the second stage. We tested the efficiency of the two-stage normalization method using publicly available microarray datasets obtained from experiments mainly in the field of reproductive biology. Results show that the combined Robust Multiarray Average (RMA) and HKG normalization method reduces sampling bias in experiments where variation in HKG expression is observed after RMA normalization.
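The two-stage scheme can be sketched in outline: a non-linear quantile step that forces all arrays onto a common distribution, followed by a linear rescaling anchored on housekeeping genes. This is a simplified illustration, not the authors' implementation; the data, the HKG indices, and the helper names are hypothetical, and the paper's first stage is RMA/quantile normalization of real probe-level data rather than a bare matrix transform:

```python
import numpy as np

def quantile_normalize(X):
    """Stage 1 (non-linear): map each array (column) onto the mean
    distribution, so all arrays share identical sorted values."""
    ranks = X.argsort(axis=0).argsort(axis=0)
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)
    return mean_quantiles[ranks]

def hkg_rescale(X, hkg_rows):
    """Stage 2 (linear): scale each array so that the mean expression
    of the housekeeping genes agrees across arrays."""
    hkg_mean = X[hkg_rows].mean(axis=0)
    return X / hkg_mean * hkg_mean.mean()

rng = np.random.default_rng(1)
# Hypothetical matrix: 50 genes x 4 arrays with array-specific scale bias.
X = rng.lognormal(mean=2.0, sigma=0.5, size=(50, 4)) \
    * np.array([1.0, 1.5, 0.7, 2.0])
hkg_rows = [0, 1, 2]  # assumed housekeeping-gene rows

X_norm = hkg_rescale(quantile_normalize(X), hkg_rows)
```

After stage 1 every array has the same value distribution; stage 2 then removes any residual HKG instability, which is the condition under which the paper reports the combined method helps.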


Author(s):  
Richard S. Segall

Microarray informatics is a rapidly expanding discipline in which large amounts of multi-dimensional data are compressed into small storage units. Data mining of microarrays can be performed using techniques such as drill-down analysis rather than classical data analysis on a record-by-record basis. Both data and metadata can be captured in microarray experiments. The latter may be constructed by obtaining data samples from an experiment. Extractions can be made from these samples and formed into homogeneous arrays that are needed for higher level analysis and mining.

