Microarray Data Preprocessing: From Experimental Design to Differential Analysis

Author(s):  
Antonio Federico ◽  
Laura Aliisa Saarimäki ◽  
Angela Serra ◽  
Giusy del Giudice ◽  
Pia Anneli Sofia Kinaret ◽  
...  
2006 ◽  
Vol 3 (2) ◽  
pp. 77-89
Author(s):  
Y. E. Pittelkow ◽  
S. R. Wilson

Summary: Various statistical models have been proposed for detecting differential gene expression in data from microarray experiments. Once differential expression has been detected, we usually want to describe the differential expression patterns. Because microarray experiments typically analyse a large number of genes, possibly more than ten thousand, interpreting and communicating all the corresponding statistical models poses a considerable challenge, except perhaps in the simplest experiment involving only two groups. A further challenge is to find methods to summarize the resulting models. These challenges increase with experimental complexity. Biologists often wish to sort genes into ‘classes’ with similar response profiles or patterns, so in this paper we describe a likelihood approach for assigning genes to these class patterns for data from a replicated experimental design. The number of potential patterns increases very quickly as the number of combinations in the experimental design increases. In a two-group experimental design only three patterns are required to describe the mean response: up, down and no difference. For a factorial design with three treatments there are 13 different patterns, with four there are 75, and so on. The approach is applied to the identification of differential response patterns in gene expression from a microarray experiment using RNA extracted from the leaves of Arabidopsis thaliana plants. We compare patterns of response found using additive and multiplicative models. A multiplicative model is more commonly used in the statistical analysis of microarray data because of the variance-stabilizing properties of the logarithmic function; its error structure is then taken to be log-Normal.
On the other hand, the additive model treats the gene expression value directly as gamma-distributed, which accounts for the constant coefficient of variation often observed. Appropriate visualization displays for microarray data are important for communicating the patterns of response among the genes. Here we use graphical ‘icons’ to represent the patterns of up, down and no response, together with two alternative displays, the Gene-plot and a grid layout, to provide rapid overall summaries of the gene expression patterns.
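The pattern counts quoted above (3, 13 and 75) coincide with the number of weak orderings of k group means (the ordered Bell, or Fubini, numbers), which is how "pattern" is read in the sketch below. It enumerates those patterns and then, as a simplified stand-in for the likelihood approach, picks one pattern per gene by BIC under a Gaussian model on log expression (the log-Normal error structure of the multiplicative model). The BIC criterion, the shared variance, the rule of skipping fits whose means violate the pattern's ordering, and all function names are assumptions for illustration, not the authors' method; the gamma-based additive model is not shown.

```python
import itertools
import math


def weak_orderings(k):
    """All weak orderings of k group means, as rank tuples.

    Groups sharing a rank have equal means; a higher rank means a higher mean.
    Ranks must be contiguous (0..m-1), so each pattern is listed exactly once.
    """
    return [ranks for ranks in itertools.product(range(k), repeat=k)
            if set(ranks) == set(range(max(ranks) + 1))]


def describe(ranks):
    """Render a pattern such as (0, 1, 1) as 'g0 < g1 = g2'."""
    order = sorted(range(len(ranks)), key=lambda g: (ranks[g], g))
    text = f"g{order[0]}"
    for prev, cur in zip(order, order[1:]):
        text += (" = " if ranks[cur] == ranks[prev] else " < ") + f"g{cur}"
    return text


def pattern_bic(log_groups, ranks):
    """BIC of a Gaussian model on log expression whose means follow `ranks`.

    Returns None when the fitted block means break the pattern's ordering;
    such data are covered by a coarser pattern in the candidate set.
    """
    m = max(ranks) + 1
    blocks = [[] for _ in range(m)]
    for values, r in zip(log_groups, ranks):
        blocks[r].extend(values)
    means = [sum(b) / len(b) for b in blocks]
    if any(lo >= hi for lo, hi in zip(means, means[1:])):
        return None
    n = sum(len(b) for b in blocks)
    rss = sum((x - mu) ** 2 for b, mu in zip(blocks, means) for x in b)
    sigma2 = max(rss / n, 1e-12)  # guard against log(0) for perfect fits
    # The log-Normal Jacobian (-sum log y) is identical for every pattern,
    # so it is omitted: it cannot change which pattern wins.
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    n_params = m + 1  # one mean per block plus a shared variance
    return -2 * loglik + n_params * math.log(n)


def assign_pattern(groups):
    """Choose the lowest-BIC pattern for one gene's replicated expression values."""
    log_groups = [[math.log(x) for x in g] for g in groups]
    scored = []
    for ranks in weak_orderings(len(groups)):
        bic = pattern_bic(log_groups, ranks)
        if bic is not None:
            scored.append((bic, ranks))
    return min(scored)[1]  # the all-equal pattern is always valid
```

For example, `len(weak_orderings(4))` gives 75, and a gene whose first group is low while the other two are similar and high, such as `[[1.0, 1.1, 0.9], [10.0, 11.0, 9.5], [10.5, 9.8, 10.2]]`, is assigned the pattern `g0 < g1 = g2`. BIC is used here only because it penalizes the extra means of finer patterns in a simple way.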


2019 ◽  
Vol 9 (7) ◽  
pp. 871-880
Author(s):  
Yifan Han ◽  
Lei Zhou

Thyroid cancer is an increasingly common malignant tumor worldwide, and its incidence is rising year by year. In this study, mRNA microarray data from thyroid cancer patients in four periods were collected from the TCGA database. We performed a series of bioinformatics analyses on these mRNA expression profiles, including differential analysis, co-expression analysis, enrichment analysis, regulator prediction and survival analysis. There were 13126, 10914, 13585 and 13241 differentially expressed genes in the four periods; 4822 differentially expressed genes were obtained by union and deduplication (p < 0.01). Weighted gene co-expression network analysis identified a total of 21 dysfunctional modules, whose key genes included PLD5, CHD4, ADGRA3 and ITGA3. Enrichment analysis showed that the genes in the dysfunctional modules were mainly related to pre-replicative complex assembly, cytokine–cytokine receptor interaction and the MAPK signaling pathway. We downloaded thyroid cancer-associated miRNA microarray data from the GEO database for differential analysis, and then intersected the predicted ncRNAs with the differentially expressed miRNAs to obtain thyroid cancer-associated regulatory factors. Finally, we found that miRNA-4665-3p regulates the core gene PLD5, and that six regulators, including miRNA-3140-3p and miRNA-324-3p, regulate the core gene CHD4. Survival analysis showed that both up-regulation of PLD5 and down-regulation of CHD4 were associated with shorter patient survival. Based on these analyses, we propose that miRNA-4665-3p regulates the expression of PLD5 and affects the development of thyroid cancer, and that its up-regulation is associated with poorer patient survival.
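The abstract does not say how the survival analysis was carried out. A common approach, assumed here, is to split patients at the median expression of a gene such as PLD5 or CHD4 and compare Kaplan-Meier curves for the high and low groups. The sketch below computes those curves in pure Python; the function names are hypothetical, and a formal comparison of the curves (for example a log-rank test or a Cox model) is not included.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival probability).

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    The curve only steps down at times with at least one observed death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone whose follow-up ends at time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # censored subjects leave the risk set after time t
    return curve


def split_by_median(expression, times, events):
    """Hypothetical helper: KM curves for high vs low expression of one gene."""
    cut = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cut]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cut]
    return {
        "high": kaplan_meier([t for t, _ in high], [e for _, e in high]),
        "low": kaplan_meier([t for t, _ in low], [e for _, e in low]),
    }
```

A curve for the high-expression group that drops faster than the low-expression curve would be consistent with the association reported for PLD5; whether the difference is significant needs a separate test.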


2007 ◽  
Vol 849 (1-2) ◽  
pp. 261-272 ◽  
Author(s):  
Jean-François Chich ◽  
Olivier David ◽  
Fanny Villers ◽  
Brigitte Schaeffer ◽  
Didier Lutomski ◽  
...  

2007 ◽  
Vol 8 (1) ◽  
Author(s):  
Srinka Ghosh ◽  
Heather A Hirsch ◽  
Edward A Sekinger ◽  
Philipp Kapranov ◽  
Kevin Struhl ◽  
...  

2019 ◽  
Author(s):  
Ilana Lambert ◽  
Christine Paysant-Le Roux ◽  
Stefano Colella ◽  
Marie-Laure Martin-Magniette

Abstract
Background: RNAseq is now the method of choice for transcriptome analysis. Over the last decades, a large number of statistical methods, and associated bioinformatics tools, have been developed for RNAseq analysis. More recently, statisticians have carried out neutral comparison studies using benchmark datasets, shedding light on the most appropriate approaches for RNAseq data analysis. Nevertheless, performing an RNAseq analysis remains a challenge for biologists.
Results: DiCoExpress is a workspace implemented in R that includes methods chosen on the basis of their performance in neutral comparison studies. DiCoExpress uses pre-existing R packages, including FactoMineR, edgeR and coseq, to perform quality control, differential analysis and co-expression analysis of RNAseq data. To perform the full analysis, users provide a mapped-read expression data file and a file containing the information on the experimental design. After the quality control step, the user can move on to the differential expression analysis, performed using generalized linear models, with minimal effort thanks to the automated contrast-writing function. DiCoExpress proposes a list of comparisons based on the experimental design, and the user only needs to choose the one(s) relevant to their research question. Co-expression analysis is implemented using the coseq package. Identified co-expression clusters are automatically analyzed for enrichment of annotations provided by the user, and several result outputs are produced. We used DiCoExpress to analyze a publicly available Brassica napus L. RNAseq dataset on the transcriptional response to silicon treatment in plant roots and mature leaves. This dataset, which includes two biological factors and three replicates for each condition, allowed us to demonstrate all the features of DiCoExpress in a tutorial.
Conclusions: DiCoExpress is an R workspace that allows users without advanced statistical knowledge or programming skills to perform a full RNAseq analysis, from quality control through differential analysis, based on contrasts within generalized linear models, to co-expression analysis. With DiCoExpress, the user can therefore focus on the statistical modeling of gene expression according to the experimental design and on the biological interpretation of the results.
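DiCoExpress itself is written in R and writes contrasts for edgeR generalized linear models; the Python sketch below does not reproduce its function and only illustrates what "proposing a list of comparisons based on the experimental design" can mean for two crossed factors. It assumes a cell-means parameterisation (one GLM coefficient per condition, as with a `~ 0 + condition` design), so each contrast is a set of weights on the fitted log-scale condition means that sums to zero. The factor levels used in the example ("Root", "Leaf", "Control", "Silicon") are illustrative labels, and `design_contrasts` and `estimate` are hypothetical names.

```python
from itertools import combinations, product


def design_contrasts(factor_a, factor_b):
    """Enumerate candidate contrasts for a two-factor design (hypothetical helper).

    Conditions are the cells (a, b); each contrast maps every cell to a weight.
    """
    cells = list(product(factor_a, factor_b))

    def average_difference(plus, minus):
        # Mean of the `plus` cells minus mean of the `minus` cells.
        w = dict.fromkeys(cells, 0.0)
        for c in plus:
            w[c] += 1.0 / len(plus)
        for c in minus:
            w[c] -= 1.0 / len(minus)
        return w

    out = {}
    # Main effects: one factor compared after averaging over the other.
    for a1, a2 in combinations(factor_a, 2):
        out[f"{a2} - {a1}"] = average_difference(
            [(a2, b) for b in factor_b], [(a1, b) for b in factor_b])
    for b1, b2 in combinations(factor_b, 2):
        out[f"{b2} - {b1}"] = average_difference(
            [(a, b2) for a in factor_a], [(a, b1) for a in factor_a])
    # Simple effects: levels of one factor compared within a level of the other.
    for a in factor_a:
        for b1, b2 in combinations(factor_b, 2):
            out[f"[{a}] {b2} - {b1}"] = average_difference([(a, b2)], [(a, b1)])
    for b in factor_b:
        for a1, a2 in combinations(factor_a, 2):
            out[f"[{b}] {a2} - {a1}"] = average_difference([(a2, b)], [(a1, b)])
    # Interaction: (b2 - b1 effect within a2) minus (b2 - b1 effect within a1).
    for a1, a2 in combinations(factor_a, 2):
        for b1, b2 in combinations(factor_b, 2):
            w = dict.fromkeys(cells, 0.0)
            w[(a2, b2)] = w[(a1, b1)] = 1.0
            w[(a2, b1)] = w[(a1, b2)] = -1.0
            out[f"({a2} - {a1}) x ({b2} - {b1})"] = w
    return out


def estimate(weights, cell_means):
    """Apply a contrast to fitted condition means (e.g. log-scale means)."""
    return sum(weights[c] * cell_means[c] for c in weights)
```

With two levels per factor this proposes seven contrasts: two main effects, four simple effects and one interaction. In a real analysis the user would keep only the comparisons that answer the biological question, which is the choice the abstract leaves to the user.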


2020 ◽  
Author(s):  
Ilana Lambert ◽  
Christine Paysant-Le Roux ◽  
Stefano Colella ◽  
Marie-Laure Martin-Magniette

Abstract
Background: RNAseq is now the method of choice for transcriptome analysis. Over the last decades, a large number of statistical methods, and associated bioinformatics tools, have been developed for RNAseq analysis. More recently, statisticians have carried out neutral comparison studies using benchmark datasets, shedding light on the most appropriate approaches for RNAseq data analysis.
Results: DiCoExpress is a script-based tool implemented in R that includes methods chosen on the basis of their performance in neutral comparison studies. DiCoExpress uses pre-existing R packages, including FactoMineR, edgeR and coseq, to perform quality control, differential analysis and co-expression analysis of RNAseq data. To perform the full analysis, users provide a mapped-read expression data file and a file containing the information on the experimental design. After the quality control step, the user can move on to the differential expression analysis, performed using generalized linear models and simplified by the automated contrast-writing function. Co-expression analysis is implemented using the coseq package. Lists of differentially expressed genes and identified co-expression clusters are automatically analyzed for enrichment of annotations provided by the user. We used DiCoExpress to analyze a publicly available RNAseq dataset on the transcriptional response of Brassica napus L. to silicon treatment in plant roots and mature leaves. This dataset, which includes two biological factors and three replicates for each condition, allowed us to demonstrate all the features of DiCoExpress in a tutorial.
Conclusions: DiCoExpress is an R script-based tool allowing users to perform a full RNAseq analysis, from quality control through differential analysis, based on contrasts within generalized linear models, to co-expression analysis. DiCoExpress focuses on the statistical modelling of gene expression according to the experimental design and facilitates the data analysis leading to the biological interpretation of the results.
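The abstract does not name the test used to check gene lists and co-expression clusters for annotation enrichment. A common choice, assumed here, is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) for each annotation, followed by Benjamini-Hochberg adjustment across annotations. The pure-Python sketch below shows both steps; it is not DiCoExpress code, and the function names are hypothetical.

```python
import math


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes carrying the annotation,
    n: genes in the cluster (or DE list), k: cluster genes carrying it.
    """
    total = math.comb(N, n)
    # Sum exact integer counts first, then divide once to limit rounding error.
    upper = sum(math.comb(K, i) * math.comb(N - K, n - i)
                for i in range(k, min(n, K) + 1))
    return upper / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, if all 3 genes of a cluster carry an annotation shared by only 3 of 10 background genes, `hypergeom_enrichment_p(3, 3, 3, 10)` is 1/120. In practice one p-value is computed per annotation and cluster, and the adjusted values are compared with the chosen false discovery rate.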


Author(s):  
B.M. Bolstad ◽  
F. Collin ◽  
K.M. Simpson ◽  
R.A. Irizarry ◽  
T.P. Speed
