DiCoExpress: a tool to process multifactorial RNAseq experiments from quality controls to co-expression analysis through differential analysis based on contrasts inside GLM models.

2020 ◽  
Author(s):  
Ilana Lambert ◽  
Christine Paysant-Le Roux ◽  
Stefano Colella ◽  
Marie-Laure Martin-Magniette

Abstract Background RNAseq is nowadays the method of choice for transcriptome analysis. Over the last decades, a large number of statistical methods and associated bioinformatics tools for RNAseq analysis have been developed. More recently, neutral comparison studies using benchmark datasets have shed light on the most appropriate approaches for RNAseq data analysis. Results DiCoExpress is a script-based tool implemented in R that includes methods chosen based on their performance in neutral comparison studies. DiCoExpress uses pre-existing R packages, including FactoMineR, edgeR and coseq, to perform quality control, differential, and co-expression analysis of RNAseq data. Users can perform the full analysis by providing a mapped read expression data file and a file describing the experimental design. Following the quality control step, the user can move on to the differential expression analysis, which is performed using generalized linear models and is made easier by an automated contrast writing function. A co-expression analysis is implemented using the coseq package. Lists of differentially expressed genes and identified co-expression clusters are automatically analyzed for enrichment of annotations provided by the user. We used DiCoExpress to analyze a publicly available RNAseq dataset on the transcriptional response of Brassica napus L. to silicon treatment in plant roots and mature leaves. This dataset, which includes two biological factors and three replicates for each condition, allowed us to demonstrate all the features of DiCoExpress in a tutorial. Conclusions DiCoExpress is an R script-based tool allowing users to perform a full RNAseq analysis from quality controls to co-expression analysis through differential analysis based on contrasts inside generalized linear models. DiCoExpress focuses on the statistical modelling of gene expression according to the experimental design and facilitates the data analysis leading to the biological interpretation of the results.
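The idea of proposing comparisons from the experimental design can be illustrated with a minimal sketch. This is not DiCoExpress code (the tool itself is written in R), and the function name `propose_contrasts`, the factor names and the level names are assumptions chosen to mirror a two-factor design such as treatment by tissue. The sketch lists, for a two-factor design, the pairwise comparisons averaged over the other factor, the same comparisons within each level of the other factor, and the interaction contrasts.

```python
from itertools import combinations


def propose_contrasts(factors):
    """List candidate comparisons for a design with exactly two factors.

    `factors` maps a factor name to its list of levels. The labels returned
    are plain strings for display; this is an illustrative sketch and not
    the contrast-writing function of DiCoExpress.
    """
    (name_a, levels_a), (name_b, levels_b) = factors.items()
    proposals = []
    # Main-effect style comparisons for each factor in turn.
    for (levels, other, other_levels) in (
        (levels_a, name_b, levels_b),
        (levels_b, name_a, levels_a),
    ):
        for x, y in combinations(levels, 2):
            # Difference averaged over the levels of the other factor.
            proposals.append(f"{x} - {y}")
            # Same difference restricted to one level of the other factor.
            for z in other_levels:
                proposals.append(f"{x} - {y} | {other}={z}")
    # Interaction: does the difference in factor A change across factor B?
    for x, y in combinations(levels_a, 2):
        for u, v in combinations(levels_b, 2):
            proposals.append(
                f"({x} - {y} | {name_b}={u}) - ({x} - {y} | {name_b}={v})"
            )
    return proposals


# Hypothetical level names for a treatment-by-tissue design.
design = {"treatment": ["control", "silicon"], "tissue": ["root", "leaf"]}
for label in propose_contrasts(design):
    print(label)
```

With two levels per factor, the sketch yields seven candidate comparisons, from which a user would select the ones relevant to the research question.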

2019 ◽  
Author(s):  
Ilana Lambert ◽  
Christine Paysant-Le Roux ◽  
Stefano Colella ◽  
Marie-Laure Martin-Magniette

Abstract Background RNAseq is nowadays the method of choice for transcriptome analysis. Over the last decades, a large number of statistical methods and associated bioinformatics tools for RNAseq analysis have been developed. More recently, neutral comparison studies using benchmark datasets have shed light on the most appropriate approaches for RNAseq data analysis. Nevertheless, performing an RNAseq analysis remains a challenge for biologists. Results DiCoExpress is a workspace implemented in R that includes methods chosen based on their performance in neutral comparison studies. DiCoExpress uses pre-existing R packages, including FactoMineR, edgeR and coseq, to perform quality control, differential, and co-expression analysis of RNAseq data. Users can perform the full analysis by providing a mapped read expression data file and a file describing the experimental design. Following the quality control step, the user can move on to the differential expression analysis, which is performed using generalized linear models and requires little effort thanks to the automated contrast writing function. DiCoExpress proposes a list of comparisons based on the experimental design, and the user needs only to choose the one(s) of interest for their research question. A co-expression analysis is implemented using the coseq package. Identified co-expression clusters are automatically analyzed for enrichment of annotations provided by the user, and several result outputs are proposed. We used DiCoExpress to analyze a publicly available Brassica napus L. RNAseq dataset on the transcriptional response to silicon treatment in plant roots and mature leaves. This dataset, which includes two biological factors and three replicates for each condition, allowed us to demonstrate all the features of DiCoExpress in a tutorial. Conclusions DiCoExpress is an R workspace that allows users without advanced statistical knowledge and programming skills to perform a full RNAseq analysis from quality controls to co-expression analysis through differential analysis based on contrasts inside generalized linear models. Hence, with DiCoExpress, the user can focus on the statistical modeling of gene expression according to the experimental design and on the interpretation of the results of such analysis in biological terms.


2019 ◽  
Author(s):  
Rafał Zaborowski ◽  
Bartek Wilczyński

Abstract High-throughput Chromosome Conformation Capture (Hi-C) experiments have become the standard technique to assess the structure and dynamics of chromosomes in living cells. As with any other sufficiently advanced biochemical technique, Hi-C datasets are complex and contain multiple documented biases, the main ones being non-uniform read coverage and the decay of contact coverage with genomic distance. Both of these effects have been studied, and there are published methods that normalize Hi-C data to mitigate these biases to some extent. It is crucial that this is done properly; otherwise the results of any comparative analysis of two or more Hi-C experiments are bound to be biased. In this paper we study both of these biases in Hi-C data and show that normalization techniques aimed at alleviating the coverage bias at the same time exacerbate the problems with contact decay bias. We also postulate that it is possible to use generalized linear models to directly compare non-normalized data, and that this gives better results in identifying differential contacts between Hi-C matrices than using normalized data.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1444
Author(s):  
Charity W. Law ◽  
Kathleen Zeglinski ◽  
Xueyi Dong ◽  
Monther Alhamdoosh ◽  
Gordon K. Smyth ◽  
...  

Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by analysis pipelines that are well described. However, there are two crucial steps in the analysis process that can be a stumbling block for many: setting up an appropriate model via design matrices, and setting up comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue of design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation of and differences between models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
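As a rough illustration of the two steps named above, the following sketch builds a design matrix and a contrast matrix for a single explanatory variable with three groups. It uses NumPy rather than limma, and the group names, sample layout and expression values are made up for illustration; none of it comes from the article.

```python
import numpy as np

# Hypothetical experiment: 6 samples in 3 groups, two replicates each.
groups = ["ctrl", "ctrl", "trtA", "trtA", "trtB", "trtB"]
levels = ["ctrl", "trtA", "trtB"]

# Design matrix without intercept (a "means model"): one indicator column
# per group, so each fitted coefficient is that group's mean expression.
design = np.array(
    [[1.0 if g == lev else 0.0 for lev in levels] for g in groups]
)

# Contrast matrix: rows follow the design columns (ctrl, trtA, trtB),
# and each column encodes one comparison between group means.
contrast_names = ["trtA-ctrl", "trtB-ctrl", "trtB-trtA"]
contrasts = np.array(
    [
        [-1.0, -1.0, 0.0],   # ctrl
        [1.0, 0.0, -1.0],    # trtA
        [0.0, 1.0, 1.0],     # trtB
    ]
)

# Made-up log-expression values for a single gene, one per sample.
y = np.array([5.0, 5.2, 6.9, 7.1, 5.1, 4.9])

# Ordinary least squares fit of the linear model y = design @ coef + error.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Estimated effect for each comparison: a linear combination of coefficients.
effects = contrasts.T @ coef

for name, value in zip(contrast_names, effects):
    print(f"{name}: {value:+.2f}")
```

The means model is used here because it keeps each contrast readable as a plain difference of group means; a design with an intercept would fit the same model with a different parameterization and correspondingly different contrast vectors.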

