DiCoExpress: a tool to process multifactorial RNAseq experiments from quality controls to co-expression analysis through differential analysis based on contrasts inside GLM models.

10.21203/rs.2.19732/v2 ◽

2020 ◽

Author(s):

Ilana Lambert ◽

Christine Paysant-Le Roux ◽

Stefano Colella ◽

Marie-Laure Martin-Magniette

Keyword(s):

Quality Control ◽

Data Analysis ◽

Experimental Design ◽

Expression Analysis ◽

Generalized Linear Models ◽

Linear Models ◽

Differential Expression Analysis ◽

Differential Analysis ◽

Quality Controls ◽

RNAseq Data

Abstract

Background: RNAseq is now the method of choice for transcriptome analysis. Over the last decades, a large number of statistical methods and associated bioinformatics tools for RNAseq analysis have been developed. More recently, neutral comparison studies using benchmark datasets have shed light on the most appropriate approaches for RNAseq data analysis.

Results: DiCoExpress is a script-based tool implemented in R that includes methods chosen for their performance in neutral comparison studies. DiCoExpress uses pre-existing R packages, including FactoMineR, edgeR and coseq, to perform quality control, differential, and co-expression analysis of RNAseq data. Users can perform the full analysis by providing a mapped read expression data file and a file describing the experimental design. Following the quality control step, the user can move on to the differential expression analysis, which is performed using generalized linear models and is made easier by an automated contrast writing function. A co-expression analysis is implemented using the coseq package. Lists of differentially expressed genes and identified co-expression clusters are automatically analyzed for enrichment of annotations provided by the user. We used DiCoExpress to analyze a publicly available RNAseq dataset on the transcriptional response of Brassica napus L. to silicon treatment in plant roots and mature leaves. This dataset, which includes two biological factors and three replicates for each condition, allowed us to demonstrate all the features of DiCoExpress in a tutorial.

Conclusions: DiCoExpress is an R script-based tool that allows users to perform a full RNAseq analysis from quality controls to co-expression analysis through differential analysis based on contrasts inside generalized linear models. DiCoExpress focuses on the statistical modelling of gene expression according to the experimental design and facilitates the data analysis leading to the biological interpretation of the results.
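The enrichment step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which statistical test DiCoExpress applies, so the hypergeometric test below is an assumption (it is one common choice for this kind of analysis), and the function name, parameter names, and gene counts are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe  : number of genes considered in the analysis
    n_annotated : genes in the universe carrying the annotation term
    n_selected  : genes in the list or cluster being tested
    n_overlap   : selected genes that carry the annotation term

    Assumption: a hypergeometric test is used here for illustration only;
    the source does not state which test DiCoExpress implements.
    """
    max_overlap = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of observing n_overlap or more annotated genes.
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 genes in total, 5 annotated with a term,
# a cluster of 4 genes of which 3 carry the term.
p_value = hypergeom_enrichment_pvalue(20, 5, 4, 3)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice such a test would be repeated for every annotation term and every gene list or cluster, and the resulting p-values would normally be corrected for multiple testing.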

DiCoExpress: a workspace to process multifactorial RNAseq experiments from quality controls to co-expression analysis through differential analysis based on contrasts inside GLM models.

10.21203/rs.2.19732/v1 ◽

2019 ◽

Author(s):

Ilana Lambert ◽

Christine Paysant-Le Roux ◽

Stefano Colella ◽

Marie-Laure Martin-Magniette

Keyword(s):

Quality Control ◽

Experimental Design ◽

Expression Analysis ◽

Generalized Linear Models ◽

Linear Models ◽

Research Question ◽

Differential Expression Analysis ◽

Differential Analysis ◽

Quality Controls ◽

RNAseq Data

Abstract

Background: RNAseq is now the method of choice for transcriptome analysis. Over the last decades, a large number of statistical methods and associated bioinformatics tools for RNAseq analysis have been developed. More recently, neutral comparison studies using benchmark datasets have shed light on the most appropriate approaches for RNAseq data analysis. Nevertheless, performing an RNAseq analysis remains a challenge for biologists.

Results: DiCoExpress is a workspace implemented in R that includes methods chosen for their performance in neutral comparison studies. DiCoExpress uses pre-existing R packages, including FactoMineR, edgeR and coseq, to perform quality control, differential, and co-expression analysis of RNAseq data. Users can perform the full analysis by providing a mapped read expression data file and a file describing the experimental design. Following the quality control step, the user can move on to the differential expression analysis, which is performed using generalized linear models and requires little effort thanks to the automated contrast writing function. DiCoExpress proposes a list of comparisons based on the experimental design, and the user only needs to choose the one(s) relevant to their research question. A co-expression analysis is implemented using the coseq package. Identified co-expression clusters are automatically analyzed for enrichment of annotations provided by the user, and several result outputs are proposed. We used DiCoExpress to analyze a publicly available Brassica napus L. RNAseq dataset on the transcriptional response to silicon treatment in plant roots and mature leaves. This dataset, which includes two biological factors and three replicates for each condition, allowed us to demonstrate all the features of DiCoExpress in a tutorial.

Conclusions: DiCoExpress is an R workspace that allows users without advanced statistical knowledge or programming skills to perform a full RNAseq analysis from quality controls to co-expression analysis through differential analysis based on contrasts inside generalized linear models. Hence, with DiCoExpress, the user can focus on the statistical modeling of gene expression according to the experimental design and on the interpretation of the results in biological terms.
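To make the idea of proposing a list of comparisons from the experimental design more concrete, here is a minimal Python sketch. It is not DiCoExpress code (DiCoExpress is written in R) and does not reproduce its contrast writing function. The function `list_contrasts`, the contrast labels, and the factor and level names are hypothetical, loosely modelled on the two-factor silicon dataset described above. Each contrast is expressed as a set of weights over the cells of the design.

```python
from itertools import combinations, product


def list_contrasts(factors):
    """Enumerate candidate contrasts for a design with exactly two factors.

    `factors` maps each factor name to its list of levels. Every contrast is
    returned as a dict of weights over the cells (level_a, level_b).
    """
    (name_a, levels_a), (name_b, levels_b) = factors.items()
    cells = list(product(levels_a, levels_b))
    contrasts = {}

    def weights(plus, minus, scale=1.0):
        # +scale on cells in `plus`, -scale on cells in `minus`, 0 elsewhere.
        return {c: scale * ((c in plus) - (c in minus)) for c in cells}

    for a1, a2 in combinations(levels_a, 2):
        # Effect of the first factor within each level of the second factor.
        for b in levels_b:
            contrasts[f"[{name_b}={b}] {a2} - {a1}"] = weights({(a2, b)}, {(a1, b)})
        # Effect of the first factor averaged over the second factor.
        contrasts[f"[average over {name_b}] {a2} - {a1}"] = weights(
            {(a2, b) for b in levels_b}, {(a1, b) for b in levels_b}, 1 / len(levels_b)
        )

    for b1, b2 in combinations(levels_b, 2):
        # Same two kinds of comparison for the second factor.
        for a in levels_a:
            contrasts[f"[{name_a}={a}] {b2} - {b1}"] = weights({(a, b2)}, {(a, b1)})
        contrasts[f"[average over {name_a}] {b2} - {b1}"] = weights(
            {(a, b2) for a in levels_a}, {(a, b1) for a in levels_a}, 1 / len(levels_a)
        )

    # Interaction: does the effect of one factor differ between levels of the other?
    for (a1, a2), (b1, b2) in product(combinations(levels_a, 2), combinations(levels_b, 2)):
        contrasts[f"[interaction] ({a2} - {a1}) in {b2} vs {b1}"] = weights(
            {(a2, b2), (a1, b1)}, {(a1, b2), (a2, b1)}
        )
    return contrasts


# Hypothetical factor and level names for a treatment-by-tissue design.
design = {"Treatment": ["Control", "Silicon"], "Tissue": ["Root", "Leaf"]}
for label in list_contrasts(design):
    print(label)
```

The user would then pick the labels of interest, and the corresponding weight vectors would be passed to the model-fitting step. Every weight vector sums to zero, which is what makes it a contrast between conditions.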

RNA-Seq Data Analysis: From Raw Data Quality Control to Differential Expression Analysis

Methods in Molecular Biology - Plant Germline Development ◽

10.1007/978-1-4939-7286-9_23 ◽

2017 ◽

pp. 295-307 ◽

Cited By ~ 3

Author(s):

Weihong Qi ◽

Ralph Schlapbach ◽

Hubert Rehrauer

Keyword(s):

Quality Control ◽

Data Analysis ◽

Data Quality ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

RNA-Seq ◽

Data Quality Control ◽

Raw Data

Score-matching representative approach for big data analysis with generalized linear models

Electronic Journal of Statistics ◽

10.1214/21-ejs1965 ◽

2022 ◽

Vol 16 (1) ◽

Author(s):

Keren Li ◽

Jie Yang

Keyword(s):

Big Data ◽

Data Analysis ◽

Generalized Linear Models ◽

Linear Models ◽

Big Data Analysis

Integrative Differential Expression Analysis for Multiple EXperiments (IDEAMEX): A Web Server Tool for Integrated RNA-Seq Data Analysis

Frontiers in Genetics ◽

10.3389/fgene.2019.00279 ◽

2019 ◽

Vol 10 ◽

Cited By ~ 7

Author(s):

Verónica Jiménez-Jacinto ◽

Alejandro Sanchez-Flores ◽

Leticia Vega-Alvarado

Keyword(s):

Data Analysis ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Web Server ◽

RNA-Seq

DiCoExpress: a tool to process multifactorial RNAseq experiments from quality controls to co-expression analysis through differential analysis based on contrasts inside GLM models

Plant Methods ◽

10.1186/s13007-020-00611-7 ◽

2020 ◽

Vol 16 (1) ◽

Author(s):

Ilana Lambert ◽

Christine Paysant-Le Roux ◽

Stefano Colella ◽

Marie-Laure Martin-Magniette

Keyword(s):

Expression Analysis ◽

Differential Analysis ◽

Quality Controls

PCN4 DATA ANALYSIS WITH GENERALIZED LINEAR MODELS ON LUNG CANCER DATA

Value in Health ◽

10.1016/s1098-3015(10)70184-4 ◽

2008 ◽

Vol 11 (3) ◽

pp. A55

Author(s):

G Tang

Keyword(s):

Lung Cancer ◽

Data Analysis ◽

Generalized Linear Models ◽

Linear Models ◽

Cancer Data ◽

Lung Cancer Data

A general approach to categorical data analysis with missing data, using generalized linear models with composite links

Psychometrika ◽

10.1007/bf02294657 ◽

1992 ◽

Vol 57 (1) ◽

pp. 29-42 ◽

Cited By ~ 6

Author(s):

David Rindskopf

Keyword(s):

Data Analysis ◽

Missing Data ◽

Generalized Linear Models ◽

Categorical Data ◽

Linear Models ◽

Categorical Data Analysis

DiADeM: differential analysis via dependency modelling of chromatin interactions with robust generalized linear models

10.1101/654699 ◽

2019 ◽

Author(s):

Rafał Zaborowski ◽

Bartek Wilczyński

Keyword(s):

Generalized Linear Models ◽

Linear Models ◽

Standard Technique ◽

Differential Analysis ◽

Genomic Distance ◽

Chromosome Conformation ◽

Structure And Dynamics ◽

Chromatin Interactions ◽

Coverage Bias ◽

Biochemical Technique

Abstract High-throughput Chromosome Conformation Capture (Hi-C) experiments have become the standard technique to assess the structure and dynamics of chromosomes in living cells. As with any other sufficiently advanced biochemical technique, Hi-C datasets are complex and contain multiple documented biases, the main ones being the non-uniform read coverage and the decay of contact coverage with genomic distance. Both of these effects have been studied, and there are published methods that can normalize different Hi-C data to mitigate these biases to some extent. It is crucial that this is done properly; otherwise, the results of any comparative analysis of two or more Hi-C experiments are bound to be biased. In this paper we study both of these biases in Hi-C data and show that normalization techniques aimed at alleviating the coverage bias at the same time exacerbate the problems with contact decay bias. We also postulate that it is possible to use generalized linear models to directly compare non-normalized data, and that this gives better results in identifying differential contacts between Hi-C matrices than using the normalized data.

A guide to creating design matrices for gene expression experiments

F1000Research ◽

10.12688/f1000research.27893.1 ◽

2020 ◽

Vol 9 ◽

pp. 1444

Author(s):

Charity W. Law ◽

Kathleen Zeglinski ◽

Xueyi Dong ◽

Monther Alhamdoosh ◽

Gordon K. Smyth ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Differential Expression ◽

RNA Sequencing ◽

Expression Analysis ◽

Graphical Representation ◽

Differential Expression Analysis ◽

Data Types ◽

Software Packages ◽

Set Up

Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by analysis pipelines that are well described. However, there are two crucial steps in the analysis process that can be a stumbling block for many: the setup of an appropriate model via design matrices and the setup of comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue for design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation of and differences between models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
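The simplest case mentioned in this abstract, a model with a single two-level explanatory variable, can be sketched as follows. This is a plain-Python analogue for illustration, not limma code and not taken from the article. The sample labels, expression values, and the helper `fit_two_column` are all hypothetical. The sketch shows that two different design matrices for the same data need different contrast vectors to answer the same question.

```python
def fit_two_column(design, y):
    """Ordinary least squares for a design matrix with exactly two columns.

    Solves the 2x2 normal equations (X'X) beta = X'y by Cramer's rule.
    """
    a = sum(r[0] * r[0] for r in design)
    b = sum(r[0] * r[1] for r in design)
    d = sum(r[1] * r[1] for r in design)
    u = sum(r[0] * v for r, v in zip(design, y))
    w = sum(r[1] * v for r, v in zip(design, y))
    det = a * d - b * b
    return [(d * u - b * w) / det, (a * w - b * u) / det]


def apply_contrast(contrast, coefficients):
    # A contrast is a weighted sum of the fitted coefficients.
    return sum(c * beta for c, beta in zip(contrast, coefficients))


# Hypothetical log-expression values for one gene in six samples.
groups = ["ctrl", "ctrl", "ctrl", "trt", "trt", "trt"]
y = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# Design 1: intercept plus treatment indicator (coefficients: ctrl mean, trt - ctrl).
design_intercept = [[1, int(g == "trt")] for g in groups]
# Design 2: one indicator per group, no intercept (coefficients: ctrl mean, trt mean).
design_means = [[int(g == "ctrl"), int(g == "trt")] for g in groups]

beta_intercept = fit_two_column(design_intercept, y)
beta_means = fit_two_column(design_means, y)

# The same comparison (trt versus ctrl) needs a different contrast per design.
effect_intercept = apply_contrast([0, 1], beta_intercept)
effect_means = apply_contrast([-1, 1], beta_means)

print(beta_intercept, effect_intercept)
print(beta_means, effect_means)
```

Both parameterizations give the same estimated treatment effect. Only the meaning of the coefficients changes, and with it the contrast vector needed to extract the comparison.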

Data Analysis Using Hierarchical Generalized Linear Models With R

10.1201/9781315211060 ◽

2017 ◽

Cited By ~ 11

Author(s):

Youngjo Lee ◽

Lars Rönnegård ◽

Maengseok Noh

Keyword(s):

Data Analysis ◽

Generalized Linear Models ◽

Linear Models ◽

Hierarchical Generalized Linear Models
