Error Distribution for Gene Expression Data

We present a new instance of Laplace's second Law of Errors and show how it can be used in the analysis of data from microarray experiments. This error distribution is shown to fit microarray expression data much better than a normal distribution. Using this distribution in a parametric bootstrap leads to more powerful tests, since we show that the t-test is conservative in this setting. We propose a biological explanation for this distribution based on the Pareto distribution of the variables used to compute the log ratios.
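
The link between Pareto-distributed variables and a Laplace-type error distribution can be illustrated with a small simulation. The sketch below is not the authors' method; it assumes the simplest symmetric case, in which both intensities are independent Pareto variables with the same shape parameter (the abstract does not give the exact form, which may be more general). Under that assumption each log intensity is exponential, so the log ratio follows a Laplace distribution with scale 1/alpha. The function names, sample size, and shape value are illustrative choices.

```python
import math
import random
import statistics


def pareto_log_ratios(n, alpha, seed=0):
    """Draw n log ratios log(X/Y) with X, Y iid Pareto(shape=alpha, scale=1)."""
    rng = random.Random(seed)
    return [math.log(rng.paretovariate(alpha) / rng.paretovariate(alpha))
            for _ in range(n)]


def fit_laplace(xs):
    """Maximum-likelihood Laplace fit: location = median, scale = mean |x - median|."""
    loc = statistics.median(xs)
    scale = sum(abs(x - loc) for x in xs) / len(xs)
    # At the MLE, sum |x - loc| / scale equals n, so the log-likelihood simplifies.
    loglik = -len(xs) * (math.log(2 * scale) + 1)
    return loc, scale, loglik


def fit_normal(xs):
    """Maximum-likelihood normal fit: sample mean and (biased) sample variance."""
    mean = statistics.fmean(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    loglik = -0.5 * len(xs) * (math.log(2 * math.pi * var) + 1)
    return mean, var, loglik


# Assumed illustrative settings: 20,000 simulated log ratios, shape alpha = 2.
samples = pareto_log_ratios(n=20000, alpha=2.0)
loc, scale, ll_laplace = fit_laplace(samples)
_, _, ll_normal = fit_normal(samples)
print(f"Laplace scale {scale:.3f} (theory: 1/alpha = 0.5)")
print(f"log-likelihood: Laplace {ll_laplace:.1f}, normal {ll_normal:.1f}")
```

On simulated data of this kind the Laplace fit should attain the higher log-likelihood, which mirrors, in a toy setting, the abstract's claim that a normal distribution fits log ratios poorly.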

Impact of gene annotation choice on the quantification of RNA-seq data

10.1101/2021.01.07.425794 ◽

2021 ◽

Author(s):

David Chisanga ◽

Yang Liao ◽

Wei Shi

Keyword(s):

Gene Expression ◽

Gene Annotation ◽

Expression Data ◽

RNA-Seq ◽

Microarray Expression Data ◽

Refseq Annotation ◽

Sequencing Quality ◽

Gene Expression Quantification ◽

Microarray Expression ◽

Expression Quantification

RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantifying gene expression levels from RNA-seq data is to map reads to a reference genome and then count the reads mapped to each gene. Gene annotation data, which include chromosomal coordinates of exons for tens of thousands of genes, are required for this quantification process. There are several major sources of gene annotations that can be used for quantification, such as the Ensembl and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the accuracy of gene expression quantification in an RNA-seq analysis. In this paper, we compare the impact of Ensembl and RefSeq human annotations on gene expression quantification using a benchmark RNA-seq dataset generated by the SEquencing Quality Control (SEQC) consortium. We show that the use of RefSeq gene annotation models led to better quantification accuracy, based on the correlation with ground truths including expression data from >800 real-time PCR validated genes, known titration ratios of gene expression, and microarray expression data. We also found that the recent expansion of the RefSeq annotation has led to a decrease in its annotation accuracy. Finally, we demonstrated that the RNA-seq quantification differences observed between different annotations were not affected by the use of different normalization methods.
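
The map-then-count approach described above can be sketched in a few lines to show why annotation choice changes the counts. This is a deliberately simplified illustration, not the pipeline used in the paper: the gene names, coordinates, and reads are hypothetical toy data, intervals are assumed to be half-open, and strand, multi-mapping, and paired-end handling are all ignored.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def count_reads(reads, annotation):
    """Count reads per gene.

    reads:      list of (chrom, start, end) for already-mapped reads
    annotation: {gene: [(chrom, exon_start, exon_end), ...]}
    A read is assigned only if it overlaps exons of exactly one gene.
    """
    counts = {gene: 0 for gene in annotation}
    for chrom, start, end in reads:
        hits = [gene for gene, exons in annotation.items()
                if any(chrom == e_chrom and overlaps(start, end, e_start, e_end)
                       for e_chrom, e_start, e_end in exons)]
        if len(hits) == 1:  # skip unassigned and ambiguous reads
            counts[hits[0]] += 1
    return counts


# Hypothetical toy annotations: B adds a third exon to geneX that A lacks.
annotation_a = {"geneX": [("chr1", 100, 200), ("chr1", 300, 400)],
                "geneY": [("chr1", 800, 1000)]}
annotation_b = {"geneX": [("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 600)],
                "geneY": [("chr1", 800, 1000)]}

# Hypothetical mapped reads.
reads = [("chr1", 150, 200), ("chr1", 350, 400), ("chr1", 520, 570),
         ("chr1", 900, 950), ("chr2", 150, 200)]

counts_a = count_reads(reads, annotation_a)
counts_b = count_reads(reads, annotation_b)
print(counts_a)
print(counts_b)
```

The same set of mapped reads yields a different count for geneX under the two toy annotations, because one read falls in an exon present only in the second. Differences of this kind, aggregated over thousands of genes, are what a comparison against ground-truth measurements would detect.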

Data-Driven Analysis of Age, Sex, and Tissue Effects on Gene Expression Variability in Alzheimer’s Disease

10.1101/498527 ◽

2018 ◽

Author(s):

Lavida R.K. Brooks ◽

George I. Mias

Keyword(s):

Gene Expression ◽

Alzheimer's Disease ◽

Meta Analysis ◽

Tissue Type ◽

Healthy Controls ◽

Expression Data ◽

Mitochondrial Translation ◽

Microarray Expression Data ◽

Microarray Expression

Alzheimer’s disease (AD) has been categorized by the Centers for Disease Control and Prevention (CDC) as the 6th leading cause of death in the United States. AD is a significant health-care burden because of its increased occurrence (specifically in the elderly population) and the lack of effective treatments and preventive methods. With an increase in life expectancy, the CDC expects AD cases to rise to 15 million by 2060. Aging has been previously associated with susceptibility to AD, and there are ongoing efforts to effectively differentiate between normal and AD age-related brain degeneration and memory loss. AD targets neuronal function and can cause neuronal loss due to the buildup of amyloid-beta plaques and intracellular neurofibrillary tangles. Our study aims to identify temporal changes within gene expression profiles of healthy controls and AD subjects. We conducted a meta-analysis using publicly available microarray expression data from AD and healthy cohorts. For our meta-analysis, we selected datasets that reported donor age and gender, and used Affymetrix and Illumina microarray platforms (8 datasets, 2,088 samples). Raw microarray expression data were re-analyzed and normalized across arrays. We then performed an analysis of variance, using a linear model that incorporated age, tissue type, sex, and disease state as effects, with study included to account for batch effects, and with binary interactions between factors. Our results identified 3,735 statistically significant (Bonferroni adjusted p<0.05) gene expression differences between AD and healthy controls, which we filtered for biological effect (10% two-tailed quantiles of mean differences between groups) to obtain 352 genes. Enriched pathways of interest included neurodegenerative disease pathways (including AD), mitochondrial translation and dysfunction, the synaptic vesicle cycle, and GABAergic synapse; enriched gene ontology terms included neuronal system, transmission across chemical synapses, and mitochondrial translation. Overall, our approach allowed us to effectively combine multiple available microarray datasets and identify gene expression differences between AD and healthy individuals, with full consideration of age and tissue type. Our findings provide potential gene and pathway associations that can be targeted to improve AD diagnostics and potentially treatment or prevention.
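
The two-stage gene selection described in this abstract (a Bonferroni-adjusted significance cut followed by a quantile filter on effect size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say whether "10% two-tailed quantiles" means 10% in each tail or in total, nor over which set of genes the quantiles are taken. Here the quantiles are computed over the significant genes with 10% in each tail, and the function names and toy values are hypothetical.

```python
import statistics


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def select_genes(results, alpha=0.05, tail_percent=10):
    """Keep significant genes whose mean difference lies in either tail.

    results: {gene: (p_value, mean_difference)}
    Assumption: tail_percent applies to EACH tail, and quantiles are taken
    over the genes that pass the significance cut.
    """
    genes = list(results)
    adjusted = bonferroni([results[g][0] for g in genes])
    significant = [g for g, p in zip(genes, adjusted) if p < alpha]
    diffs = [results[g][1] for g in significant]
    percentiles = statistics.quantiles(diffs, n=100)  # 99 cut points: 1st..99th
    lower = percentiles[tail_percent - 1]
    upper = percentiles[-tail_percent]
    return {g for g in significant
            if results[g][1] <= lower or results[g][1] >= upper}


# Hypothetical toy input: 20 strongly significant genes with evenly spread
# mean differences, plus one gene with a large effect but a weak p-value.
results = {f"g{i}": (1e-6, i - 9.5) for i in range(20)}
results["ns"] = (0.01, 50.0)

selected = select_genes(results)
print(sorted(selected))
```

In this toy input the gene with the largest mean difference is excluded because it fails the adjusted significance threshold, which shows that the two filters act independently: one controls false positives, the other restricts attention to the largest effects.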
