A comparative study of techniques for differential expression analysis on RNA-Seq data

2014 ◽  
Author(s):  
Zong Hong Zhang ◽  
Dhanisha J. Jhaveri ◽  
Vikki M. Marshall ◽  
Denis C. Bauer ◽  
Janette Edson ◽  
...  

Recent advances in next-generation sequencing technology allow high-throughput cDNA sequencing (RNA-Seq) to be widely applied in transcriptomic studies, in particular for detecting differentially expressed genes between groups. Many software packages have been developed for the identification of differentially expressed genes (DEGs) between treatment groups based on RNA-Seq data. However, there is a lack of consensus on how to approach an optimal study design and choice of suitable software for the analysis. In this comparative study we evaluate the performance of three of the most frequently used software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. A number of important parameters of RNA-Seq technology were taken into consideration, including the number of replicates, sequencing depth, and balanced vs. unbalanced sequencing depth within and between groups. We benchmarked results relative to sets of DEGs identified through either quantitative RT-PCR or microarray. We observed that edgeR performs slightly better than DESeq and Cuffdiff2 in terms of the ability to uncover true positives. Overall, DESeq or taking the intersection of DEGs from two or more tools is recommended if the number of false positives is a major concern in the study. In other circumstances, edgeR is slightly preferable for differential expression analysis at the expense of potentially introducing more false positives.
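The recommendation to intersect DEG calls from two or more tools, and the benchmarking against a validated reference set, can be sketched roughly as follows. This is only an illustration in Python: the function names `consensus_degs` and `benchmark`, the gene identifiers, and the `min_tools` parameter are hypothetical and do not come from the study, and the per-tool gene sets are assumed to have been exported already.

```python
def consensus_degs(calls_by_tool, min_tools=2):
    """Return the genes called differentially expressed by at least `min_tools` tools."""
    counts = {}
    for genes in calls_by_tool.values():
        for gene in set(genes):
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n >= min_tools}


def benchmark(called, truth):
    """Count true and false positives against a reference set (e.g. qRT-PCR validated genes)."""
    called, truth = set(called), set(truth)
    return {
        "true_positives": len(called & truth),
        "false_positives": len(called - truth),
    }


# Hypothetical gene identifiers, for illustration only.
calls = {
    "edgeR": {"g1", "g2", "g3", "g4"},
    "DESeq": {"g1", "g2"},
    "Cuffdiff2": {"g1", "g3", "g5"},
}
validated = {"g1", "g2", "g3"}

strict = consensus_degs(calls, min_tools=3)    # genes called by all three tools
relaxed = consensus_degs(calls, min_tools=2)   # genes called by any two tools
single_tool = benchmark(calls["edgeR"], validated)
```

In this toy input the strict intersection keeps fewer genes than any single tool, which mirrors the trade-off described above: fewer false positives at the cost of fewer recovered true positives.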


PLoS ONE ◽  
2014 ◽  
Vol 9 (8) ◽  
pp. e103207 ◽  


2018 ◽  
Author(s):  
Anna C. Salzberg ◽  
Jiafen Hu ◽  
Elizabeth J. Conroy ◽  
Nancy M. Cladel ◽  
Robert M. Brucklacher ◽  
...  

Abstract Best practices for handling duplicate mapped reads in RNA-seq analyses have long been discussed, but a gold-standard method has yet to be established, as such duplicates could originate from valid biological transcripts or could be PCR-related artifacts. Here we used the NEXTflex™ qRNA-Seq™ (aka Molecular Indexing™) technology to identify PCR duplicates via the random attachment of unique molecular labels to each cDNA molecule prior to PCR amplification. We found that up to 64.3% of the single-end and 19.3% of the mouse paired-end duplicates originated from valid biological transcripts rather than PCR artifacts. For single-end reads, either removing or retaining all duplicates resulted in a substantial number of false positives (up to 47.0%) and false negatives (up to 12.1%) in the sets of significantly differentially expressed genes. For paired-end reads, only the alignment retaining all duplicates resulted in a substantial number of false positives. This is the first effort to evaluate the performance of qRNA-seq using ‘real-world’ biomedical samples, and we found that PCR duplicate identification provided minor benefits for paired-end reads but greatly improved the sensitivity and specificity in the determination of significantly differentially expressed genes for single-end reads.
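The idea of separating PCR duplicates from biological duplicates with molecular labels can be sketched as below. This is a simplified Python illustration, not the kit's or the authors' pipeline: the read tuple layout `(chrom, start, strand, label)` and the function name `classify_duplicates` are assumptions, and sequencing errors within the labels are ignored.

```python
from collections import defaultdict


def classify_duplicates(reads):
    """Split duplicate reads into PCR and biological duplicates.

    reads: iterable of (chrom, start, strand, label) tuples, where `label`
    is the molecular label attached before PCR amplification.
    Returns (n_pcr_duplicates, n_biological_duplicates).
    """
    # Count how often each label occurs at each mapping position.
    labels_at_position = defaultdict(lambda: defaultdict(int))
    for chrom, start, strand, label in reads:
        labels_at_position[(chrom, start, strand)][label] += 1

    pcr = biological = 0
    for label_counts in labels_at_position.values():
        n_reads = sum(label_counts.values())
        n_molecules = len(label_counts)
        # Same position and same label: extra copies of one molecule (PCR).
        pcr += n_reads - n_molecules
        # Same position, different label: distinct original molecules.
        biological += n_molecules - 1
    return pcr, biological


# Hypothetical reads, for illustration only.
reads = [
    ("chr1", 100, "+", "AAC"),
    ("chr1", 100, "+", "AAC"),  # PCR duplicate of the read above
    ("chr1", 100, "+", "GTT"),  # same position, different molecule
    ("chr2", 50, "-", "CCA"),   # unique read
]
pcr_duplicates, biological_duplicates = classify_duplicates(reads)
```

A position-only deduplication would discard both extra reads at chr1:100, whereas the label-aware count keeps the one that stems from a distinct molecule.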



2021 ◽  
Author(s):  
Jordan W. Squair ◽  
Matthieu Gautier ◽  
Claudia Kathe ◽  
Mark A. Anderson ◽  
Nicholas D. James ◽  
...  

Differential expression analysis in single-cell transcriptomics enables the dissection of cell-type-specific responses to perturbations such as disease, trauma, or experimental manipulation. While many statistical methods are available to identify differentially expressed genes, the principles that distinguish these methods and their performance remain unclear. Here, we show that the relative performance of these methods is contingent on their ability to account for variation between biological replicates. Methods that ignore this inevitable variation are biased and prone to false discoveries. Indeed, the most widely used methods can discover hundreds of differentially expressed genes in the absence of biological differences. Our results suggest an urgent need for a paradigm shift in the methods used to perform differential expression analysis in single-cell data.
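One common way to make the biological replicate, rather than the individual cell, the unit of analysis is to sum counts over all cells from the same replicate before testing (often called pseudobulk aggregation). The abstract above does not name a specific method, so the following Python sketch is an assumption offered purely for illustration; the function name `pseudobulk` and the sample and gene identifiers are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum gene counts over all cells that belong to the same replicate.

    cells: iterable of (replicate_id, {gene: count}) pairs, one per cell.
    Returns {replicate_id: {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for replicate_id, counts in cells:
        for gene, count in counts.items():
            totals[replicate_id][gene] += count
    return {rep: dict(genes) for rep, genes in totals.items()}


# Hypothetical cells from two animals, for illustration only.
cells = [
    ("mouse1", {"Gfap": 3, "Actb": 10}),
    ("mouse1", {"Gfap": 1}),
    ("mouse2", {"Actb": 7}),
]
per_replicate = pseudobulk(cells)
```

After aggregation, a downstream test sees one profile per replicate, so its sample size reflects the number of animals rather than the number of cells.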



2015 ◽  
Author(s):  
Pavel Zakharov ◽  
Alexey Sergushichev ◽  
Alexander Predeus ◽  
Maxim Artyomov

RNA-seq is a powerful tool for gene expression profiling and differential expression analysis. Its power depends on sequencing depth, which limits its high-throughput potential, with 10-15 million reads considered an optimal balance between the quality of differential expression calling and cost per sample. We observed, however, that some statistical features of the data, e.g. the gene count distribution, are preserved well below 10-15M reads, and found that they improve differential expression analysis at low sequencing depths when distribution statistics are estimated by pooling individual samples into a combined higher-depth library. Using a novel gene-by-gene scaling technique, based on the fact that gene counts obey a Pareto-like distribution, we re-normalize samples towards a greater sequencing depth and show that this leads to significant improvement in differential expression calling, with only a marginal increase in false positive calls. This makes differential expression calling from 3-4M reads comparable to that from 10-15M reads, improving the throughput of RNA sequencing 3-4 fold.



2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Rashi Gupta ◽  
Isha Dewan ◽  
Richa Bharti ◽  
Alok Bhattacharya

RNA-Seq is increasingly being used for gene expression profiling. In this approach, next-generation sequencing (NGS) platforms are used for sequencing. Owing to their highly parallel nature, millions of reads are generated in a short time and at low cost. Analysis of the data is therefore a major challenge, and the development of statistical and computational methods is essential for drawing meaningful conclusions from such large data sets. Here, we assessed three different types of normalization (transcript parts per million, trimmed mean of M values, quantile normalization) and evaluated whether normalization reduces technical variability across replicates. In addition, we proposed two novel methods for detecting differentially expressed genes between two biological conditions: (i) a likelihood ratio method, and (ii) a Bayesian method. Our proposed methods for finding differentially expressed genes were tested on three real datasets. Our methods performed at least as well as, and often better than, the existing methods for analysis of differential expression.
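Two of the normalization ideas mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `per_million` applies plain library-size scaling without any transcript-length correction, `quantile_normalize` breaks ties by input order, and both function names are hypothetical. Trimmed mean of M values is not shown.

```python
def per_million(counts):
    """Scale one sample's counts so that they sum to one million."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def quantile_normalize(samples):
    """Force every sample to share the same empirical distribution.

    samples: list of equal-length lists, one list of gene values per sample.
    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are broken by input order.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]
    normalized = []
    for sample in samples:
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical counts for three genes in two samples.
samples = [[5, 2, 3], [4, 1, 6]]
normalized = quantile_normalize(samples)
scaled = per_million([1, 3])
```

After quantile normalization both samples contain the same set of values, only assigned to different genes, which is why the method removes sample-wide distributional differences but can also mask genuine global shifts.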



2018 ◽  
Author(s):  
Almas Jabeen ◽  
Nadeem Ahmad ◽  
Khalid Raza

Zika virus (ZIKV) is considered an emerging viral threat owing to its link to diseases such as microcephaly and Guillain-Barré syndrome in humans. In this paper, we identify differentially expressed genes (DEGs) using RNA-seq data. We adopted an RNA-seq analysis pipeline to quantify the RNA-seq data as read counts. Our analysis uncovers significant DEGs that may be involved in altered biological processes. Here, we report the list of significant DEGs, three of which are found to be highly differentially expressed. In addition, our analysis predicts moderate and low DEGs whose differential expression was induced by ZIKV infection.



2020 ◽  
Author(s):  
Diana Lobo ◽  
Raquel Godinho ◽  
John Archer

Abstract Background: In the last decades, the evolution of RNA-Seq has yielded archived datasets that have the potential to provide unprecedented inter-study insight into transcriptome evolution, once background noise has been reduced. Here we present a method to quantify intra-condition variation and to remove reference-based transcripts associated with highly variable read counts, prior to differential expression analysis. The method utilizes variation within pairwise distances between normalized read counts for each transcript across all included samples of a given condition. As a case study, we demonstrate our approach at an inter- and intra-study level using RNA-seq data from brain samples of dogs, wolves, and two strains of fox (aggressive and tame) prior to performing differential expression analysis to identify common genes associated with tame behaviour. Results: By applying our method, the distribution of the gene-wise dispersion estimates improved and the number of outliers detected in differential expression analysis decreased. Several genes that were initially differentially expressed in the non-filtered datasets were removed due to high intra-condition variation. Additionally, by optimizing the detection of differentially expressed transcripts, the overall number increased between dogs vs wolves and tame vs aggressive foxes when compared to the non-filtered datasets. Using these filtered sets, we found common over-expressed genes in dogs and tame foxes, including those involved in brain development, neurotransmission and immunity, factors known to be involved in domestication. Conclusions: We present a method to quantify and remove intra-condition variation from RNA-seq count data and demonstrate its use in improving the distribution of gene-wise dispersion estimates and, ultimately, in reducing the number of false positives in differential gene expression analysis. We provide the method as a freely available tool to aid studies using RNA-seq in calculating and characterizing the variation present within data prior to performing differential expression analysis. Additionally, we identify candidate genes involved in selection for tameness, which seems to have played a crucial role in canine domestication.
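A filter of the kind described, based on pairwise distances between normalized read counts within one condition, might look roughly like the Python sketch below. The abstract does not give the exact statistic or cut-off, so the choices here are assumptions made for illustration: the spread is measured as the population standard deviation of absolute pairwise differences, and `max_spread`, `pairwise_distance_spread` and `filter_variable_transcripts` are hypothetical names. At least two samples per transcript are required.

```python
from itertools import combinations
from statistics import pstdev


def pairwise_distance_spread(values):
    """Standard deviation of absolute pairwise differences between samples."""
    distances = [abs(a - b) for a, b in combinations(values, 2)]
    return pstdev(distances)


def filter_variable_transcripts(counts, max_spread):
    """Keep transcripts whose within-condition spread does not exceed `max_spread`.

    counts: {transcript_id: [normalized count per sample of one condition]}.
    """
    return {
        transcript: values
        for transcript, values in counts.items()
        if pairwise_distance_spread(values) <= max_spread
    }


# Hypothetical normalized counts for three samples of one condition.
condition_counts = {
    "tx_stable": [10, 11, 12],
    "tx_noisy": [10, 10, 50],
}
kept = filter_variable_transcripts(condition_counts, max_spread=5)
```

In this toy input one outlying sample drives the spread of `tx_noisy` well above the threshold, so only `tx_stable` is passed on to differential expression analysis.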



2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Matthew Chung ◽  
Vincent M. Bruno ◽  
David A. Rasko ◽  
Christina A. Cuomo ◽  
José F. Muñoz ◽  
...  

Abstract Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.



2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Xueyi Dong ◽  
Luyi Tian ◽  
Quentin Gouil ◽  
Hasaru Kariyawasam ◽  
Shian Su ◽  
...  

Abstract Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to high sequencing error rates and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.


