Differential Expression for RNA Sequencing (RNA-Seq) Data: Mapping, Summarization, Statistical Analysis, and Experimental Design

Author(s):  
Matthew D. Young ◽  
Davis J. McCarthy ◽  
Matthew J. Wakefield ◽  
Gordon K. Smyth ◽  
Alicia Oshlack ◽  
...  
Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1947
Author(s):  
Samarendra Das ◽  
Anil Rai ◽  
Michael L. Merchant ◽  
Matthew C. Cave ◽  
Shesh N. Rai

Single-cell RNA-sequencing (scRNA-seq) is a recent high-throughput sequencing technique for studying gene expression at the single-cell level. Differential Expression (DE) analysis is a major downstream analysis of scRNA-seq data. DE analysis in the presence of noise from different sources remains a key challenge in scRNA-seq. Earlier practices for addressing this involved borrowing methods from bulk RNA-seq, which are based on non-zero differences in average expression of genes across cell populations. Later, several methods specifically designed for scRNA-seq were developed. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to comprehensively study the performance of DE analysis methods. Here, we provide a review and classification of different DE approaches adapted from bulk RNA-seq practice as well as those specifically designed for scRNA-seq. We also evaluate the performance of 19 widely used methods in terms of 13 performance metrics on 11 real scRNA-seq datasets. Our findings suggest that some bulk RNA-seq methods are quite competitive with the single-cell methods and that their performance depends on the underlying models, DE test statistic(s), and data characteristics. Further, no individual performance criterion identifies a method that performs best globally. However, the multi-criteria and combined-data analysis indicates that DECENT and EBSeq are the best options for DE analysis. The results also reveal the similarities among the tested methods in terms of detecting common DE genes. Our evaluation provides guidelines for selecting the tool that performs best under particular experimental settings in the context of scRNA-seq.


2015 ◽  
Author(s):  
Abhinav Nellore ◽  
Leonardo Collado-Torres ◽  
Andrew E Jaffe ◽  
José Alquicira-Hernández ◽  
Jacob Pritt ◽  
...  

RNA sequencing (RNA-seq) experiments now span hundreds to thousands of samples. Current spliced alignment software is designed to analyze each sample separately. Consequently, no information is gained from analyzing multiple samples together, and it is difficult to reproduce the exact analysis without access to original computing resources. We describe Rail-RNA, a cloud-enabled spliced aligner that analyzes many samples at once. Rail-RNA eliminates redundant work across samples, making it more efficient as samples are added. For many samples, Rail-RNA is more accurate than annotation-assisted aligners. We use Rail-RNA to align 667 RNA-seq samples from the GEUVADIS project on Amazon Web Services in under 16 hours for US$0.91 per sample. Rail-RNA produces alignments and base-resolution bigWig coverage files, ready for use with downstream packages for reproducible statistical analysis. We identify expressed regions in the GEUVADIS samples and show that both annotated and unannotated (novel) expressed regions exhibit consistent patterns of variation across populations and with respect to known confounders. Rail-RNA is open-source software available at http://rail.bio.


2019 ◽  
Author(s):  
Adam H. Freedman ◽  
John M. Gaspar ◽  
Timothy B. Sackton

Abstract Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results: At both the transcript and gene levels, 2×40 paired-end reads unequivocally provide expression estimates that are more highly correlated with 2×125 than 1×75 reads; in nearly all cases, those correlations are also greater than for 1×125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2×40 consistently outperform those using 1×75. Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.


2018 ◽  
Author(s):  
Fatemeh Gholizadeh ◽  
Zahra Salehi ◽  
Ali Mohammad banaei-Moghaddam ◽  
Abbas Rahimi Foroushani ◽  
Kaveh kavousi

Abstract With the advent of Next Generation Sequencing technologies, RNA-seq has become known as an optimal approach for gene expression profiling. In particular, time course RNA-seq differential expression analysis has been used in many studies to identify candidate genes. However, applying a statistical method to efficiently identify differentially expressed genes (DEGs) in time course studies is challenging due to inherent characteristics of such data, including correlation and dependencies over time. Here we aim to compare EBSeq-HMM, a Hidden Markov-based model, with multiDE, a Log-Linear-based model, on real time course RNA sequencing data. To conduct the comparison, common DEGs detected by edgeR, DESeq2 and Voom (referred to as Benchmark DEGs) were used as a measure. The two models were compared using different normalization methods. The findings revealed that multiDE identified more Benchmark DEGs and showed higher agreement with them than EBSeq-HMM. Furthermore, multiDE and EBSeq-HMM displayed their best performance using TMM and Upper-Quartile normalization methods, respectively.
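
The benchmark construction described in this abstract (genes called DE by all three reference tools, then agreement of a candidate method with that set) can be sketched with plain set operations. This is a minimal illustration only: the function names, the gene identifiers, and the choice of recall and Jaccard index as agreement measures are assumptions made here, not details taken from the study.

```python
def benchmark_degs(*calls):
    """Return the genes called DE by every reference method (set intersection).

    Each argument is an iterable of gene identifiers reported by one tool.
    """
    if not calls:
        raise ValueError("at least one list of DE calls is required")
    return set.intersection(*(set(c) for c in calls))


def agreement(candidate, benchmark):
    """Summarize how well a candidate DEG list matches the benchmark set."""
    candidate = set(candidate)
    benchmark = set(benchmark)
    recovered = candidate & benchmark
    union = candidate | benchmark
    return {
        # number of benchmark genes the candidate method also reports
        "recovered": len(recovered),
        # fraction of the benchmark recovered by the candidate
        "recall": len(recovered) / len(benchmark) if benchmark else 0.0,
        # overlap relative to everything reported by either side
        "jaccard": len(recovered) / len(union) if union else 0.0,
    }


# Hypothetical gene identifiers, for illustration only.
edger_calls = ["g1", "g2", "g3"]
deseq2_calls = ["g2", "g3", "g4"]
voom_calls = ["g3", "g2"]

benchmark = benchmark_degs(edger_calls, deseq2_calls, voom_calls)
summary = agreement(["g2", "g5"], benchmark)
```

Recall counts only how many benchmark genes are recovered, while the Jaccard index also penalizes extra calls outside the benchmark; which of the two better reflects "agreement" in the abstract's sense is not stated there.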


2017 ◽  
Vol 35 (15_suppl) ◽  
pp. e15546-e15546
Author(s):  
Lin Yang ◽  
Wenjing Zheng ◽  
Zheng Wang ◽  
Peikun Ding ◽  
Lijuan Ling ◽  
...  

e15546 Background: Dissecting tumor heterogeneity is crucial for understanding tumor prognosis, response to therapy, and metastasis. But current tissue biopsy-based strategies for characterizing molecular heterogeneity are invasive and may be confounded by intra-tumor heterogeneity. Here we explore whether exosomes that contain bioactive molecules from the cell of origin can provide a new noninvasive means to delineate the heterogeneity of human cancers. Methods: We used RNA-sequencing (RNA-seq) to perform unbiased profiling of mRNAs and long noncoding RNAs (lncRNAs) in plasma exosomes isolated from patients with esophageal squamous cell carcinoma (ESCC, n = 6), patients with esophagitis (n = 6), and healthy controls (n = 6). Results: The number of expressed genes detected in our data set is 63355, including 29615 lncRNAs. We found that exosomes from ESCC have dramatically distinct transcriptome and lncRNA landscapes from those of esophagitis and healthy controls, with 2278 genes and 584 lncRNAs showing differential expression between ESCC and controls, and 854 genes and 126 lncRNAs displaying differential expression between ESCC and esophagitis. We also observed variable expression of diverse transcriptional patterns related to immune response, signal transduction, cell mobility, and transmembrane protein binding, as well as 953 differentially expressed lncRNAs between Stage I and Stage II ESCC samples. Finally, we discovered that both gene and lncRNA expression profiles vary across exosomal samples from different ESCC patients. Conclusions: Our data reveal that exosomes from ESCC contain distinct transcriptional and lncRNA profiles that separate ESCC from benign esophagitis and healthy controls. Our analysis also identifies unappreciated molecular heterogeneity in exosomes of ESCC, which may pave the way for using exosomal RNA-seq to decode molecular heterogeneity in cancers.


2014 ◽  
pp. 249-279
Author(s):  
José Robles ◽  
Sumaira Qureshi ◽  
Stuart Stephen ◽  
Susan Wilson ◽  
Conrad Burden ◽  
...  

Author(s):  
Dionysios Fanidis ◽  
Panagiotis Moulos

Abstract The study of differential gene expression patterns through RNA-Seq is a routine task in the daily lives of molecular bioscientists, who produce vast amounts of data requiring proper management and analysis. Despite widespread use, there are still no widely accepted gold standards for the normalization and statistical analysis of RNA-Seq data, and critical biases, such as gene lengths and problems in the detection of certain types of molecules, remain largely unaddressed. Stimulated by these unmet needs and the lack of in-depth research into the potential of combinatorial methods to enhance the analysis of differential gene expression, we had previously introduced the PANDORA P-value combination algorithm while presenting evidence for PANDORA’s superior performance in optimizing the tradeoff between precision and sensitivity. In this article, we present the next generation of the algorithm along with a more in-depth investigation of its capabilities to effectively analyze RNA-Seq data. In particular, we show that PANDORA-reported lists of differentially expressed genes are unaffected by biases introduced by different normalization methods, while, at the same time, they comprise a reliable input option for downstream pathway analysis. Additionally, PANDORA outperforms other methods in detecting differential expression patterns in certain transcript types, including long non-coding RNAs.
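
To make the general idea of P-value combination concrete, the sketch below implements Fisher's method, a classical combination rule. It is offered only as an illustration of the concept: the abstract does not describe PANDORA's own combination scheme, so this should not be read as that algorithm, and the function name and input values are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values for one gene with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has the closed form
        exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    so no external statistics library is needed.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2

    # Accumulate the finite series term by term to avoid large factorials.
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term

    return min(1.0, math.exp(-half_x) * total)


# Hypothetical p-values reported for a single gene by three different tests.
combined = fisher_combine([0.04, 0.10, 0.02])
```

Fisher's method assumes the combined p-values are independent, which is unlikely to hold exactly when several DE tests are run on the same counts; that caveat is one reason dedicated combination schemes exist.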


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Beate Vieth ◽  
Swati Parekh ◽  
Christoph Ziegenhain ◽  
Wolfgang Enard ◽  
Ines Hellmann

Abstract The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches resulting in ~3000 pipelines, allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE-setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.


2018 ◽  
Vol 12 (1) ◽  
pp. 41-52 ◽  
Author(s):  
Bradford W. Lee ◽  
Virender B. Kumar ◽  
Pooja Biswas ◽  
Audrey C. Ko ◽  
Ramzi M. Alameddine ◽  
...  

Objective: This study utilized Next Generation Sequencing (NGS) to identify differentially expressed transcripts in orbital adipose tissue from patients with active Thyroid Eye Disease (TED) versus healthy controls. Method: This prospective, case-control study enrolled three patients with severe, active thyroid eye disease undergoing orbital decompression, and three healthy controls undergoing routine eyelid surgery with removal of orbital fat. RNA Sequencing (RNA-Seq) was performed on freshly obtained orbital adipose tissue from study patients to analyze the transcriptome. Bioinformatics analysis was performed to determine pathways and processes enriched for the differential expression profile. Quantitative Reverse Transcriptase-Polymerase Chain Reaction (qRT-PCR) was performed to validate the differential expression of selected genes identified by RNA-Seq. Results: RNA-Seq identified 328 differentially expressed genes associated with active thyroid eye disease, many of which were responsible for mediating inflammation, cytokine signaling, adipogenesis, IGF-1 signaling, and glycosaminoglycan binding. The IL-5 and chemokine signaling pathways were highly enriched, and very-low-density-lipoprotein receptor activity and statin medications were implicated as having a potential role in TED. Conclusion: This study is the first to use RNA-Seq technology to elucidate differential gene expression associated with active, severe TED. This study suggests a transcriptional basis for the role of statins in modulating differentially expressed genes that mediate the pathogenesis of thyroid eye disease. Furthermore, the identification of genes with altered levels of expression in active, severe TED may inform the molecular pathways central to this clinical phenotype and guide the development of novel therapeutic agents.


2020 ◽  
Author(s):  
Benjamin Kellman ◽  
Hratch Baghdassarian ◽  
Tiziano Pramparo ◽  
Isaac Shamie ◽  
Vahid Gazestani ◽  
...  

Abstract Background: Both RNA-Seq and sample freeze-thaw are ubiquitous. However, knowledge about the impact of freeze-thaw on downstream analyses is limited. The lack of common quality metrics that are sufficiently sensitive to freeze-thaw and RNA degradation, e.g. the RNA Integrity Score, makes such assessments challenging. Results: Here we quantify the impact of repeated freeze-thaw cycles on the reliability of RNA-Seq by examining poly(A)-enriched and ribosomal RNA depleted RNA-seq from frozen leukocytes drawn from a toddler Autism cohort. To do so, we estimate the relative noise, or percentage of random counts, separating technical replicates. Using this approach we measured noise associated with RIN and freeze-thaw cycles. As expected, RIN does not fully capture sample degradation due to freeze-thaw. We further examined differential expression results and found that three freeze-thaws should extinguish the differential expression reproducibility of similar experiments. Freeze-thaw also resulted in a 3’ shift in the read coverage distribution along the gene body of poly(A)-enriched samples compared to ribosomal RNA depleted samples, suggesting that library preparation may exacerbate freeze-thaw-induced sample degradation. Conclusion: The use of poly(A)-enrichment for RNA sequencing is pervasive in library preparation of frozen tissue, and thus, it is important during experimental design and data analysis to consider the impact of repeated freeze-thaw cycles on reproducibility.

