QuickRNASeq: Guide for Pipeline Implementation and for Interactive Results Visualization

2017 ◽  
Author(s):  
Wen He ◽  
Shanrong Zhao ◽  
Chi Zhang ◽  
Michael S. Vincent ◽  
Baohong Zhang

i. Summary/Abstract Sequencing of transcribed RNA molecules (RNA-seq) has been used widely for studying cell transcriptomes in bulk or at the single-cell level (1, 2, 3) and is becoming the de facto technology for investigating gene expression level changes in various biological conditions, over a time course, and under drug treatments. Furthermore, RNA-seq data have helped identify fusion genes that are related to certain cancers (4). Differential gene expression before and after drug treatments provides insights into the mechanism of action, the pharmacodynamics of the drugs, and safety concerns (5). Because each RNA-seq run generates tens to hundreds of millions of short reads ranging in size from 50 to 200 bp, a tool that turns these short reads into an integrated and digestible analysis report is in high demand. QuickRNASeq (6) is an application for large-scale RNA-seq data analysis and real-time interactive visualization of complex data sets. This application automates the use of several of the best open-source tools to efficiently generate a user-friendly, easy-to-share, and ready-to-publish report. Figure 1 illustrates some of the interactive plots produced by QuickRNASeq. The visualization features of the application have been further improved since its first publication in early 2016. The original QuickRNASeq publication (6) provided details of background, software selection, and implementation. Here, we outline the steps required to implement QuickRNASeq in the user's own environment and demonstrate some basic yet powerful utilities of the advanced interactive visualization modules in the report.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Verônica R. de Melo Costa ◽  
Julianus Pfeuffer ◽  
Annita Louloupi ◽  
Ulf A. V. Ørom ◽  
Rosario M. Piro

Abstract Background Introns are generally removed from primary transcripts to form mature RNA molecules in a post-transcriptional process called splicing. An efficient splicing of primary transcripts is an essential step in gene expression and its misregulation is related to numerous human diseases. Thus, to better understand the dynamics of this process and the perturbations that might be caused by aberrant transcript processing it is important to quantify splicing efficiency. Results Here, we introduce SPLICE-q, a fast and user-friendly Python tool for genome-wide SPLICing Efficiency quantification. It supports studies focusing on the implications of splicing efficiency in transcript processing dynamics. SPLICE-q uses aligned reads from strand-specific RNA-seq to quantify splicing efficiency for each intron individually and allows the user to select different levels of restrictiveness concerning the introns’ overlap with other genomic elements such as exons of other genes. We applied SPLICE-q to globally assess the dynamics of intron excision in yeast and human nascent RNA-seq. We also show its application using total RNA-seq from a patient-matched prostate cancer sample. Conclusions Our analyses illustrate that SPLICE-q is suitable to detect a progressive increase of splicing efficiency throughout a time course of nascent RNA-seq and it might be useful when it comes to understanding cancer progression beyond mere gene expression levels. SPLICE-q is available at: https://github.com/vrmelo/SPLICE-q


GigaScience ◽  
2020 ◽  
Vol 9 (11) ◽  
Author(s):  
Alexandra J Lee ◽  
YoSon Park ◽  
Georgia Doing ◽  
Deborah A Hogan ◽  
Casey S Greene

Abstract Motivation In the past two decades, scientists in different laboratories have assayed gene expression from millions of samples. These experiments can be combined into compendia and analyzed collectively to extract novel biological patterns. Technical variability, or "batch effects," may result from combining samples collected and processed at different times and in different settings. Such variability may distort our ability to extract true underlying biological patterns. As more integrative analysis methods arise and data collections get bigger, we must determine how technical variability affects our ability to detect desired patterns when many experiments are combined. Objective We sought to determine the extent to which an underlying signal was masked by technical variability by simulating compendia comprising data aggregated across multiple experiments. Method We developed a generative multi-layer neural network to simulate compendia of gene expression experiments from large-scale microbial and human datasets. We compared simulated compendia before and after introducing varying numbers of sources of undesired variability. Results The signal from a baseline compendium was obscured when the number of added sources of variability was small. Applying statistical correction methods rescued the underlying signal in these cases. However, as the number of sources of variability increased, it became easier to detect the original signal even without correction. In fact, statistical correction reduced our power to detect the underlying signal. Conclusion When combining a modest number of experiments, it is best to correct for experiment-specific noise. However, when many experiments are combined, statistical correction reduces our ability to extract underlying patterns.


2020 ◽  
Author(s):  
Ramon Viñas ◽  
Tiago Azevedo ◽  
Eric R. Gamazon ◽  
Pietro Liò

Abstract A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, with important implications for biological discovery and clinical application. To address this challenge, we present GAIN-GTEx, a method for gene expression imputation based on Generative Adversarial Imputation Networks. In order to increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We compare our model to several standard and state-of-the-art imputation methods and show that GAIN-GTEx is significantly superior in terms of predictive performance and runtime. Furthermore, our results indicate strong generalisation on RNA-Seq data from 3 cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.


2021 ◽  
Author(s):  
Dennis A Sun ◽  
Nipam H Patel

Abstract Emerging research organisms enable the study of biology that cannot be addressed using classical “model” organisms. The development of novel data resources can accelerate research in such animals. Here, we present new functional genomic resources for the amphipod crustacean Parhyale hawaiensis, facilitating the exploration of gene regulatory evolution using this emerging research organism. We use Omni-ATAC-Seq, an improved form of the Assay for Transposase-Accessible Chromatin coupled with next-generation sequencing (ATAC-Seq), to identify accessible chromatin genome-wide across a broad time course of Parhyale embryonic development. This time course encompasses many major morphological events, including segmentation, body regionalization, gut morphogenesis, and limb development. In addition, we use short- and long-read RNA-Seq to generate an improved Parhyale genome annotation, enabling deeper classification of identified regulatory elements. We leverage a variety of bioinformatic tools to discover differential accessibility, predict nucleosome positioning, infer transcription factor binding, cluster peaks based on accessibility dynamics, classify biological functions, and correlate gene expression with accessibility. Using a Minos transposase reporter system, we demonstrate the potential to identify novel regulatory elements using this approach, including distal regulatory elements. This work provides a platform for the identification of novel developmental regulatory elements in Parhyale, and offers a framework for performing such experiments in other emerging research organisms.

Primary Findings
- Omni-ATAC-Seq identifies cis-regulatory elements genome-wide during crustacean embryogenesis
- Combined short- and long-read RNA-Seq improves the Parhyale genome annotation
- ImpulseDE2 analysis identifies dynamically regulated candidate regulatory elements
- NucleoATAC and HINT-ATAC enable inference of nucleosome occupancy and transcription factor binding
- Fuzzy clustering reveals peaks with distinct accessibility and chromatin dynamics
- Integration of accessibility and gene expression reveals possible enhancers and repressors
- Omni-ATAC can identify known and novel regulatory elements


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11875
Author(s):  
Tomoko Matsuda

Large volumes of high-throughput sequencing data have been submitted to the Sequence Read Archive (SRA). The lack of experimental metadata associated with the data makes reusing the data and assessing their quality very difficult. In the case of RNA sequencing (RNA-Seq), which reveals the presence and quantity of RNA in a biological sample at a given moment, it is necessary to consider that gene expression responds over a short time interval (several seconds to a few minutes) in many organisms. Therefore, to isolate RNA that accurately reflects the transcriptome at the point of harvest, raw biological samples should be processed as soon as possible by freezing in liquid nitrogen, immersing in RNA stabilization reagent, or lysing and homogenizing in RNA lysis buffer containing guanidine thiocyanate. As the number of samples handled simultaneously increases, the time until the RNA is protected can increase. Here, to evaluate the effect of different lag times in RNA protection on RNA-Seq data, we harvested CHO-S cells after 3, 5, 6, and 7 days of cultivation, added RNA lysis buffer in a time course of 15, 30, 45, and 60 min after harvest, and conducted RNA-Seq. These RNA samples showed high RNA integrity number (RIN) values, indicating non-degraded RNA, and sequence data from libraries prepared with these RNA samples were of high quality according to FastQC. We observed that, at the same cultivation day, global trends of gene expression were similar across the time course of addition of RNA lysis buffer; however, the expression of some genes was significantly different between the time-course samples of the same cultivation day, and most of these differentially expressed genes were related to apoptosis. We conclude that the time lag between sample harvest and RNA protection influences the expression of specific genes. It is, therefore, necessary to know not only the RIN values of the RNA and the quality of the sequence data but also how the experiment was performed when acquiring RNA-Seq data from the database.


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1487
Author(s):  
Marie Lataretu ◽  
Martin Hölzer

RNA-Seq enables the identification and quantification of RNA molecules, often with the aim of detecting differentially expressed genes (DEGs). Although RNA-Seq has evolved into a standard technique, there is no universal gold standard for the computational analysis of these data. On top of that, previous studies have demonstrated the irreproducibility of RNA-Seq studies. Here, we present a portable, scalable, and parallelizable Nextflow RNA-Seq pipeline to detect DEGs, which assures a high level of reproducibility. The pipeline automatically takes care of common pitfalls, such as ribosomal RNA removal and low-abundance gene filtering. Apart from various visualizations for the DEG results, we incorporated downstream pathway analysis for common species such as Homo sapiens and Mus musculus. We evaluated the DEG detection functionality using qRT-PCR data as a reference and observed a very high correlation of the logarithmized gene expression fold changes.
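The evaluation described above, correlating logarithmized fold changes from RNA-Seq with those from qRT-PCR, can be illustrated with a minimal Python sketch. This is not the pipeline's own code: the function names, the pseudocount of 1, and the choice of Pearson correlation are assumptions made for illustration, since the abstract does not state which correlation measure or normalization was used.

```python
import math


def log2_fold_changes(treated, control, pseudocount=1.0):
    """Per-gene log2 fold change of treated over control expression.

    The pseudocount (an assumption, not taken from the abstract) avoids
    division by zero and log of zero for genes with no expression.
    """
    if len(treated) != len(control):
        raise ValueError("treated and control must have the same length")
    return [
        math.log2((t + pseudocount) / (c + pseudocount))
        for t, c in zip(treated, control)
    ]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute `log2_fold_changes` separately for the RNA-Seq and qRT-PCR measurements of the same genes and pass the two resulting lists to `pearson`; a value close to 1 would correspond to the "very high correlation" the abstract reports.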


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 2404-2404
Author(s):  
Shouguo Gao ◽  
Zhijie Wu ◽  
Carrie Diamond ◽  
Bradley Arnold ◽  
Valentina Giudice ◽  
...  

Abstract Introduction. T-cell large granular lymphocytosis (T-LGL) is a low-grade lymphoproliferative disorder, often clinically manifesting as bone marrow failure. Treatment with immunosuppressive therapies is effective, but the dominant clone may persist even in responding patients. The pathogenesis of T-LGL has not been fully elucidated. In this study, we performed single cell RNA sequencing (sc-RNA seq) and V(D)J profiling to discern clonotypes and gene expression patterns of T lymphocytes from T-LGL patients who were sampled before and after treatment. Methods. Blood was obtained from patients participating in a phase 2 protocol of alemtuzumab as second line therapy (NCT00345345; Dumitriu B et al, Lancet Haematol 2016). Leukapheresis was performed in 13 patients (M/F 7/6; median age 51 years, range 26-85) before and after 3-6 months of alemtuzumab administration and in 7 age-matched healthy donors. Cryopreserved blood was enriched for T cells with the EasySep Human T cell Isolation Kit (Stem cell). sc-RNA seq was performed on the 10x Genomics Chromium Single Cell V(D)J + 5' Gene Expression platform, and sequencing was performed on the HiSeq3000 Platform. Barcode assignment, alignment, unique molecular index counting and T cell receptor sequence assembly were performed using Cell Ranger 2.1.1. Results. Four hundred fifty thousand cells from 13 patients and 107,000 cells from 7 healthy donors were profiled. We measured productive TCR chains (which fully span the V and J regions, with a recognizable start codon in the V region and lacking a stop codon in the V-J region, thus potentially generating a protein). We detected at least one productive TCR α-chain in 50%, one productive TCR β-chain in 69% and paired productive αβ-chains in 47% of all cells. There was loss of TCR repertoire diversity in patients, which was quantified by Simpson's diversity index; most patients showed oligoclonal or, less frequently, monoclonal expansion of the TCR repertoire (Fig. A). Regardless of clinical response, alemtuzumab treatment did not correct the low TCR repertoire diversity. TCR repertoires can be classified as "public", when they express identical TCR sequences across multiple individuals, or "private", when each individual displays distinct TCR clonotypes. No TCRA or TCRB CDR3 homology among patients was observed: most TCR clonotypes appeared to be private. Our data suggest that T-LGL is an etiologically heterogeneous disease, consistent with T cell expansion in response to a variety of antigens, in diverse HLA contexts, or randomly. Despite differences in TCR among patients and healthy donors, and the presence of large clones in patients, the distribution of TCR diversity followed a power law in both healthy donors and patients (Fig. B, showing the negative linear relationship between the logarithm of clone frequency and the logarithm of clone size). The observed distribution is consistent with a somatic evolution model, in which cell fitness depends on cellular receptor response to specific antigens and stimulation of cells by cytokines and other signals from the environment; fitter clones have higher birth-death ratios and thus expand (Desponds J et al, PNAS 2016). CD4 and CD8 T cells can be virtually separated by imputation from their transcriptomes (Fig. C). Comparison of gene expression between patients and healthy donors showed dysregulation of genes involved in pathways related to the immune response and cell apoptosis, consistent with a pathophysiology of T cell clonal expansion. We used diffusion mapping, which localizes datapoints to their eigen components in low-dimensional space, to characterize sources contributing to the gene expression phenotype: the first component was mainly from T cell activation and the second was associated with TCR expression. In LGL the T cell transcriptome appeared to be shaped by both lineage development and TCR rearrangement. Conclusion. We describe at the single cell level T cell clonal expansion profiles in T-LGL, pre- and post-treatment. Single cell analysis allows accurate recovery of paired α and β chains in the same cell and demonstrates a continuum of cell lineage differentiation. We found a range of differences in transcriptome and TCR repertoires across patients. Transcriptome data, coupled with detailed TCR-based lineage information, provide a rich resource for understanding the pathology of T-LGL and have implications for prognosis, treatment, and monitoring in the clinic. Disclosures Young: GlaxoSmithKline: Research Funding; CRADA with Novartis: Research Funding; National Institute of Health: Research Funding.
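The abstract quantifies loss of TCR repertoire diversity with Simpson's diversity index but does not say which variant was used. As a minimal sketch, assuming clonotypes are given as one label per cell, the Python below computes the Simpson concentration D (the sum of squared clonotype frequencies) and the common complement form 1 - D. The function names and the input format are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def simpson_concentration(clonotypes):
    """Simpson concentration D = sum(p_i ** 2) over clonotype frequencies.

    `clonotypes` is an iterable with one clonotype label per cell
    (a hypothetical input format chosen for this sketch).
    D approaches 1 for a monoclonal repertoire and 0 for a highly diverse one.
    """
    counts = Counter(clonotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clonotypes given")
    return sum((count / total) ** 2 for count in counts.values())


def simpson_diversity(clonotypes):
    """Complement form 1 - D: probability that two cells drawn at random
    (with replacement) belong to different clonotypes."""
    return 1.0 - simpson_concentration(clonotypes)
```

Under this definition, a repertoire dominated by a single expanded clone gives a diversity near 0, while a repertoire of many equally sized clones gives a value near 1, which matches the direction of the oligoclonal and monoclonal expansion described above.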


2018 ◽  
Author(s):  
Koen Van Den Berge ◽  
Katharina Hembach ◽  
Charlotte Soneson ◽  
Simone Tiberi ◽  
Lieven Clement ◽  
...  

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, ranging from basic science studies to agricultural and clinical settings. In the past 10 years or so, much has been learned about the characteristics of RNA-seq datasets as well as the performance of the myriad methods developed. In this review, we give an overall view of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.


2004 ◽  
Vol 19 (3) ◽  
pp. 292-302 ◽  
Author(s):  
Joshua M. Spin ◽  
Shriram Nallamshetty ◽  
Raymond Tabibiazar ◽  
Euan A. Ashley ◽  
Jennifer Y. King ◽  
...  

Mesodermal and epidermal precursor cells undergo phenotypic changes during differentiation to the smooth muscle cell (SMC) lineage that are relevant to pathophysiological processes in the adult. Molecular mechanisms that underlie lineage determination and terminal differentiation of this cell type have received much attention, but the genetic program that regulates these processes has not been fully defined. Study of SMC differentiation has been facilitated by development of the P19-derived A404 embryonal cell line, which differentiates toward this lineage in the presence of retinoic acid and allows selection for cells adopting a SMC fate through a differentiation-specific drug marker. We sought to define global alterations in gene expression by studying A404 cells during SMC differentiation with oligonucleotide microarray transcriptional profiling. Using an in situ 60-mer array platform with more than 20,000 mouse genes derived from the National Institute on Aging clone set, we identified 2,739 genes that were significantly upregulated after differentiation was completed (false-detection ratio <1). These genes encode numerous markers known to characterize differentiated SMC, as well as many unknown factors. We further characterized the sequential patterns of gene expression during the differentiation time course, particularly for known transcription factor families, providing new insights into the regulation of the differentiation process. Changes in genes associated with specific biological ontology-based pathways were evaluated, and temporal trends were identified for functional pathways. In addition to confirming the utility of the A404 model, our data provide a large-scale perspective of gene regulation during SMC differentiation.

