Sashimi plots: Quantitative visualization of alternative isoform expression from RNA-seq data

2014
Authors: Yarden Katz, Eric T Wang, Jacob Stilterra, Schraga Schwartz, Bang Wong, ...

Analysis of RNA sequencing (RNA-Seq) data revealed that the vast majority of human genes express multiple mRNA isoforms, produced by alternative pre-mRNA splicing and other mechanisms, and that most alternative isoforms vary in expression between human tissues. As RNA-Seq datasets grow in size, it remains challenging to visualize isoform expression across multiple samples. We present Sashimi plots, a quantitative multi-sample visualization of RNA-Seq reads aligned to gene annotations, which enables quantitative comparison of isoform usage across samples or experimental conditions. Given an input annotation and spliced alignments of reads from a sample, a region of interest is visualized in a Sashimi plot as follows: (i) alignments in exons are represented as read densities (optionally normalized by length of genomic region and coverage), and (ii) splice junction reads are drawn as arcs connecting a pair of exons, where arc width is drawn proportional to the number of reads aligning to the junction.
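
The two layers described above, per-base read density over exons and junction reads drawn as arcs whose width scales with read count, can be sketched in Python. This is a minimal illustration and not the Sashimi plot implementation. The read representation (each read given as a list of aligned `(start, end)` blocks, with any gap between blocks treated as a splice junction) and the names `sashimi_tracks` and `arc_widths` are hypothetical. The optional normalization is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical read representation: the aligned blocks of one read as
# (start, end) pairs, 0-based and end-exclusive. A gap between consecutive
# blocks is treated as a splice junction.
Read = List[Tuple[int, int]]


def sashimi_tracks(reads: Iterable[Read], region_start: int, region_end: int):
    """Per-base read density over [region_start, region_end) and junction read counts."""
    density = [0] * (region_end - region_start)
    junctions: Counter = Counter()
    for blocks in reads:
        # (i) every aligned base inside the region adds 1 to the density track
        for start, end in blocks:
            for pos in range(max(start, region_start), min(end, region_end)):
                density[pos - region_start] += 1
        # (ii) each gap between consecutive blocks is one junction read,
        # keyed by (end of upstream block, start of downstream block)
        for (_, donor_end), (acceptor_start, _) in zip(blocks, blocks[1:]):
            junctions[(donor_end, acceptor_start)] += 1
    return density, junctions


def arc_widths(junctions: Counter, max_width: float = 5.0) -> Dict[Tuple[int, int], float]:
    """Scale arc widths linearly so the most-supported junction gets max_width."""
    top = max(junctions.values(), default=0)
    if top == 0:
        return {}
    return {j: max_width * count / top for j, count in junctions.items()}
```

For example, two spliced reads `[(100, 150), (300, 350)]` and one unspliced read `[(120, 170)]` yield a single junction `(150, 300)` supported by two reads. A plotting layer would then draw that junction as an arc between the two exons, with width taken from `arc_widths`.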

2018
Authors: Charlotte Soneson, Michael I Love, Rob Patro, Shobbir Hussain, Dheeraj Malhotra, ...

Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic feature as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility (JCC) score, which provides a way to evaluate the reliability of transcript-level abundance estimates as well as the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that while most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.


2021
Authors: Wenpin Hou, Zhicheng Ji, Zeyu Chen, E John Wherry, Stephanie C Hicks, ...

Pseudotime analysis with single-cell RNA-sequencing (scRNA-seq) data has been widely used to study dynamic gene regulatory programs along continuous biological processes. While many computational methods have been developed to infer the pseudo-temporal trajectories of cells within a biological sample, methods that compare pseudo-temporal patterns with multiple samples (or replicates) across different experimental conditions are lacking. Lamian is a comprehensive and statistically rigorous computational framework for differential multi-sample pseudotime analysis. It can be used to identify changes in a biological process associated with sample covariates, such as different biological conditions, and also to detect changes in gene expression, cell density, and topology of a pseudotemporal trajectory. Unlike existing methods that ignore sample variability, Lamian draws statistical inference after accounting for cross-sample variability and hence substantially reduces sample-specific false discoveries that are not generalizable to new samples. Using both simulations and real scRNA-seq data, including an analysis of differential immune response programs between COVID-19 patients with different disease severity levels, we demonstrate the advantages of Lamian in decoding cellular gene expression programs in continuous biological processes.
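
Lamian's own statistical model is not reproduced here. The sketch below is a much simpler stand-in that illustrates the central point above: inference should account for cross-sample variability rather than treat every cell as an independent observation. Each sample is summarized by a single least-squares slope of expression against pseudotime. Condition labels are then permuted across samples, not cells, so that the null distribution reflects sample-to-sample variation. The names `ols_slope` and `sample_level_test` and the input layout are hypothetical.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def sample_level_test(samples, n_perm=2000, seed=0):
    """samples: list of (condition_label, pseudotime_values, expression_values).

    Summarizes each sample by one slope, then permutes condition labels across
    samples. Returns (observed difference in mean slope, permutation p-value).
    """
    slopes = [ols_slope(t, e) for _, t, e in samples]
    labels = [lab for lab, _, _ in samples]
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two conditions")

    def diff(labs):
        a = [s for s, lab in zip(slopes, labs) if lab == groups[0]]
        b = [s for s, lab in zip(slopes, labs) if lab == groups[1]]
        return mean(b) - mean(a)

    observed = diff(labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)  # shuffling keeps group sizes fixed
        if abs(diff(perm)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With only a handful of samples per condition, the smallest achievable p-value is bounded by the number of distinct label arrangements. This bound is part of the point: extra cells from the same sample do not add independent evidence.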


2019, Vol 2 (1), pp. e201800175
Authors: Charlotte Soneson, Michael I Love, Rob Patro, Shobbir Hussain, Dheeraj Malhotra, ...

Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility score, which provides a way to evaluate the reliability of transcript-level abundance estimates and the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that although most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.
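
The comparison of observed and predicted junction coverage can be illustrated with a simplified score. This sketch rests on assumptions and is not the published JCC score definition. It assumes uniform read coverage along each transcript, with `abundance[t]` a hypothetical expected number of read starts per transcript position. Under that assumption, a junction in transcript t is spanned by about `abundance[t] * (read_length - 1)` reads, because any read starting within `read_length - 1` bases upstream of the junction crosses it. Predictions are summed over all transcripts containing the junction, and the score is the summed absolute deviation relative to the total observed junction reads.

```python
import math


def predicted_junction_reads(transcripts, abundance, read_length):
    """transcripts: {tx_id: set of junction ids}; abundance: {tx_id: read starts per base}.

    Assumes uniform coverage, so each junction in a transcript is spanned by
    about abundance * (read_length - 1) reads from that transcript.
    """
    pred = {}
    for tx, juncs in transcripts.items():
        for j in juncs:
            pred[j] = pred.get(j, 0.0) + abundance.get(tx, 0.0) * (read_length - 1)
    return pred


def agreement_score(observed, predicted):
    """Summed |observed - predicted| over junctions, divided by total observed reads.

    0 means perfect agreement; NaN if no junction reads were observed.
    """
    total = sum(observed.values())
    if total == 0:
        return math.nan
    juncs = set(observed) | set(predicted)
    return sum(abs(observed.get(j, 0) - predicted.get(j, 0.0)) for j in juncs) / total
```

For instance, if transcript `t1` contains junctions `j1` and `j2` and transcript `t2` contains only `j1`, a large observed excess or deficit at either junction raises the score. Such a discrepancy would flag the gene's transcript abundances for closer inspection.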


2019, Vol 5 (5), pp. eaav2249
Authors: Dongju Shin, Wookjae Lee, Ji Hyun Lee, Duhee Bang

The development of high-throughput single-cell RNA sequencing (scRNA-seq) has enabled access to information about gene expression in individual cells and insights into new biological areas. Although the interest in scRNA-seq has rapidly grown in recent years, the existing methods are plagued by many challenges when performing scRNA-seq on multiple samples. To simultaneously analyze multiple samples with scRNA-seq, we developed a universal sample barcoding method through transient transfection with short barcode oligonucleotides. By conducting a species-mixing experiment, we have validated the accuracy of our method and confirmed the ability to identify multiplets and negatives. Samples from a 48-plex drug treatment experiment were pooled and analyzed by a single run of Drop-Seq. This revealed unique transcriptome responses for each drug and target-specific gene expression signatures at the single-cell level. Our cost-effective method is widely applicable for the single-cell profiling of multiple experimental conditions, enabling the widespread adoption of scRNA-seq for various applications.
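
Assigning each cell to a sample, or flagging it as a multiplet or negative, amounts to inspecting the barcode counts recovered for that cell. The rule below is a hypothetical sketch. The thresholds (`min_total`, `min_fraction`, `second_fraction`) and the extra "ambiguous" class are illustrative assumptions, not values or categories taken from the method.

```python
def classify_cell(barcode_counts, min_total=10, min_fraction=0.8, second_fraction=0.3):
    """barcode_counts: {sample_barcode: count} for one cell.

    Returns (call, barcode), where call is "negative", "singlet", "multiplet",
    or "ambiguous". All thresholds are hypothetical.
    """
    total = sum(barcode_counts.values())
    if total < min_total:
        # too few barcode reads to assign the cell to any sample
        return "negative", None
    ranked = sorted(barcode_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_barcode, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    if top / total >= min_fraction:
        # one barcode clearly dominates: assign the cell to that sample
        return "singlet", top_barcode
    if second / total >= second_fraction:
        # two barcodes are both well represented: likely two cells in one droplet
        return "multiplet", None
    return "ambiguous", None
```

In a species-mixing experiment like the one described above, calls from such a rule could be checked against the species of each cell's transcripts. This check is one way to estimate how well the thresholds separate singlets from multiplets.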


2018
Authors: Dongju Shin, Wookjae Lee, Ji Hyun Lee, Duhee Bang

To simultaneously analyze multiple samples from various conditions with scRNA-seq, we developed a universal sample barcoding method based on transient transfection of short barcode oligonucleotides (SBOs). A 48-plex drug treatment experiment, in which pooled samples were analyzed in a single Drop-Seq run, revealed a unique transcriptome response for each drug and target-specific gene expression signatures at the single-cell level. Our cost-effective method is widely applicable to single-cell profiling of multiple experimental conditions.


2021
Authors: Mervin M Fansler, Gang Zhen, Christine Mayr

Although half of human genes use alternative polyadenylation (APA) to generate mRNA isoforms that encode the same protein but differ in their 3′UTRs, most single-cell RNA-sequencing (scRNA-seq) pipelines measure only gene expression. Here, we describe an open-access pipeline, called scUTRquant (https://github.com/Mayrlab/scUTRquant), that measures gene and 3′UTR isoform expression from scRNA-seq data obtained from known cell types in any species. Gene and 3′UTR transcript counts derived with scUTRquant were validated against standard methods, demonstrating their accuracy, and 3′UTR isoform quantification was substantially more reproducible than with previous methods. scUTRquant provides an atlas of high-confidence 3′ end cleavage sites at single-nucleotide resolution to allow APA comparison across mouse datasets. Analysis of 120 mouse cell types revealed that, during differentiation, genes either change their expression or change their 3′UTR isoform usage. Therefore, we identified thousands of genes with 3′UTR isoform changes that had not previously been implicated in specific biological processes.
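
One simple way to separate the two kinds of change described above is to compare, between two cell types, the gene's total counts and a count-weighted mean 3′UTR length across its isoforms. The first captures a change in gene expression; the second captures a change in 3′UTR isoform usage. This is a hypothetical summary for illustration. It is not scUTRquant's output format or statistic, and the function names and pseudocount are assumptions.

```python
import math


def weighted_utr_length(isoform_counts, utr_lengths):
    """Count-weighted mean 3'UTR length of one gene in one cell type (None if no counts)."""
    total = sum(isoform_counts.values())
    if total == 0:
        return None
    return sum(c * utr_lengths[iso] for iso, c in isoform_counts.items()) / total


def compare_cell_types(counts_a, counts_b, utr_lengths, pseudocount=1.0):
    """Return (log2 fold change of total gene counts, shift in weighted 3'UTR length)."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    log2fc = math.log2((total_b + pseudocount) / (total_a + pseudocount))
    wa = weighted_utr_length(counts_a, utr_lengths)
    wb = weighted_utr_length(counts_b, utr_lengths)
    shift = None if wa is None or wb is None else wb - wa
    return log2fc, shift
```

For example, a gene with a 200-nt proximal and a 1,200-nt distal isoform, counted 30/10 in one cell type and 10/30 in another, has the same total (log2 fold change 0) but a 500-nt longer weighted 3′UTR in the second cell type. A gene-level count matrix would miss this change in isoform usage.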


Genetics, 2002, Vol 161 (3), pp. 1333–1337
Authors: Thomas I Milac, Frederick R Adler, Gerald R Smith

We have determined the marker separations (genetic distances) that maximize the probability, or power, of detecting meiotic recombination deficiency when only a limited number of meiotic progeny can be assayed. We find that the optimal marker separation is as large as 30–100 cM in many cases. Provided the appropriate marker separation is used, small reductions in recombination potential (as little as 50%) can be detected by assaying a single interval in as few as 100 progeny. If recombination is uniformly altered across the genomic region of interest, the same sensitivity can be obtained by assaying multiple independent intervals in correspondingly fewer progeny. A reduction or abolition of crossover interference, with or without a reduction of recombination proficiency, can be detected with similar sensitivity. We present a set of graphs that display the optimal marker separation and the number of meiotic progeny that must be assayed to detect a given recombination deficiency in the presence of various levels of crossover interference. These results will aid the optimal design of experiments to detect meiotic recombination deficiency in any organism.
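
A small calculation shows why a large marker separation can maximize power. The sketch below makes several simplifying assumptions: Haldane's mapping function (that is, no crossover interference), a recombination deficiency modeled as a proportional reduction in map distance, and a one-sided exact binomial test at alpha = 0.05 on the number of recombinants among n progeny. The paper also treats crossover interference, which this sketch ignores. The function names are hypothetical.

```python
import math


def haldane_r(d_cM):
    """Recombination fraction for map distance d_cM under Haldane's function (no interference)."""
    return 0.5 * (1 - math.exp(-2 * d_cM / 100))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def power_to_detect(d_cM, reduction, n, alpha=0.05):
    """Power of a one-sided exact binomial test of H0: r = r0 against H1: r < r0.

    The wild-type interval has map distance d_cM; the mutant's map distance is
    d_cM * (1 - reduction).
    """
    r0 = haldane_r(d_cM)
    r1 = haldane_r(d_cM * (1 - reduction))
    # largest critical count c with P(X <= c | r0) <= alpha (c = -1 if none exists)
    c = -1
    while binom_cdf(c + 1, n, r0) <= alpha:
        c += 1
    return binom_cdf(c, n, r1) if c >= 0 else 0.0
```

Scanning `max(range(5, 201, 5), key=lambda d: power_to_detect(d, 0.5, 100))` gives the best separation on a 5-cM grid for a 50% reduction in 100 progeny. Under these assumptions, the gap between r(d) and r(d/2) equals (e^(-d) - e^(-2d))/2 with d in Morgans. This gap is largest at d = ln 2 Morgans, about 69 cM, which falls inside the 30–100 cM range reported above.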


Authors: Gangjun Zhao, Caixia Luo, Jianning Luo, Junxing Li, Hao Gong, ...

Key message: A dwarfism gene, LacDWARF1, was mapped by combined BSA-Seq and comparative genomics analyses to a 65.4 kb physical genomic region on chromosome 05.

Dwarf architecture is one of the most important traits used in Cucurbitaceae breeding because it saves labor and increases the harvest index. To our knowledge, there has been no prior research on dwarfism in sponge gourd. This study reports the first dwarf mutant, WJ209, which shows a decrease in cell size and internode length. Genetic analysis revealed that the mutant phenotype is controlled by a single recessive gene, designated Lacdwarf1 (Lacd1). By combining bulked segregant analysis with next-generation sequencing, we quickly mapped Lacd1 to a 65.4 kb region on chromosome 5 using an F2 segregating population with InDel and SNP polymorphism markers. Gene annotation revealed that Lac05g019500, which encodes a gibberellin 3β-hydroxylase (GA3ox), is the most likely candidate gene for Lacd1. DNA sequence analysis showed an approximately 4 kb insertion in the first intron of Lac05g019500 in WJ209, and this insertion causes Lac05g019500 to be transcribed incorrectly in the dwarf mutant. Moreover, bioactive GA levels decreased significantly in WJ209, and the dwarf phenotype could be restored by exogenous GA3 treatment, indicating that WJ209 is a GA-deficient mutant. All these results support the conclusion that Lac05g019500 is the Lacd1 gene. In addition, RNA-Seq revealed that many genes, including those related to plant hormones, cellular processes, the cell wall, membranes, and stress responses, were significantly altered in WJ209 compared with the wild type. This study will aid molecular marker-assisted breeding of dwarf sponge gourd.
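
BSA-Seq mapping is commonly summarized with a delta SNP-index. For each SNP, the fraction of reads carrying the mutant-parent allele is computed in the mutant bulk and in the wild-type bulk. The difference is then averaged in sliding windows, and windows with high values point to the causal region. For a recessive mutation, linked SNPs approach an index of 1 in the mutant bulk and about 1/3 in the wild-type bulk, so the delta there approaches about 0.67. The statistic and windowing below are an assumption about a typical BSA-Seq workflow, not the pipeline used in this study, and the function name is hypothetical.

```python
def delta_snp_index(snps, window, step):
    """snps: (position, mut_alt, mut_depth, wt_alt, wt_depth) tuples for one chromosome,
    where *_alt counts reads carrying the mutant-parent allele in each bulk.

    Returns (window_start, mean delta SNP-index, n_snps) for each window that
    contains at least one SNP with nonzero depth in both bulks.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # per-SNP delta = SNP-index in mutant bulk minus SNP-index in wild-type bulk
    deltas = [(pos, ma / md - wa / wd) for pos, ma, md, wa, wd in snps if md > 0 and wd > 0]
    if not deltas:
        return []
    last = max(pos for pos, _ in deltas)
    out = []
    start = 0
    while start <= last:
        vals = [d for pos, d in deltas if start <= pos < start + window]
        if vals:
            out.append((start, sum(vals) / len(vals), len(vals)))
        start += step
    return out
```

The window with the highest mean delta is a candidate interval. In practice, such an interval is then narrowed with InDel and SNP markers in the F2 population, as described above.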


2019
Authors: Stefan Kurtenbach, J. William Harbour

While sophisticated resources are available for displaying NGS data, including the Integrative Genomics Viewer (IGV) and the UCSC Genome Browser, exporting regions and assembling figures for publication remains challenging. In particular, customizing track appearance and overlaying track replicates is a manual and time-consuming process. Here, we present SparK, a tool that auto-generates publication-ready, high-resolution, true vector graphic figures from any NGS-based tracks, including RNA-seq, ChIP-seq, and ATAC-seq. Novel functions of SparK include averaging of replicates, plotting standard deviation tracks, and highlighting significantly changed areas. SparK is written in Python 3, making it executable on any major OS platform. Because figures are generated from the command line, later changes are easy to make: for instance, if the genomic region of the plot needs to be changed, or tracks need to be added or removed, the figure can be regenerated within seconds without manually re-exporting and re-assembling everything. After plotting with SparK, text, lines, and colors in the output SVG vector graphic files are simple to edit. SparK is publicly available on GitHub: https://github.com/harbourlab/SparK.
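
Replicate averaging and standard deviation tracks reduce to per-bin statistics across replicate coverage arrays, which can then be drawn as a filled SVG band. This is a hypothetical sketch, not SparK's code. The function names, the use of the population standard deviation, the clipping of the lower edge at zero, and the simple path layout are all assumptions.

```python
from statistics import mean, pstdev


def average_replicates(replicates):
    """replicates: list of equal-length coverage lists, one per replicate.

    Returns (per-bin means, per-bin population standard deviations).
    """
    if len({len(r) for r in replicates}) != 1:
        raise ValueError("replicates must be non-empty and of equal length")
    columns = list(zip(*replicates))  # one tuple of replicate values per bin
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def svg_band(means, sds, bin_width=1.0, y_scale=1.0, height=100.0):
    """SVG path data for a mean +/- SD band.

    SVG y coordinates grow downward, so values are flipped against `height`.
    The lower edge is clipped at zero coverage.
    """
    upper = [(i * bin_width, height - (m + s) * y_scale) for i, (m, s) in enumerate(zip(means, sds))]
    lower = [(i * bin_width, height - max(m - s, 0) * y_scale) for i, (m, s) in enumerate(zip(means, sds))]
    points = upper + lower[::-1]  # trace the upper edge, then the lower edge back
    return "M " + " L ".join(f"{x:g},{y:g}" for x, y in points) + " Z"
```

The returned string can be placed in the `d` attribute of an SVG `<path>` element and drawn beneath the mean line. Because the output is plain SVG text, colors and opacity remain easy to edit afterwards, in the spirit of the workflow described above.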

