Sashimi plots: Quantitative visualization of alternative isoform expression from RNA-seq data

Analysis of RNA sequencing (RNA-Seq) data revealed that the vast majority of human genes express multiple mRNA isoforms, produced by alternative pre-mRNA splicing and other mechanisms, and that most alternative isoforms vary in expression between human tissues. As RNA-Seq datasets grow in size, it remains challenging to visualize isoform expression across multiple samples. We present Sashimi plots, a quantitative multi-sample visualization of RNA-Seq reads aligned to gene annotations, which enables quantitative comparison of isoform usage across samples or experimental conditions. Given an input annotation and spliced alignments of reads from a sample, a region of interest is visualized in a Sashimi plot as follows: (i) alignments in exons are represented as read densities (optionally normalized by length of genomic region and coverage), and (ii) splice junction reads are drawn as arcs connecting a pair of exons, where arc width is drawn proportional to the number of reads aligning to the junction.
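
The two drawing rules in the abstract above can be sketched in a few lines. This is a minimal illustration, not the Sashimi plot implementation: the function names, the linear width scaling, the `max_width` default, and the per-kilobase, per-million-reads normalization are all assumptions chosen for the example.

```python
def arc_widths(junction_counts, max_width=6.0):
    """Scale junction read counts to arc line widths (hypothetical helper).

    The junction with the most reads gets max_width; the others scale linearly.
    """
    if not junction_counts:
        return {}
    top = max(junction_counts.values())
    if top == 0:
        return {junction: 0.0 for junction in junction_counts}
    return {junction: max_width * count / top
            for junction, count in junction_counts.items()}


def normalized_density(per_base_coverage, total_mapped_reads):
    """Normalize per-base coverage by region length (kb) and library size (millions).

    This particular normalization is an assumption; the abstract only says that
    densities are optionally normalized by region length and coverage.
    """
    region_kb = len(per_base_coverage) / 1000.0
    millions = total_mapped_reads / 1_000_000.0
    scale = region_kb * millions
    if scale == 0:
        raise ValueError("need a non-empty region and at least one mapped read")
    return [depth / scale for depth in per_base_coverage]
```

Under these assumptions, a junction supported by 10 reads is drawn one third as wide as a junction supported by 30 reads in the same sample.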

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

10.1101/378539 ◽

2018 ◽

Cited By ~ 3

Author(s):

Charlotte Soneson ◽

Michael I Love ◽

Rob Patro ◽

Shobbir Hussain ◽

Dheeraj Malhotra ◽

...

Keyword(s):

Splice Junction ◽

Transcript Level ◽

Transcript Abundance ◽

Genomic Region ◽

Rna Seq ◽

Poor Agreement ◽

Abundance Estimates ◽

Small Set ◽

Good Agreement

Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility (JCC) score, which provides a way to evaluate the reliability of transcript-level abundance estimates as well as the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that while most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.
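
The comparison of observed and predicted junction coverage can be illustrated with a small sketch. It is not the published JCC formula: the abstract does not give the definition, so the score below is an assumed stand-in (a scaled absolute difference bounded between 0 and 1), and the prediction step is simplified to summing, per junction, the estimated junction-spanning reads of every transcript that contains it. All function and parameter names are hypothetical.

```python
def predicted_junction_coverage(transcript_junctions, transcript_reads):
    """Sum, per junction, the estimated reads of every transcript containing it.

    transcript_junctions: dict mapping transcript id -> list of junction ids
    transcript_reads: dict mapping transcript id -> estimated junction-spanning reads
    (a simplification; a real prediction would use the transcript coordinates)
    """
    predicted = {}
    for transcript, junctions in transcript_junctions.items():
        reads = transcript_reads.get(transcript, 0.0)
        for junction in junctions:
            predicted[junction] = predicted.get(junction, 0.0) + reads
    return predicted


def discrepancy_score(observed, predicted):
    """Illustrative agreement score for one gene: 0 = identical profiles, 1 = disjoint.

    Predicted coverages are rescaled to the observed total before comparison,
    so only the relative distribution across junctions matters.
    """
    junctions = set(observed) | set(predicted)
    obs_total = sum(observed.get(j, 0.0) for j in junctions)
    pred_total = sum(predicted.get(j, 0.0) for j in junctions)
    if obs_total == 0 or pred_total == 0:
        return None  # score undefined without reads on both sides
    scale = obs_total / pred_total
    diff = sum(abs(scale * predicted.get(j, 0.0) - observed.get(j, 0.0))
               for j in junctions)
    return diff / (2.0 * obs_total)
```

A gene whose predicted junction profile matches the observed one scores 0 under this sketch; a high score flags a gene whose abundance estimates or annotation deserve a closer look.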

A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples

10.1101/2021.07.10.451910 ◽

2021 ◽

Author(s):

Wenpin Hou ◽

Zhicheng Ji ◽

Zeyu Chen ◽

E John Wherry ◽

Stephanie C Hicks ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Biological Processes ◽

Rna Seq ◽

Experimental Conditions ◽

Computational Framework ◽

Statistical Framework ◽

Gene Regulatory ◽

Multiple Samples ◽

False Discoveries

Pseudotime analysis with single-cell RNA-sequencing (scRNA-seq) data has been widely used to study dynamic gene regulatory programs along continuous biological processes. While many computational methods have been developed to infer the pseudo-temporal trajectories of cells within a biological sample, methods that compare pseudo-temporal patterns with multiple samples (or replicates) across different experimental conditions are lacking. Lamian is a comprehensive and statistically-rigorous computational framework for differential multi-sample pseudotime analysis. It can be used to identify changes in a biological process associated with sample covariates, such as different biological conditions, and also to detect changes in gene expression, cell density, and topology of a pseudotemporal trajectory. Unlike existing methods that ignore sample variability, Lamian draws statistical inference after accounting for cross-sample variability and hence substantially reduces sample-specific false discoveries that are not generalizable to new samples. Using both simulations and real scRNA-seq data, including an analysis of differential immune response programs between COVID-19 patients with different disease severity levels, we demonstrate the advantages of Lamian in decoding cellular gene expression programs in continuous biological processes.

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

Life Science Alliance ◽

10.26508/lsa.201800175 ◽

2019 ◽

Vol 2 (1) ◽

pp. e201800175 ◽

Cited By ~ 10

Author(s):

Charlotte Soneson ◽

Michael I Love ◽

Rob Patro ◽

Shobbir Hussain ◽

Dheeraj Malhotra ◽

...

Keyword(s):

Splice Junction ◽

Transcript Level ◽

Transcript Abundance ◽

Genomic Region ◽

Rna Seq ◽

Poor Agreement ◽

Abundance Estimates ◽

Small Set ◽

Good Agreement

Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility score, which provides a way to evaluate the reliability of transcript-level abundance estimates and the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that although most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.

Multiplexed single-cell RNA-seq via transient barcoding for simultaneous expression profiling of various drug perturbations

Science Advances ◽

10.1126/sciadv.aav2249 ◽

2019 ◽

Vol 5 (5) ◽

pp. eaav2249 ◽

Cited By ~ 22

Author(s):

Dongju Shin ◽

Wookjae Lee ◽

Ji Hyun Lee ◽

Duhee Bang

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cost Effective ◽

Specific Gene ◽

Rna Seq ◽

Experimental Conditions ◽

Cost Effective Method ◽

Treatment Experiment ◽

Single Cell Profiling ◽

Multiple Samples

The development of high-throughput single-cell RNA sequencing (scRNA-seq) has enabled access to information about gene expression in individual cells and insights into new biological areas. Although the interest in scRNA-seq has rapidly grown in recent years, the existing methods are plagued by many challenges when performing scRNA-seq on multiple samples. To simultaneously analyze multiple samples with scRNA-seq, we developed a universal sample barcoding method through transient transfection with short barcode oligonucleotides. By conducting a species-mixing experiment, we have validated the accuracy of our method and confirmed the ability to identify multiplets and negatives. Samples from a 48-plex drug treatment experiment were pooled and analyzed by a single run of Drop-Seq. This revealed unique transcriptome responses for each drug and target-specific gene expression signatures at the single-cell level. Our cost-effective method is widely applicable for the single-cell profiling of multiple experimental conditions, enabling the widespread adoption of scRNA-seq for various applications.
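
Identifying multiplets and negatives from sample barcodes can be pictured as a simple per-cell call. The sketch below is an assumption for illustration only: the abstract does not describe the classification rule, and the fixed read threshold (`min_reads`) and the function name are hypothetical.

```python
def classify_cell(barcode_counts, min_reads=10):
    """Call a cell negative, singlet, or multiplet from its sample-barcode counts.

    barcode_counts: dict mapping sample barcode id -> reads observed in this cell
    min_reads: assumed threshold above which a barcode counts as present
    """
    positive = sorted(barcode for barcode, count in barcode_counts.items()
                      if count >= min_reads)
    if not positive:
        return ("negative", [])
    if len(positive) == 1:
        return ("singlet", positive)
    return ("multiplet", positive)
```

With this rule, a cell carrying two barcodes above the threshold is flagged as a multiplet and would be excluded before per-drug expression profiles are compared.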

“Pocket-sized RNA-Seq”: A Method to Capture New Mature microRNA Produced from a Genomic Region of Interest

Non-Coding RNA ◽

10.3390/ncrna1020127 ◽

2015 ◽

Vol 1 (2) ◽

pp. 127-138 ◽

Cited By ~ 2

Author(s):

Florent Hubé ◽

Claire Francastel

Keyword(s):

Region Of Interest ◽

Genomic Region ◽

Rna Seq ◽

Mature Microrna

Multiplexed single-cell RNA-seq via transient barcoding for drug screening

10.1101/359851 ◽

2018 ◽

Cited By ~ 4

Author(s):

Dongju Shin ◽

Wookjae Lee ◽

Ji Hyun Lee ◽

Duhee Bang

Keyword(s):

Single Cell ◽

Cost Effective ◽

Specific Gene ◽

Rna Seq ◽

Experimental Conditions ◽

Cost Effective Method ◽

Treatment Experiment ◽

Single Cell Profiling ◽

Multiple Samples ◽

Pooled Samples

To simultaneously analyze multiple samples of various conditions with scRNA-seq, we developed a universal sample barcoding method through transient transfection of short barcode oligonucleotides (SBOs). A 48-plex drug treatment experiment of pooled samples analyzed by a single run of Drop-Seq revealed a unique transcriptome response for each drug and target-specific gene expression signatures at the single-cell level. Our cost-effective method is widely applicable for single-cell profiling of multiple experimental conditions.

Quantification of alternative 3′UTR isoforms from single cell RNA-seq data with scUTRquant

10.1101/2021.11.22.469635 ◽

2021 ◽

Author(s):

Mervin M Fansler ◽

Gang Zhen ◽

Christine Mayr

Keyword(s):

Single Cell ◽

Alternative Polyadenylation ◽

Cell Types ◽

Mouse Cell ◽

Cleavage Sites ◽

Rna Seq ◽

Mrna Isoforms ◽

Single Nucleotide ◽

Human Genes ◽

Nucleotide Resolution

Although half of human genes use alternative polyadenylation (APA) to generate mRNA isoforms that encode the same protein but differ in their 3′UTRs, most single cell RNA-sequencing (scRNA-seq) pipelines only measure gene expression. Here, we describe an open-access pipeline, called scUTRquant (https://github.com/Mayrlab/scUTRquant), that measures gene and 3′UTR isoform expression from scRNA-seq data obtained from known cell types in any species. scUTRquant-derived gene and 3′UTR transcript counts were validated against standard methods, demonstrating their accuracy. 3′UTR isoform quantification was substantially more reproducible than previous methods. scUTRquant provides an atlas of high-confidence 3′ end cleavage sites at single-nucleotide resolution to allow APA comparison across mouse datasets. Analysis of 120 mouse cell types revealed that during differentiation genes either change their expression or change their 3′UTR isoform usage. Therefore, we identified thousands of genes with 3′UTR isoform changes that have previously not been implicated in specific biological processes.

Maximal Power Tests for Detecting Defects in Meiotic Recombination

Genetics ◽

10.1093/genetics/161.3.1333 ◽

2002 ◽

Vol 161 (3) ◽

pp. 1333-1337

Author(s):

Thomas I Milac ◽

Frederick R Adler ◽

Gerald R Smith

Keyword(s):

Optimal Design ◽

Design Of Experiments ◽

Meiotic Recombination ◽

Region Of Interest ◽

Genetic Distances ◽

Genomic Region ◽

Maximal Power ◽

Optimal Design Of Experiments ◽

Crossover Interference ◽

Single Interval

We have determined the marker separations (genetic distances) that maximize the probability, or power, of detecting meiotic recombination deficiency when only a limited number of meiotic progeny can be assayed. We find that the optimal marker separation is as large as 30–100 cM in many cases. Provided the appropriate marker separation is used, small reductions in recombination potential (as little as 50%) can be detected by assaying a single interval in as few as 100 progeny. If recombination is uniformly altered across the genomic region of interest, the same sensitivity can be obtained by assaying multiple independent intervals in correspondingly fewer progeny. A reduction or abolition of crossover interference, with or without a reduction of recombination proficiency, can be detected with similar sensitivity. We present a set of graphs that display the optimal marker separation and the number of meiotic progeny that must be assayed to detect a given recombination deficiency in the presence of various levels of crossover interference. These results will aid the optimal design of experiments to detect meiotic recombination deficiency in any organism.
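
Why a wide marker separation helps can be seen with a rough power calculation. The sketch below is not the authors' method; it rests on several stated assumptions: no crossover interference (Haldane map function), a recombination deficiency that shrinks the genetic distance by a fixed fraction, and a one-sided exact binomial test at an assumed significance level `alpha`. All function names are hypothetical.

```python
from math import comb, exp


def haldane_recombinant_fraction(distance_morgans):
    """Recombinant fraction for a map distance, assuming no interference."""
    return 0.5 * (1.0 - exp(-2.0 * distance_morgans))


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))


def detection_power(distance_morgans, reduction, n_progeny, alpha=0.05):
    """Power to detect a drop in recombinant progeny across one interval.

    reduction: fraction by which the mutant's genetic distance is reduced
               (0.5 means recombination is halved; an assumed model).
    """
    r_wild = haldane_recombinant_fraction(distance_morgans)
    r_mutant = haldane_recombinant_fraction(distance_morgans * (1.0 - reduction))
    # Largest recombinant count that still rejects the wild-type rate at level alpha.
    critical = -1
    while (critical + 1 <= n_progeny
           and binomial_cdf(critical + 1, n_progeny, r_wild) <= alpha):
        critical += 1
    if critical < 0:
        return 0.0  # no rejection region exists for this design
    return binomial_cdf(critical, n_progeny, r_mutant)
```

Comparing `detection_power(0.5, 0.5, 100)` with `detection_power(0.05, 0.5, 100)` shows the effect under these assumptions: with 100 progeny, a 50 cM interval gives much higher power to detect a 50% reduction than a 5 cM interval.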

A mutation in LacDWARF1 results in a GA-deficient dwarf phenotype in sponge gourd (Luffa acutangula)

Theoretical and Applied Genetics ◽

10.1007/s00122-021-03938-4 ◽

2021 ◽

Author(s):

Gangjun Zhao ◽

Caixia Luo ◽

Jianning Luo ◽

Junxing Li ◽

Hao Gong ◽

...

Keyword(s):

Gene Annotation ◽

Recessive Gene ◽

Genomic Region ◽

Dwarf Mutant ◽

Rna Seq ◽

Dwarf Phenotype ◽

Sponge Gourd ◽

Response To Stress ◽

Luffa Acutangula ◽

Generation Sequencing

Key message: A dwarfism gene LacDWARF1 was mapped by combined BSA-Seq and comparative genomics analyses to a 65.4 kb physical genomic region on chromosome 05. Dwarf architecture is one of the most important traits utilized in Cucurbitaceae breeding because it saves labor and increases the harvest index. To our knowledge, there has been no prior research about dwarfism in the sponge gourd. This study reports the first dwarf mutant WJ209 with a decrease in cell size and internodes. A genetic analysis revealed that the mutant phenotype was controlled by a single recessive gene, which is designated Lacdwarf1 (Lacd1). Combining bulked segregant analysis with next-generation sequencing, we quickly mapped a 65.4 kb region on chromosome 5 using an F2 segregating population with InDel and SNP polymorphism markers. Gene annotation revealed that Lac05g019500 encodes a gibberellin 3β-hydroxylase (GA3ox) that functions as the most likely candidate gene for Lacd1. DNA sequence analysis showed that there is an approximately 4 kb insertion in the first intron of Lac05g019500 in WJ209. Lac05g019500 is transcribed incorrectly in the dwarf mutant owing to the presence of the insertion. Moreover, the bioactive GAs decreased significantly in WJ209, and the dwarf phenotype could be restored by exogenous GA3 treatment, indicating that WJ209 is a GA-deficient mutant. All these results support the conclusion that Lac05g019500 is the Lacd1 gene. In addition, RNA-Seq revealed that many genes, including those related to plant hormones, cellular process, cell wall, membrane and response to stress, were significantly altered in WJ209 compared with the wild type. This study will aid in the use of molecular marker-assisted breeding in the dwarf sponge gourd.

SparK: A Publication-quality NGS Visualization Tool

10.1101/845529 ◽

2019 ◽

Author(s):

Stefan Kurtenbach ◽

J. William Harbour

Keyword(s):

Standard Deviation ◽

Ucsc Genome Browser ◽

Genomic Region ◽

Visualization Tool ◽

Command Line ◽

Integrative Genomics ◽

Rna Seq ◽

Vector Graphic ◽

Ngs Data ◽

Publication Quality

While there are sophisticated resources available for displaying NGS data, including the Integrative Genomics Viewer (IGV) and the UCSC genome browser, exporting regions and assembling figures for publication remains challenging. In particular, customizing track appearance and overlaying track replicates is a manual and time-consuming process. Here, we present SparK, a tool that auto-generates publication-ready, high-resolution, true vector graphic figures from any NGS-based tracks, including RNA-seq, ChIP-seq, and ATAC-seq. Novel functions of SparK include averaging of replicates, plotting standard deviation tracks, and highlighting significantly changed areas. SparK is written in Python 3, making it executable on any major OS platform. Using command line prompts to generate figures makes later changes very easy. For instance, if the genomic region of the plot needs to be changed, or tracks need to be added or removed, the figure can easily be re-generated within seconds without the manual process of re-exporting and re-assembling everything. After plotting with SparK, changes to the output SVG vector graphic files are simple to make, including text, lines, and colors. SparK is publicly available on GitHub: https://github.com/harbourlab/SparK.
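
Averaging replicates and deriving a standard deviation track amounts to a per-position summary across tracks. The sketch below is not SparK's code and does not use its command line interface; it assumes each replicate is already a list of per-bin signal values of equal length, and the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_replicates(replicate_tracks):
    """Return (mean_track, sd_track) computed position by position.

    replicate_tracks: list of equal-length lists of per-bin signal values.
    The sample standard deviation is used; with one replicate it is set to 0.
    """
    if not replicate_tracks:
        raise ValueError("at least one replicate track is required")
    length = len(replicate_tracks[0])
    if any(len(track) != length for track in replicate_tracks):
        raise ValueError("all replicate tracks must have the same length")
    mean_track = [mean(values) for values in zip(*replicate_tracks)]
    if len(replicate_tracks) < 2:
        sd_track = [0.0] * length
    else:
        sd_track = [stdev(values) for values in zip(*replicate_tracks)]
    return mean_track, sd_track
```

The mean track would be drawn as the main signal and the standard deviation track as a band around it, which is one plausible way to overlay replicates in a single panel.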
