APAlyzer: a bioinformatics package for analysis of alternative polyadenylation isoforms

Ruijia Wang; Bin Tian

doi:10.1093/bioinformatics/btaa266

Bioinformatics ◽

10.1093/bioinformatics/btaa266 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3907-3909 ◽

Cited By ~ 3

Author(s):

Ruijia Wang ◽

Bin Tian

Keyword(s):

Gene Expression ◽

Alternative Polyadenylation ◽

Supplementary Information ◽

Human Tissues ◽

Bioconductor Package ◽

Supplementary Data ◽

RNA-Seq ◽

Eukaryotic Genes ◽

Polyadenylation Sites

Abstract:

Summary: Most eukaryotic genes produce alternative polyadenylation (APA) isoforms. APA is dynamically regulated under different growth and differentiation conditions. Here, we present a bioinformatics package, named APAlyzer, for examining 3′UTR APA, intronic APA and gene expression changes using RNA-seq data and annotated polyadenylation sites in the PolyA_DB database. Using APAlyzer and data from the GTEx database, we present APA profiles across human tissues.

Availability and implementation: APAlyzer is freely available at https://bioconductor.org/packages/release/bioc/html/APAlyzer.html as an R/Bioconductor package.

Supplementary information: Supplementary data are available at Bioinformatics online.

LSTrAP-Kingdom: an automated pipeline to generate annotated gene expression atlases for kingdoms of life

Bioinformatics ◽

10.1093/bioinformatics/btab168 ◽

2021 ◽

Author(s):

William Goh ◽

Marek Mutwil

Keyword(s):

Gene Expression ◽

Large Scale ◽

Supplementary Information ◽

Expression Data ◽

Supplementary Data ◽

RNA-Seq ◽

Analysis Pipeline ◽

Study Gene Expression ◽

Automated Pipeline ◽

Bacteria And Fungi

Abstract:

Motivation: There are now more than two million RNA sequencing experiments for plants, animals, bacteria and fungi publicly available, allowing us to study gene expression within and across species and kingdoms. However, the tools allowing the download, quality control and annotation of these data for more than one species at a time are currently missing.

Results: To remedy this, we present the Large-Scale Transcriptomic Analysis Pipeline in Kingdom of Life (LSTrAP-Kingdom) pipeline, which we used to process 134,521 RNA-seq samples, achieving ∼12,000 processed samples per day. Our pipeline generated quality-controlled, annotated gene expression matrices that rival the manually curated gene expression data in identifying functionally-related genes.

Availability: LSTrAP-Kingdom is available from https://github.com/wirriamm/plants-pipeline and is fully implemented in Python and Bash.

Supplementary information: Supplementary data are available at Bioinformatics online.

movAPA: modeling and visualization of dynamics of alternative polyadenylation across biological samples

Bioinformatics ◽

10.1093/bioinformatics/btaa997 ◽

2020 ◽

Author(s):

Wenbin Ye ◽

Tao Liu ◽

Hongjuan Fu ◽

Congting Ye ◽

Guoli Ji ◽

...

Keyword(s):

Biological Samples ◽

Tissue Specificity ◽

Single Cells ◽

Alternative Polyadenylation ◽

R Package ◽

Supplementary Information ◽

RNA-Seq ◽

Mouse Sperm ◽

High Scalability ◽

A Site

Abstract:

Motivation: Alternative polyadenylation (APA) has been widely recognized as a widespread mechanism modulated dynamically. Studies based on 3′ end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses.

Results: We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. Particularly, seven metrics are provided for measuring the tissue-specificity or usages of APA sites across samples. Three methods are used for identifying 3′ UTR shortening/lengthening events between conditions. APA site switching involving non-3′ UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells.

Availability and implementation: https://github.com/BMILAB/movAPA.

Supplementary information: Supplementary data are available at Bioinformatics online.

Aptardi predicts polyadenylation sites in sample-specific transcriptomes using high-throughput RNA sequencing and DNA sequence

Nature Communications ◽

10.1038/s41467-021-21894-x ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Ryan Lusk ◽

Evan Stene ◽

Farnoush Banaei-Kashani ◽

Boris Tabakoff ◽

Katerina Kechris ◽

...

Keyword(s):

RNA Sequencing ◽

DNA Sequence ◽

Mammalian Species ◽

Alternative Polyadenylation ◽

Sequence Information ◽

RNA-Seq ◽

Average Precision ◽

Polyadenylation Sites ◽

DNA Nucleotide Sequence

Abstract:

Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3′-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model, trained using the Human Brain Reference RNA commercial standard, performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi’s input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.

DEsingle for detecting three types of differential expression in single-cell RNA-seq data

10.1101/173997 ◽

2017 ◽

Cited By ~ 1

Author(s):

Zhun Miao ◽

Ke Deng ◽

Xiaowo Wang ◽

Xuegong Zhang

Keyword(s):

Single Cell ◽

Differential Expression ◽

Negative Binomial ◽

Single Cells ◽

R Package ◽

Supplementary Information ◽

Binomial Model ◽

Supplementary Data ◽

RNA-Seq ◽

Real Zeros

Abstract:

Summary: The excessive amount of zeros in single-cell RNA-seq data includes “real” zeros due to the on-off nature of gene transcription in single cells and “dropout” zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy.

Availability and implementation: The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor’s consideration.

Supplementary information: Supplementary data are available at bioRxiv online.
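As a rough illustration of the zero-inflated negative binomial (ZINB) idea mentioned in the abstract above, the following Python sketch evaluates a ZINB probability mass function and the share of expected zeros that comes from the zero-inflation component. It is not the DEsingle implementation: the function names, the mean/size parameterization, and the example parameter values are all assumptions made for illustration, and the abstract does not say which component the authors treat as “real” versus “dropout” zeros.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial pmf parameterized by mean `mu` (> 0) and dispersion `size` (> 0)."""
    p = size / (size + mu)  # success probability implied by mean and size
    log_pmf = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(k, theta, mu, size):
    """ZINB pmf: with probability `theta` emit a structural zero, otherwise draw from NB."""
    nb_part = (1.0 - theta) * nb_pmf(k, mu, size)
    return theta + nb_part if k == 0 else nb_part


def inflation_share_of_zeros(theta, mu, size):
    """Fraction of all expected zeros attributable to the zero-inflation component."""
    return theta / zinb_pmf(0, theta, mu, size)


# Hypothetical parameters for a single gene (illustrative values only).
theta, mu, size = 0.3, 2.0, 1.5
share = inflation_share_of_zeros(theta, mu, size)
```

In this toy parameterization, `theta` controls the extra mass at zero while `mu` and `size` shape the count distribution, so two groups of cells can differ in the zero proportion, in the abundance of the non-zero part, or in both.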

LIONS: analysis suite for detecting and quantifying transposable element initiated transcription from RNA-seq

Bioinformatics ◽

10.1093/bioinformatics/btz130 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3839-3841 ◽

Cited By ~ 6

Author(s):

Artem Babaian ◽

I Richard Thompson ◽

Jake Lever ◽

Liane Gagnier ◽

Mohammad M Karimi ◽

...

Keyword(s):

Transposable Elements ◽

Transposable Element ◽

Test Data ◽

Source Code ◽

Supplementary Information ◽

Transcriptional Networks ◽

Supplementary Data ◽

RNA-Seq ◽

Transcriptional Initiation ◽

Instruction Manual

Abstract:

Summary: Transposable elements (TEs) influence the evolution of novel transcriptional networks yet the specific and meaningful interpretation of how TE-derived transcriptional initiation contributes to the transcriptome has been marred by computational and methodological deficiencies. We developed LIONS for the analysis of RNA-seq data to specifically detect and quantify TE-initiated transcripts.

Availability and implementation: Source code, container, test data and instruction manual are freely available at www.github.com/ababaian/LIONS.

Supplementary information: Supplementary data are available at Bioinformatics online.

ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files

Bioinformatics ◽

10.1093/bioinformatics/btz937 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2587-2588 ◽

Cited By ~ 10

Author(s):

Christopher M Ward ◽

Thu-Hien To ◽

Stephen M Pederson

Keyword(s):

Quality Control ◽

R Package ◽

Supplementary Information ◽

Bioconductor Package ◽

Supplementary Data ◽

Large Sample ◽

Log Files ◽

Shiny App ◽

Next Generation Sequencing (NGS) ◽

Generation Sequencing

Abstract:

Motivation: High throughput next generation sequencing (NGS) has become exceedingly cheap, facilitating studies to be undertaken containing large sample numbers. Quality control (QC) is an essential stage during analytic pipelines and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge.

Results: We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis.

Availability and implementation: The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports.

Supplementary information: Supplementary data are available at Bioinformatics online.

scDAPA: detection and visualization of dynamic alternative polyadenylation from single cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz701 ◽

2019 ◽

Cited By ~ 2

Author(s):

Congting Ye ◽

Qian Zhou ◽

Xiaohui Wu ◽

Chen Yu ◽

Guoli Ji ◽

...

Keyword(s):

Single Cell ◽

Alternative Polyadenylation ◽

Cellular Heterogeneity ◽

Supplementary Information ◽

RNA-Seq ◽

Computational Tool ◽

Cell Level ◽

Wilcoxon Rank Sum Test ◽

Transcriptional Regulatory ◽

Cell Groups

Abstract:

Motivation: Alternative polyadenylation (APA) plays a key post-transcriptional regulatory role in mRNA stability and functions in eukaryotes. Single cell RNA-seq (scRNA-seq) is a powerful tool to discover cellular heterogeneity at gene expression level. Given the 3′-enriched strategy in library construction, the most commonly used scRNA-seq protocol, 10× Genomics, enables us to improve the study resolution of APA to the single cell level. However, currently there is no computational tool available for investigating APA profiles from scRNA-seq data.

Results: Here, we present a package, scDAPA, for detecting and visualizing dynamic APA from scRNA-seq data. Taking bam/sam files and cell cluster labels as inputs, scDAPA detects APA dynamics using a histogram-based method and the Wilcoxon rank-sum test, and visualizes candidate genes with dynamic APA. Benchmarking results demonstrated that scDAPA can effectively identify genes with dynamic APA among different cell groups from scRNA-seq data.

Availability and implementation: The scDAPA package is implemented in Shell and R, and is freely available at https://scdapa.sourceforge.io.

Supplementary information: Supplementary data are available at Bioinformatics online.
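To make the Wilcoxon rank-sum test named in the abstract above more concrete, here is a minimal Python sketch of the two-sided test using midranks and a normal approximation. It is not scDAPA's code (the package itself is written in Shell and R). The function name, the idea of comparing per-read 3′ end positions between two cell clusters, and the example values are all hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, p_value). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = w - n1 * (n1 + 1) / 2.0  # Mann-Whitney U statistic for sample x
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0.0:
        return 0.0, 1.0  # all values identical: no evidence of a shift
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical 3' end positions (distance from the stop codon, in nt)
# of reads from one gene in two cell clusters.
cluster_a = [120, 135, 150, 160, 170, 180]
cluster_b = [400, 420, 450, 470, 480, 510]
z, p_value = rank_sum_test(cluster_a, cluster_b)
```

A small p-value here would suggest that read ends in the two clusters are shifted relative to each other, which is the kind of signal a dynamic-APA test looks for. A real analysis would also need multiple-testing correction across genes.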

MAJIQ-SPEL: web-tool to interrogate classical and complex splicing variations from RNA-Seq data

Bioinformatics ◽

10.1093/bioinformatics/btx565 ◽

2017 ◽

Vol 34 (2) ◽

pp. 300-302 ◽

Cited By ~ 2

Author(s):

Christopher J Green ◽

Matthew R Gazzara ◽

Yoseph Barash

Keyword(s):

Experimental Validation ◽

UCSC Genome Browser ◽

Supplementary Information ◽

Supplementary Data ◽

RNA-Seq ◽

Web Tool ◽

RT-PCR ◽

Design Algorithm ◽

Gene Isoforms ◽

Downstream Analysis

Abstract:

Summary: Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret and experimentally validate. To address these challenges we developed MAJIQ-SPEL, a web-tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of gene isoforms associated with those. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex, non-binary, splicing variations. Using a matching primer design algorithm it also suggests to users possible primers for experimental validation by RT-PCR and displays those, along with the matching protein domains affected by the LSV, on UCSC Genome Browser for further downstream analysis.

Availability and implementation: Program and code will be available at http://majiq.biociphers.org/majiq-spel.

Supplementary information: Supplementary data are available at Bioinformatics online.

Identifying core biological processes distinguishing human eye tissues with precise systems-level gene expression analyses and weighted correlation networks

10.1101/136960 ◽

2017 ◽

Author(s):

John M Bryan ◽

Temesgen D Fufa ◽

Kapil Bharti ◽

Brian P Brooks ◽

Robert B Hufnagel ◽

...

Keyword(s):

Gene Expression ◽

Expression Patterns ◽

Mouse Retina ◽

Human Tissues ◽

Biological Processes ◽

RNA-Seq ◽

Human Eye ◽

Correlation Networks ◽

Eye Tissues ◽

Study Gene Expression

Abstract:

The human eye is built from several specialized tissues which direct, capture, and pre-process information to provide vision. The gene expression of the different eye tissues has been extensively profiled with RNA-seq across numerous studies. Large consortium projects have also used RNA-seq to study gene expression patterning across many different human tissues, minus the eye. There has not been an integrated study of expression patterns from multiple eye tissues compared to other human body tissues. We have collated all publicly available healthy human eye RNA-seq datasets as well as dozens of other tissues. We use this fully integrated dataset to probe the biological processes and pan expression relationships between the cornea, retina, RPE-choroid complex, and the rest of the human tissues with differential expression, clustering, and GO term enrichment tools. We also leverage our large collection of retina and RPE-choroid tissues to build the first human weighted gene correlation networks and use them to highlight known biological pathways and eye gene disease enrichment. We also have integrated publicly available single cell RNA-seq data from mouse retina into our framework for validation and discovery. Finally, we make all these data, analyses, and visualizations available via a powerful interactive web application (https://eyeintegration.nei.nih.gov/).

SPsimSeq: semi-parametric simulation of bulk and single cell RNA sequencing data

10.1101/677740 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Gene Expression ◽

Single Cell ◽

RNA Sequencing ◽

Empirical Distribution ◽

Supplementary Information ◽

RNA-Seq ◽

Sequencing Data ◽

Actual Distribution ◽

Wide Range ◽

Single Cell RNA Sequencing

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts.

Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq.

Supplementary information: Supplementary data are available at bioRxiv online.
