MAAPER: model-based analysis of alternative polyadenylation using 3' end-linked reads

2021 ◽  
Author(s):  
Wei Vivian Li ◽  
Dinghai Zheng ◽  
Ruijia Wang ◽  
Bin Tian

Most eukaryotic genes harbor multiple cleavage and polyadenylation sites (PASs), leading to expression of alternative polyadenylation (APA) isoforms. APA regulation has been implicated in a diverse array of physiological and pathological conditions. While RNA sequencing tools that generate reads containing the PAS, named onSite reads, have been instrumental in identifying PASs, they have not been widely used. By contrast, a growing number of methods generate reads that are close to the PAS, named nearSite reads, including the 3' end counting strategy commonly used in single cell analysis. How these nearSite reads can be used for APA analysis, however, is poorly studied. Here, we present a computational method, named model-based analysis of alternative polyadenylation using 3' end-linked reads (MAAPER), to examine APA using nearSite reads. MAAPER uses a probabilistic model to predict PASs for nearSite reads with high accuracy and sensitivity, and examines different types of APA events, including those in 3'UTRs and introns, with robust statistics. We show MAAPER's accuracy with data from both bulk and single cell RNA samples and its applicability in unpaired or paired experimental designs. Our result also highlights the importance of using well annotated PASs for nearSite read analysis.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Wei Vivian Li ◽  
Dinghai Zheng ◽  
Ruijia Wang ◽  
Bin Tian

Abstract: Most eukaryotic genes express alternative polyadenylation (APA) isoforms. A growing number of RNA sequencing methods, especially those used for single-cell transcriptome analysis, generate reads close to the polyadenylation site (PAS), termed nearSite reads, hence inherently containing information about APA isoform abundance. Here, we present a probabilistic model-based method named MAAPER to utilize nearSite reads for APA analysis. MAAPER predicts PASs with high accuracy and sensitivity and examines different types of APA events with robust statistics. We show MAAPER’s performance with both bulk and single-cell data and its applicability in unpaired or paired experimental designs.


2020 ◽  
Author(s):  
Cody N. Heiser ◽  
Victoria M. Wang ◽  
Bob Chen ◽  
Jacob J. Hughey ◽  
Ken S. Lau

Abstract: A major challenge for droplet-based single-cell sequencing technologies is distinguishing true cells from uninformative barcodes in datasets with disparate library sizes confounded by high technical noise (i.e. batch-specific ambient RNA). We present dropkick, a fully automated software tool for quality control and filtering of single-cell RNA sequencing (scRNA-seq) data with a focus on excluding ambient barcodes and recovering real cells bordering the quality threshold. By automatically determining dataset-specific training labels based on predictive global heuristics, dropkick learns a gene-based representation of real cells and ambient noise, calculating a cell probability score for each barcode. Using simulated and real-world scRNA-seq data, we benchmarked dropkick against a conventional thresholding approach and EmptyDrops, a popular computational method, demonstrating greater recovery of rare cell types and exclusion of empty droplets and noisy, uninformative barcodes. We show for both low and high-background datasets that dropkick’s weakly supervised model reliably learns which genes are enriched in ambient barcodes and draws a multidimensional boundary that is more robust to dataset-specific variation than existing filtering approaches. dropkick provides a fast, automated tool for reproducible cell identification from scRNA-seq data that is critical to downstream analysis and compatible with popular single-cell analysis Python packages.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guo-Wei Li ◽  
Fang Nan ◽  
Guo-Hua Yuan ◽  
Chu-Xiao Liu ◽  
Xindong Liu ◽  
...  

Abstract: Single-cell RNA-seq (scRNA-seq) profiles gene expression with high resolution. Here, we develop a stepwise computational method, called SCAPTURE, to identify, evaluate, and quantify cleavage and polyadenylation sites (PASs) from 3′ tag-based scRNA-seq. SCAPTURE detects PASs de novo in single cells with high sensitivity and accuracy, enabling detection of previously unannotated PASs. Quantified alternative PAS transcripts refine cell identity analysis beyond gene expression, enriching information extracted from scRNA-seq data. Using SCAPTURE, we show changes of PAS usage in PBMCs from infected versus healthy individuals at single-cell resolution.


2018 ◽  
Author(s):  
Ivan Juric ◽  
Miao Yu ◽  
Armen Abnousi ◽  
Ramya Raviram ◽  
Rongxin Fang ◽  
...  

Abstract: Hi-C and chromatin immunoprecipitation (ChIP) have been combined to identify long-range chromatin interactions genome-wide at reduced cost and enhanced resolution, but extracting the information from the resulting datasets has been challenging. Here we describe a computational method, MAPS, Model-based Analysis of PLAC-seq and HiChIP, to process the data from such experiments and identify long-range chromatin interactions. MAPS adopts a zero-truncated Poisson regression framework to explicitly remove systematic biases in the PLAC-seq and HiChIP datasets, and then uses the normalized chromatin contact frequencies to identify significant chromatin interactions anchored at genomic regions bound by the protein of interest. MAPS shows superior performance over existing software tools in analysis of chromatin interactions centered on cohesin, CTCF and H3K4me3 associated regions in multiple cell types. MAPS is freely available at https://github.com/ijuric/MAPS.
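
For readers unfamiliar with the zero-truncated Poisson distribution mentioned in this abstract, the following is a minimal sketch of its probability mass function and mean. It is not the MAPS implementation: the function names are hypothetical, and the regression step (how the rate depends on covariates) is not shown because the abstract does not describe it. The sketch only illustrates the distribution itself, which models counts that are observed only when they are at least 1.

```python
import math


def zero_truncated_poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k | Y > 0) for a Poisson variable Y with rate lam.

    Hypothetical helper for illustration; not part of MAPS.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 1:
        # Zero (and negative) counts are excluded by the truncation.
        return 0.0
    # log of lam^k * exp(-lam) / k!, computed in log space for stability
    log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
    # log of the normalizing term 1 - exp(-lam) = P(Y > 0)
    log_norm = math.log(-math.expm1(-lam))
    return math.exp(log_poisson - log_norm)


def zero_truncated_poisson_mean(lam: float) -> float:
    """Expected value of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam / -math.expm1(-lam)
```

Dividing by P(Y > 0) is what distinguishes this from an ordinary Poisson model: the probabilities over k = 1, 2, ... sum to one, and the mean is always larger than the underlying rate.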


2020 ◽  
Vol 21 (18) ◽  
pp. 6460
Author(s):  
Takayuki Ikeda ◽  
Hidehito Saito-Takatsuji ◽  
Yasuo Yoshitomi ◽  
Hideto Yonekura

Mature mRNA is generated by the 3ʹ end cleavage and polyadenylation of its precursor pre-mRNA. Eukaryotic genes frequently have multiple polyadenylation sites, resulting in mRNA isoforms with different 3ʹ-UTR lengths that often encode different C-terminal amino acid sequences. It is well-known that this form of post-transcriptional modification, termed alternative polyadenylation, can affect mRNA stability, localization, translation, and nuclear export. We focus on the alternative polyadenylation of pre-mRNA for vascular endothelial growth factor receptor-1 (VEGFR-1), the receptor for VEGF. VEGFR-1 is a transmembrane protein with a tyrosine kinase in the intracellular region. Secreted forms of VEGFR-1 (sVEGFR-1) are also produced from the same gene by alternative polyadenylation, and sVEGFR-1 has a function opposite to that of VEGFR-1 because it acts as a decoy receptor for VEGF. However, the mechanism that regulates the production of sVEGFR-1 by alternative polyadenylation remains poorly understood. In this review, we introduce and discuss the mechanism of alternative polyadenylation of VEGFR-1 mediated by protein arginine methylation.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Rick Farouni ◽  
Haig Djambazian ◽  
Lorenzo E. Ferri ◽  
Jiannis Ragoussis ◽  
Hamed S. Najafabadi

2021 ◽  
Author(s):  
Jaeyong Choi ◽  
Woochan Lee ◽  
Jung-Ki Yoon ◽  
Jong-Il Kim

Background: Although single cell RNAseq of xenograft samples is widely used, there is no comprehensive pipeline for human and mouse mixed single cell analysis. Method: We used public data to assess misalignment error when using a human and mouse combined reference, and generated a pipeline based on expression-based species deconvolution with species matching reference realignment to remove errors. We also found false-positive signals presumed to originate from ambient RNA of the other species, and used a computational method to adequately remove them. Result: Misaligned reads accounted for 0.5% of total reads on average, but the expression of a few genes was greatly affected, leading to a 99.8% loss in expression. Human and mouse mixed single cell data analyzed by our pipeline clustered well with unmixed data. We also applied our pipeline to a multi-species multi-sample single cell library containing breast cancer xenograft tissue and successfully identified all identities along with the diverse cell types of the tumor microenvironment. Conclusion: We present our pipeline for mixed human and mouse single cell data, which can also be applied to pooled libraries to obtain cost effective single cell data. We also discuss points to consider when analyzing mixed single cell data for future development.
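
The idea of expression-based species assignment described in this abstract can be illustrated with a short sketch. This is not the authors' pipeline: the function name, the input layout (per-barcode counts of reads aligned to human and mouse genes), and the 0.9 cutoff are all assumptions chosen for illustration, and the realignment and ambient RNA removal steps are not shown.

```python
from typing import Dict, Tuple


def assign_species(
    read_counts: Dict[str, Tuple[int, int]],
    min_fraction: float = 0.9,  # assumed cutoff, not taken from the paper
) -> Dict[str, str]:
    """Label each barcode by the species that dominates its reads.

    read_counts maps a barcode to (human_reads, mouse_reads).
    Hypothetical helper for illustration only.
    """
    if not 0.5 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0.5, 1.0]")
    labels: Dict[str, str] = {}
    for barcode, (human, mouse) in read_counts.items():
        total = human + mouse
        if total == 0:
            labels[barcode] = "empty"
        elif human / total >= min_fraction:
            labels[barcode] = "human"
        elif mouse / total >= min_fraction:
            labels[barcode] = "mouse"
        else:
            # Neither species dominates: possible doublet or heavy ambient RNA.
            labels[barcode] = "ambiguous"
    return labels
```

In a pipeline of the kind the abstract describes, barcodes labeled with a single species would presumably then be realigned to that species' reference alone, while ambiguous barcodes would need separate handling.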


PLoS ONE ◽  
2012 ◽  
Vol 7 (12) ◽  
pp. e53357 ◽  
Author(s):  
David Tsai ◽  
Spencer Chen ◽  
Dario A. Protti ◽  
John W. Morley ◽  
Gregg J. Suaning ◽  
...  

Author(s):  
Alexander Lind ◽  
Falastin Salami ◽  
Anne‐Marie Landtblom ◽  
Lars Palm ◽  
Åke Lernmark ◽  
...  
