Polee: RNA-Seq analysis using approximate likelihood

2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Daniel C Jones ◽  
Walter L Ruzzo

Abstract The analysis of mRNA transcript abundance with RNA-Seq is a central tool in molecular biology research, but analyses often fail to account for the uncertainty in these estimates, which can be significant, especially when trying to disentangle isoforms or duplicated genes. Preserving uncertainty necessitates a full probabilistic model of all the sequencing reads, which quickly becomes intractable, as experiments can consist of billions of reads. To overcome these limitations, we propose a new method of approximating the likelihood function of a sparse mixture model, using a technique we call the Pólya tree transformation. We demonstrate that substituting this approximation for the real thing achieves most of the benefits with a fraction of the computational costs, leading to more accurate detection of differential transcript expression and transcript coexpression.
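
To make the "sparse mixture model" concrete, the following is a minimal sketch of the exact read likelihood that an approximation of this kind would stand in for. It is not the Pólya tree transformation itself, which the abstract does not specify. The function name, the transcript labels, and the toy probabilities are all hypothetical.

```python
import math

def mixture_log_likelihood(abundances, read_compat):
    """Log-likelihood of reads under a sparse mixture over transcripts.

    abundances: dict mapping transcript -> relative abundance (sums to 1).
    read_compat: list with one dict per read, mapping only the compatible
        transcripts to P(read | transcript); every other entry is zero,
        which is what makes the model sparse.
    """
    total = 0.0
    for compat in read_compat:
        # A read's probability is a mixture over the few transcripts
        # it could have come from.
        p = sum(abundances[t] * prob for t, prob in compat.items())
        total += math.log(p)
    return total

# Hypothetical toy data: two isoforms that share most of their reads.
reads = [
    {"tx_a": 0.01, "tx_b": 0.01},  # ambiguous read
    {"tx_a": 0.01, "tx_b": 0.01},  # ambiguous read
    {"tx_a": 0.02},                # read unique to tx_a
]
even = mixture_log_likelihood({"tx_a": 0.5, "tx_b": 0.5}, reads)
skewed = mixture_log_likelihood({"tx_a": 0.9, "tx_b": 0.1}, reads)
```

In this toy case the two ambiguous reads contribute the same amount whatever the split between isoforms, so only the single unique read favors `tx_a`. That flatness is the kind of uncertainty a point estimate hides. Evaluating the sum over every read is also the part that becomes costly when an experiment contains billions of reads.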

2020 ◽  
Author(s):  
Daniel C. Jones ◽  
Walter L. Ruzzo

Abstract The analysis of mRNA transcript abundance with RNA-Seq is a central tool in molecular biology research, but analyses often fail to account for the uncertainty in these estimates, which can be significant, especially when trying to disentangle isoforms or duplicated genes. Preserving uncertainty necessitates a full probabilistic model of all the sequencing reads, which quickly becomes intractable, as experiments can consist of billions of reads. To overcome these limitations, we propose a new method of approximating the likelihood function of a sparse mixture model, using a technique we call the Pólya tree transformation. We demonstrate that substituting this approximation for the real thing achieves most of the benefits with a fraction of the computational costs, leading to more accurate detection of differential transcript expression. Availability The method is implemented in a Julia package available from https://github.com/dcjones/[email protected]


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shuhua Zhan ◽  
Cortland Griswold ◽  
Lewis Lukens

Abstract Background Genetic variation for gene expression is a source of phenotypic variation for natural and agricultural species. The common approach to map and to quantify gene expression from genetically distinct individuals is to assign their RNA-seq reads to a single reference genome. However, RNA-seq reads from alleles dissimilar to this reference genome may fail to map correctly, causing transcript levels to be underestimated. Presently, the extent of this mapping problem is not clear, particularly in highly diverse species. We investigated whether mapping bias occurred and whether chromosomal features were associated with mapping bias. Zea mays is a model species for assessing these questions, given that it has genotypically distinct and well-studied genetic lines. Results In Zea mays, the inbred B73 genome is the standard reference genome and template for RNA-seq read assignments. In the absence of mapping bias, B73 and a second inbred line, Mo17, would each have an approximately equal number of regulatory alleles that increase gene expression. Remarkably, Mo17 had 2–4 times fewer such positively acting alleles than did B73 when RNA-seq reads were aligned to the B73 reference genome. Reciprocally, over one-half of the B73 alleles that increased gene expression were not detected when reads were aligned to the Mo17 genome template. Genes at dissimilar chromosomal ends were strongly affected by mapping bias, and genes at more similar pericentromeric regions were less affected. Biased transcript estimates were higher in untranslated regions and lower in splice junctions. Bias occurred across software and alignment parameters. Conclusions Mapping bias very strongly affects gene transcript abundance estimates in maize, and bias varies across chromosomal features. Individual genome or transcriptome templates are likely necessary for accurate transcript estimation across genetically variable individuals in maize and other species.


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. A1018-A1019
Author(s):  
Christian Secchi ◽  
Paola Benaglio ◽  
Francesca Mulas ◽  
Martina Belli ◽  
Dwayne Stupack ◽  
...  

Abstract Background: Adult granulosa cell tumor (aGCT) is a rare type of malignant stromal cell cancer of the ovary. Postmenopausal genital bleeding is the main clinical sign of aGCT and is attributed to estrogen excess driven by CYP19 upregulation. Typically, aGCTs that are diagnosed at an initial stage can be treated with surgery. However, recurrences are mostly fatal [1]. Current studies are focused on finding new molecular markers and targets for treating aGCT recurrence. Between 95% and 97% of aGCTs harbor a somatic mutation in the FOXL2 gene, Cys134Trp (c.402C>G) [2]. A TGF-β pathway protein, SMAD3, was identified as an essential partner in FOXL2C134W transcriptional activity driving CYP19 upregulation [3]. Recently, the antitumoral FOXO1 gene has been recognized as a potential target for suppressing the FOXL2C134W pathogenic action [4]. Aim: The objective of this study was to examine whether FOXO1 upregulation affects the FOXL2C134W/SMAD3 transcriptomic landscape. Methods: RNA-seq analysis was performed comparing the effect of FOXL2WT/SMAD3 and FOXL2C134W/SMAD3 overexpression in the presence of FOXO1 by transfection of an established human GC line (HGrC1). RNA-seq libraries were prepared using Illumina TruSeq and sequenced on an Illumina HiSeq 4000 platform. To quantify transcript abundance for each sample we used Salmon (1.1.0) with default parameters, using indexes from hg38. Data were subsequently imported into R using the tximport package and processed with the DESeq2 package. Results: RNA-seq data show that FOXL2C134W/SMAD3 significantly drives 717 genes compared with the WT and enabled us to identify targets (TGFB2, SMARCA4, HSPG2, MKI67, NFKBIA) and neoplastic pathways directly associated with the mutant. To provide evidence that the differences in gene expression were a direct consequence of FOXL2 binding, we annotated gene promoters with previously published FOXL2 ChIP-seq analysis. The majority (73-40%) of the differentially expressed genes (DEGs) between FOXL2C134W and FOXL2WT had a FOXL2 binding site at their promoters, which was a significantly higher proportion than in non-DEGs (Fisher’s exact test, murine: p = 7.9 × 10^-157; human: p = 9.9 × 10^-39). Surprisingly, the number of DEGs between FOXL2C134W + FOXO1 and FOXL2WT was much lower (230) than the number of DEGs between FOXL2C134W and FOXL2WT (717, of which 130 were in common; linear regression slope β = 0.58), suggesting that the effect of FOXL2C134W compared with FOXL2WT is moderated by the addition of FOXO1. Conclusions: Our transcriptomic study provides the first evidence that FOXO1 can efficiently mitigate 40% of the altered genome-wide effect specifically related to FOXL2C134W in a model of human aGCT. [1] Farkkila, A. et al. Ann Med (2017). [2] Jamieson, S. & Fuller, P. J. Endocr Rev (2012). [3] Belli, M. et al. Endocrinology (2018). [4] Belli, M. et al. J Endocr Soc (2019).
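
The enrichment comparison reported above (FOXL2 binding sites in DEG promoters versus non-DEG promoters) can be illustrated with a one-sided Fisher's exact test on a 2x2 table. This is a minimal sketch using only the Python standard library. The function name and the example counts are hypothetical and are not the study's data. The abstract also does not say whether a one-sided or two-sided test was used, so the one-sided form here is an assumption.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the table's margins
    held fixed: the probability of at least this much enrichment in the
    top-left cell if rows and columns were independent.
    """
    row1 = a + b          # e.g. all DEGs
    col1 = a + c          # e.g. all genes with a binding site
    n = a + b + c + d     # all genes
    upper = min(row1, col1)
    numerator = 0
    for k in range(a, upper + 1):
        # comb returns 0 when col1 - k exceeds n - row1, so impossible
        # tables drop out of the sum automatically.
        numerator += comb(row1, k) * comb(n - row1, col1 - k)
    return numerator / comb(n, col1)

# Hypothetical counts: rows are DEG / non-DEG,
# columns are promoter with / without a binding site.
p_value = fisher_exact_greater(40, 60, 200, 700)
```

Exact integer arithmetic keeps the sum stable even when the resulting p-value is extremely small, which matters for values as small as those quoted in the abstract.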


2019 ◽  
Author(s):  
Avi Srivastava ◽  
Laraib Malik ◽  
Hirak Sarkar ◽  
Mohsen Zakeri ◽  
Fatemeh Almodaresi ◽  
...  

Abstract Background The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has been given to comparing the effect of various read alignment approaches on quantification accuracy. Results We investigate the influence of mapping and alignment on the accuracy of transcript quantification in both simulated and experimental data, as well as the effect on subsequent differential expression analysis. We observe that, even when the quantification model itself is held fixed, the effect of choosing a different alignment methodology, or aligning reads using different parameters, on quantification estimates can sometimes be large, and can affect downstream differential expression analyses as well. These effects can go unnoticed when assessment is focused too heavily on simulated data, where the alignment task is often simpler than in experimentally-acquired samples. We also introduce a new alignment methodology, called selective alignment, to overcome the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment. Conclusion We observe that, on experimental datasets, the performance of lightweight mapping and alignment-based approaches varies significantly and highlight some of the underlying factors. We show this variation both in terms of quantification and downstream differential expression analysis. In all comparisons, we also show the improved performance of our proposed selective alignment method and suggest best practices for performing RNA-seq quantification.


F1000Research ◽  
2015 ◽  
Vol 4 ◽  
pp. 155 ◽  
Author(s):  
Sandeep Chakraborty ◽  
Monica Britton ◽  
Jill Wegrzyn ◽  
Timothy Butterfield ◽  
Pedro José Martínez-García ◽  
...  

The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves existing methodologies, and implements a workflow for error estimation and correction followed by genome annotation and transcript abundance estimation for RNA-seq derived transcriptome sequences (YeATS - Yet Another Tool Suite for analyzing RNA-seq derived transcriptome). A unique feature of YeATS is the upfront determination of the errors in the sequencing or transcript assembly process by analyzing open reading frames of transcripts. YeATS identifies transcripts that have not been merged, result in broken open reading frames or contain long repeats as erroneous transcripts. We present the YeATS workflow using a representative sample of the transcriptome from the tissue at the heartwood/sapwood transition zone in black walnut. A novel feature of the transcriptome that emerged from our analysis was the identification of a highly abundant transcript that had no known homologous genes (GenBank accession: KT023102). The amino acid composition of the longest open reading frame of this gene classifies this as a putative extensin. Also, we corroborated the transcriptional abundance of proline-rich proteins, dehydrins, senescence-associated proteins, and the DNAJ family of chaperone proteins. Thus, YeATS presents a workflow for analyzing RNA-seq data with several innovative features that differentiate it from existing software.


2019 ◽  
Vol 20 (S24) ◽  
Author(s):  
Hongfei Cui ◽  
Hailin Hu ◽  
Jianyang Zeng ◽  
Ting Chen

Abstract Background Ribosome profiling brings insight to the process of translation. A basic step in profile construction at transcript level is to map Ribo-seq data to transcripts, and then assign a huge number of multiple-mapped reads to similar isoforms. Existing methods either discard the multiple-mapped reads, allocate them randomly, or assign them proportionally according to transcript abundance estimated from RNA-seq data. Results Here we present DeepShape, an RNA-seq-free computational method to estimate ribosome abundance of isoforms, and simultaneously compute their ribosome profiles using a deep learning model. Our simulation results demonstrate that DeepShape can provide more accurate estimations on both ribosome abundance and profiles when compared to state-of-the-art methods. We applied DeepShape to a set of Ribo-seq data from PC3 human prostate cancer cells with and without PP242 treatment. In the four cell invasion/metastasis genes that are translationally regulated by PP242 treatment, different isoforms show very different characteristics of translational efficiency and regulation patterns. Transcript level ribosome distributions were analyzed by the “Codon Residence Index (CRI)” proposed in this study to investigate the relative speed at which a ribosome moves on a codon compared to its synonymous codons. We observe consistent CRI patterns in PC3 cells. We found that the translation of several codons could be regulated by PP242 treatment. Conclusion In summary, we demonstrate that DeepShape can serve as a powerful tool for Ribo-seq data analysis.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Viktoria Betin ◽  
Cristina Penaranda ◽  
Nirmalya Bandyopadhyay ◽  
Rui Yang ◽  
Angela Abitua ◽  
...  

Abstract Dual transcriptional profiling of host and bacteria during infection is challenging due to the low abundance of bacterial mRNA. We report Pathogen Hybrid Capture (PatH-Cap), a method to enrich for bacterial mRNA and deplete bacterial rRNA simultaneously from dual RNA-seq libraries using transcriptome-specific probes. By addressing both the differential RNA content of the host relative to the infecting bacterium and the overwhelming abundance of uninformative structural RNAs (rRNA, tRNA) of both species in a single step, this approach enables analysis of very low-input RNA samples. By sequencing libraries before (pre-PatH-Cap) and after (post-PatH-Cap) enrichment, we achieve dual transcriptional profiling of host and bacteria, respectively, from the same sample. Importantly, enrichment preserves relative transcript abundance and increases the number of unique bacterial transcripts per gene in post-PatH-Cap libraries compared to pre-PatH-Cap libraries at the same sequencing depth, thereby decreasing the sequencing depth required to fully capture the transcriptional profile of the infecting bacteria. We demonstrate that PatH-Cap enables the study of low-input samples including single eukaryotic cells infected by 1–3 Pseudomonas aeruginosa bacteria and paired host-pathogen temporal gene expression analysis of Mycobacterium tuberculosis infecting macrophages. PatH-Cap can be applied to the study of a range of pathogens and microbial species, and more generally, to lowly-abundant species in mixed populations.


2020 ◽  
Vol 19 (5-6) ◽  
pp. 339-342 ◽  
Author(s):  
Krishna A Srinivasan ◽  
Suman K Virdee ◽  
Andrew G McArthur

Abstract RNA sequencing (RNA-Seq) is a complicated protocol, both in the laboratory in generation of data and at the computer in analysis of results. Several decisions during RNA-Seq library construction have important implications for analysis, most notably strandedness during complementary DNA library construction. Here, we clarify bioinformatic decisions related to strandedness in both alignment of DNA sequencing reads to reference genomes and subsequent determination of transcript abundance.
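
One of the bioinformatic decisions the abstract alludes to is working out which strandedness setting a library needs before quantification. The sketch below shows one simple way this could be checked, by comparing how many reads align to the same strand as the annotated gene they overlap versus the opposite strand. The function name, the labels, and the 0.8 cutoff are assumptions for illustration and are not taken from the paper.

```python
def infer_strandedness(sense_reads, antisense_reads, threshold=0.8):
    """Classify a library from how read 1 (or single-end reads) aligns to genes.

    sense_reads: reads aligned to the same strand as the gene they overlap.
    antisense_reads: reads aligned to the opposite strand.
    threshold: hypothetical cutoff on the dominant fraction, not a
        published standard.
    """
    total = sense_reads + antisense_reads
    if total == 0:
        raise ValueError("no reads overlap annotated genes")
    sense_fraction = sense_reads / total
    if sense_fraction >= threshold:
        return "forward"      # read 1 matches the transcript strand
    if sense_fraction <= 1 - threshold:
        return "reverse"      # read 1 is the reverse complement
    return "unstranded"       # roughly even split between strands

# Hypothetical counts from a subsample of aligned reads.
label = infer_strandedness(sense_reads=40, antisense_reads=960)
```

Passing the wrong setting to an aligner or quantifier can silently discard or misassign most reads, so a check like this on a small subsample is a cheap safeguard before running the full analysis.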


2012 ◽  
Vol 30 (15_suppl) ◽  
pp. 10508-10508
Author(s):  
Vinay Varadan ◽  
Sitharthan Kamalakaran ◽  
Angel Janevski ◽  
Nila Banerjee ◽  
Kimberly Lezon-Geyda ◽  
...  

10508 Background: Identification of differentially expressed transcripts after brief exposure to preoperative therapy can help determine likely response markers. We quantify and compare differential gene and isoform expression using RNA-seq on patient samples with 10 day exposure to one dose of trastuzumab, bevacizumab or nab-paclitaxel. Methods: We sequenced transcriptomes of 23 pairs of core biopsy RNA from breast cancers pre/post 10 day exposure to therapy. Paired-end sequencing was done on the Illumina GAII platform using amplified total RNA with 74bp read length, yielding data on transcript abundance for a total of 22,160 genes and 34,449 transcripts. Differential expression of transcripts between pre/post samples was estimated assuming Poisson-distributed read-counts, followed by multiple testing correction and enrichment analysis of 185 KEGG pathways. Results: PAM50-based clustering showed individual samples cluster together, demonstrating that tumor subtypes do not change over the 10-day treatment (SABCS 2011). We identified genes that were significantly differentially expressed (p<0.05; FDR<0.1) in at least 60% of samples within each therapy arm: 780 genes in trastuzumab, 302 in bevacizumab, and 176 in nab-paclitaxel. Surprisingly, only THAP11 and TINF2 were common amongst them. THAP11 is involved in stem cell maintenance and TINF2 is important for regulation of telomere length. Immune system and metabolism-related pathways were commonly affected (p<0.05) across all arms. The bevacizumab arm showed significant down-regulation of angiogenesis-associated genes: ESM1 and VEGFR2 in >80% of samples. The nab-paclitaxel arm exhibited changes in TGF-beta signaling, Nod-like receptor and Wnt signaling. The trastuzumab arm exhibited consistent alteration of ErbB2 and mTOR pathways, with SOX11 and TOP2B downregulated in every sample. Conclusions: This is the first study to compare gene expression with brief exposure across therapies using RNA-seq technology. The unique aspects of transcriptional response to each treatment underscore the need for specific markers of therapeutic response to nab-paclitaxel, bevacizumab and trastuzumab.
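
The Methods above describe testing pre/post differences under Poisson-distributed read counts and then correcting for multiple testing. The abstract does not name the exact test or correction, so the sketch below is one plausible reading and not the authors' procedure: an exact conditional test for two Poisson counts, followed by the Benjamini-Hochberg adjustment. Equal sequencing depth in both samples is assumed, and all names and counts are hypothetical.

```python
from math import comb

def poisson_pair_pvalue(pre, post):
    """Two-sided exact test that two Poisson counts share one rate.

    Conditional on the total n = pre + post, the pre count follows
    Binomial(n, 0.5) under the null (assumes equal sequencing depth).
    """
    n = pre + post
    if n == 0:
        return 1.0
    observed = comb(n, pre)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical (pre, post) read counts for four transcripts in one patient.
counts = [(12, 30), (25, 27), (3, 19), (40, 41)]
raw = [poisson_pair_pvalue(pre, post) for pre, post in counts]
fdr = benjamini_hochberg(raw)
```

The loop over every possible count is only practical for small totals like these toy values. Real read counts in the thousands would call for a library implementation of the binomial test, together with a normalization for differing library sizes.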

