Matataki: an ultrafast mRNA quantification method for large-scale reanalysis of RNA-Seq data

Abstract

Motivation: Accurate estimation of transcript isoform abundance is critical for downstream transcriptome analyses and can help uncover the precise molecular mechanisms underlying complex human diseases such as cancer. Simplex mRNA sequencing (RNA-Seq)-based isoform quantification approaches face the challenges of inherent sampling bias and unidentifiable read origins. A large-scale experiment shows that the consistency between RNA-Seq and other mRNA quantification platforms is relatively low at the isoform level compared to the gene level. In this project, we developed a platform-integrated model for transcript quantification (IntMTQ) to improve the performance of RNA-Seq on isoform expression estimation. IntMTQ, which benefits from the mRNA expression levels reported by the other platforms, provides more precise RNA-Seq-based isoform quantification and leads to more accurate molecular signatures for disease phenotype prediction.

Results: To assess the quality of the isoform expression estimated by IntMTQ, we designed three tasks for clustering and classification of 46 cancer cell lines with four different mRNA quantification platforms, including NanoString’s newly developed nCounter technology. The results demonstrate that the isoform expressions learned by IntMTQ consistently provide more and better molecular features for downstream analyses compared with five baseline algorithms that consider RNA-Seq data only. An independent RT-qPCR experiment on seven genes in twelve cancer cell lines showed that IntMTQ improved overall transcript quantification. The platform-integrated algorithms could be applied to large-scale cancer studies, such as The Cancer Genome Atlas (TCGA), for which both RNA-Seq and array-based platforms are available.

Availability and implementation: Source code is available at: https://github.com/CompbioLabUcf/IntMTQ.

Supplementary information: Supplementary data are available at Bioinformatics online.


Transcriptional and morphological profiling of parvalbumin interneuron subpopulations in the mouse hippocampus

Nature Communications ◽ 10.1038/s41467-020-20328-4 ◽ 2021 ◽ Vol 12 (1)

Author(s): Lin Que ◽ David Lukacsovich ◽ Wenshu Luo ◽ Csaba Földy

Keyword(s): Large Scale ◽ Cell Types ◽ RNA-Seq ◽ Neuronal Identity ◽ Parvalbumin Interneurons ◽ Different Types ◽ Parvalbumin Interneuron ◽ CAM Profile ◽ Developmental Domains

Abstract: The diversity reflected by >100 different neural cell types fundamentally contributes to brain function, and a central idea is that neuronal identity can be inferred from genetic information. Recent large-scale transcriptomic assays seem to confirm this hypothesis, but a lack of morphological information has limited the identification of several known cell types. In this study, we used single-cell RNA-seq in morphologically identified parvalbumin interneurons (PV-INs), and studied their transcriptomic states in the morphological, physiological, and developmental domains. Overall, we find high transcriptomic similarity among PV-INs, with few genes showing divergent expression between morphologically different types. Furthermore, PV-INs show a uniform synaptic cell adhesion molecule (CAM) profile, suggesting that CAM expression in mature PV cells does not reflect wiring specificity after development. Together, our results suggest that while PV-INs differ in anatomy and in vivo activity, their continuous transcriptomic and homogeneous biophysical landscapes are not predictive of these distinct identities.


Multiple Alu exonization in 3’UTR of a primate specific isoform of CYP20A1 creates a potential miRNA sponge

Genome Biology and Evolution ◽ 10.1093/gbe/evaa233 ◽ 2020

Author(s): Aniket Bhattacharya ◽ Vineet Jha ◽ Khushboo Singhal ◽ Mahar Fatima ◽ Dayanidhi Singh ◽ ...

Keyword(s): Heat Shock ◽ Cortical Neurons ◽ Regulatory Networks ◽ Large Scale ◽ Neuronal Development ◽ Random Sets ◽ RNA-Seq ◽ Orphan Gene ◽ miRNA Sponge ◽ Human Neurons

Abstract: Alu repeats contribute to phylogenetic novelties in conserved regulatory networks in primates. Our study highlights how exonized Alus could nucleate large-scale mRNA-miRNA interactions. Using a functional genomics approach, we characterize a transcript isoform of an orphan gene, CYP20A1 (CYP20A1_Alu-LT), that has exonization of 23 Alus in its 3’UTR. CYP20A1_Alu-LT, confirmed by 3’RACE, is an outlier in length (9 kb 3’UTR) and widely expressed. Using publicly available datasets, we demonstrate its expression in higher primates and its presence in single-nucleus RNA-seq of 15928 human cortical neurons. miRanda predicts ∼4700 miRNA recognition elements (MREs) for ∼1000 miRNAs, primarily originating within these 3’UTR-Alus. CYP20A1_Alu-LT could be a potential multi-miRNA sponge, as it harbors ≥10 MREs for 140 miRNAs and has cytosolic localization. We further tested whether expression of CYP20A1_Alu-LT correlates with mRNAs harboring similar MRE targets. RNA-seq with conjoint miRNA-seq analysis was done in primary human neurons, where we observed CYP20A1_Alu-LT to be downregulated during heat shock response and upregulated in HIV1-Tat treatment. A total of 380 genes were positively correlated with its expression (significantly downregulated in heat shock and upregulated in Tat), and they harbored MREs for nine expressed miRNAs which were also enriched in CYP20A1_Alu-LT. MREs were significantly enriched in these 380 genes compared to random sets of differentially expressed genes (p = 8.134e-12). Gene ontology suggested involvement of these genes in neuronal development and hemostasis pathways, thus proposing a novel component of Alu-miRNA mediated transcriptional modulation that could govern specific physiological outcomes in higher primates.
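The sponge criterion in the abstract above (miRNAs with at least 10 MREs in the 3’UTR) amounts to counting predicted sites per miRNA and applying a threshold. The following is a minimal Python sketch of that counting step, assuming the predictions have already been parsed into (miRNA, position) pairs. The function name and the toy records are hypothetical and are not miRanda output from the study.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def mirnas_with_many_mres(sites: Iterable[Tuple[str, int]], min_mres: int = 10) -> List[str]:
    """Return miRNAs having at least `min_mres` predicted MREs, sorted by name."""
    counts = Counter(mirna for mirna, _position in sites)
    return sorted(m for m, n in counts.items() if n >= min_mres)


# Hypothetical toy predictions: (miRNA name, start position in the 3'UTR).
toy_sites = (
    [("miR-A", 100 * i) for i in range(12)]   # 12 sites: passes the threshold
    + [("miR-B", 50 * i) for i in range(10)]  # exactly 10 sites: passes
    + [("miR-C", 75 * i) for i in range(3)]   # 3 sites: filtered out
)

print(mirnas_with_many_mres(toy_sites))  # ['miR-A', 'miR-B']
```

The threshold is inclusive, matching the "≥10" wording; raising `min_mres` narrows the candidate list.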


Gene Expression Imputation with Generative Adversarial Imputation Nets

10.1101/2020.06.09.141689 ◽ 2020

Author(s): Ramon Viñas ◽ Tiago Azevedo ◽ Eric R. Gamazon ◽ Pietro Liò

Keyword(s): Gene Expression ◽ Large Scale ◽ Biological Significance ◽ Predictive Performance ◽ Cost Effective ◽ RNA-Seq ◽ Comprehensive Collection ◽ Genomic Studies ◽ Biological Discovery ◽ Cancer Types

Abstract: A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, with important implications for biological discovery and clinical application. To address this challenge, we present GAIN-GTEx, a method for gene expression imputation based on Generative Adversarial Imputation Networks. In order to increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We compare our model to several standard and state-of-the-art imputation methods and show that GAIN-GTEx is significantly superior in terms of predictive performance and runtime. Furthermore, our results indicate strong generalisation on RNA-Seq data from 3 cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.
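Evaluating an imputation method "across varying levels of missingness" is usually done by hiding known entries, imputing them, and scoring only the hidden positions. The sketch below illustrates that protocol in Python with a per-gene mean imputer as a stand-in baseline. It is not GAIN-GTEx, and the synthetic matrix, function names, and missing rates are all assumptions chosen for illustration.

```python
import random
from statistics import mean
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def mask_entries(data: List[List[float]], missing_rate: float, rng: random.Random) -> Matrix:
    """Hide each entry independently with probability `missing_rate` (None = missing)."""
    return [[None if rng.random() < missing_rate else v for v in row] for row in data]


def impute_column_means(masked: Matrix) -> List[List[float]]:
    """Fill each missing entry with the mean of the observed values of its gene (column)."""
    n_genes = len(masked[0])
    col_means = []
    for j in range(n_genes):
        observed = [row[j] for row in masked if row[j] is not None]
        # Fall back to 0.0 if a gene has no observed values at all.
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in masked]


def rmse_on_missing(truth: List[List[float]], masked: Matrix, imputed: List[List[float]]) -> float:
    """Root-mean-square error computed only over the entries that were hidden."""
    errors = [
        (t - p) ** 2
        for t_row, m_row, p_row in zip(truth, masked, imputed)
        for t, m, p in zip(t_row, m_row, p_row)
        if m is None
    ]
    return (sum(errors) / len(errors)) ** 0.5 if errors else 0.0


rng = random.Random(0)
# Synthetic "expression" matrix: 50 samples x 8 genes, each gene with its own baseline.
truth = [[gene + rng.gauss(0.0, 1.0) for gene in range(8)] for _ in range(50)]

for rate in (0.1, 0.5, 0.9):
    masked = mask_entries(truth, rate, rng)
    imputed = impute_column_means(masked)
    print(f"missing rate {rate:.0%}: RMSE = {rmse_on_missing(truth, masked, imputed):.3f}")
```

Any other imputer can be dropped in place of `impute_column_means` and compared on the same masks, which keeps the comparison fair across methods.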


Automated Isoform Diversity Detector (AIDD): A pipeline for investigating transcriptome diversity of RNA-seq data

10.1101/2020.01.22.915348 ◽ 2020

Author(s): Noel-Marie Plonski ◽ Emily Johnson ◽ Madeline Frederick ◽ Heather Mercer ◽ Gail Fraizer ◽ ...

Keyword(s): RNA Editing ◽ Large Scale ◽ Neural Progenitor Cells ◽ Viral Infections ◽ Variant Calling ◽ RNA-Seq ◽ Major Mechanism ◽ Isoform Diversity ◽ ADAR Editing ◽ Transcriptome Diversity

Abstract

Background: As the number of RNA-seq datasets that become available to explore transcriptome diversity increases, so does the need for easy-to-use comprehensive computational workflows. Many available tools facilitate analyses of one of the two major mechanisms of transcriptome diversity, namely, differential expression of isoforms due to alternative splicing, while the second major mechanism, RNA editing due to post-transcriptional changes of individual nucleotides, remains under-appreciated. Both these mechanisms play an essential role in physiological and disease processes, including cancer and neurological disorders. However, elucidation of RNA editing events at the transcriptome-wide level requires increasingly complex computational tools, in turn resulting in a steep entrance barrier for labs that are interested in high-throughput variant calling applications on a large scale but lack the manpower and/or computational expertise.

Results: Here we present an easy-to-use, fully automated, computational pipeline (Automated Isoform Diversity Detector, AIDD) that contains open source tools for various tasks needed to map transcriptome diversity, including RNA editing events. To facilitate reproducibility and avoid system dependencies, the pipeline is contained within a pre-configured VirtualBox environment. The analytical tasks and format conversions are accomplished via a set of automated scripts that enable the user to go from a set of raw data, such as fastq files, to publication-ready results and figures in one step. A publicly available dataset of Zika virus-infected neural progenitor cells is used to illustrate AIDD’s capabilities.

Conclusions: The AIDD pipeline offers a user-friendly interface for comprehensive and reproducible RNA-seq analyses. Among the unique features of AIDD are its ability to infer RNA editing patterns, including ADAR editing, and its inclusion of Guttman scale patterns for time series analysis of such editing landscapes. AIDD-based results show the importance of the diversity of ADAR isoforms, key RNA editing enzymes linked with the innate immune system and viral infections. These findings offer insights into the potential role of ADAR editing dysregulation in disease mechanisms, including those of congenital Zika syndrome. Because of its automated all-inclusive features, the AIDD pipeline enables even a novice user to easily explore common mechanisms of transcriptome diversity, including RNA editing landscapes.


Predicting heterogeneity in clone-specific therapeutic vulnerabilities using single-cell transcriptomic signatures

Genome Medicine ◽ 10.1186/s13073-021-01000-y ◽ 2021 ◽ Vol 13 (1)

Author(s): Chayaporn Suphavilai ◽ Shumei Chia ◽ Ankur Sharma ◽ Lorna Tu ◽ Rafael Peres Da Silva ◽ ...

Keyword(s): Single Cell ◽ Large Scale ◽ Drug Response ◽ Drug Repurposing ◽ High Accuracy ◽ Molecular Heterogeneity ◽ Precision Oncology ◽ Scale Analysis ◽ RNA-Seq ◽ Large Scale Analysis

Abstract: While understanding molecular heterogeneity across patients underpins precision oncology, there is increasing appreciation for taking intra-tumor heterogeneity into account. Based on large-scale analysis of cancer omics datasets, we highlight the importance of intra-tumor transcriptomic heterogeneity (ITTH) for predicting clinical outcomes. Leveraging single-cell RNA-seq (scRNA-seq) with a recommender system (CaDRReS-Sc), we show that heterogeneous gene-expression signatures can predict drug response with high accuracy (80%). Using patient-proximal cell lines, we established the validity of CaDRReS-Sc’s monotherapy (Pearson r>0.6) and combinatorial predictions targeting clone-specific vulnerabilities (>10% improvement). Applying CaDRReS-Sc to rapidly expanding scRNA-seq compendiums can serve as an in silico screen to accelerate drug-repurposing studies. Availability: https://github.com/CSB5/CaDRReS-Sc.


Maternal factor PABPN1L is essential for maternal mRNA degradation during maternal-to-zygotic transition

10.1101/2020.08.20.258830 ◽ 2020

Author(s): Ying Wang ◽ Tianhao Feng ◽ Xiaodan Shi ◽ Siyu Liu ◽ Zerui Wang ◽ ...

Keyword(s): Mouse Model ◽ Large Scale ◽ mRNA Degradation ◽ Female Infertility ◽ Control Group ◽ RNA-Seq ◽ Cell Stage ◽ Maternal mRNA ◽ Maternal To Zygotic Transition ◽ Group A

Abstract: Infertility affects 10%-15% of families worldwide. However, the pathogenesis of female infertility caused by abnormal early embryonic development is not clear. We constructed a mouse model (Pabpn1l -/-) simulating the splicing abnormality of human PABPN1L and found that females were sterile whereas males were fertile. Pabpn1l -/- oocytes can be produced, ovulated and fertilized normally, but cannot develop beyond the 2-cell stage. Using RNA-Seq, we found a large-scale upregulation of RNA in Pabpn1l -/- MII oocytes. Of the 2401 transcripts upregulated in Pabpn1l -/- MII oocytes, 1523 transcripts (63.4%) were also upregulated in Btg4 -/- MII oocytes, while only 53 transcripts (2.2%) were upregulated in Ythdf2 -/- MII oocytes. We documented that transcripts in zygotes derived from Pabpn1l -/- oocytes have a longer poly(A) tail than those in the control group, a phenomenon similar to that in Btg4 -/- mice. Surprisingly, the poly(A) tail of these mRNAs was significantly shorter in Pabpn1l -/- MII oocytes than in Pabpn1l +/+ MII oocytes. These results suggest that PABPN1L is involved in BTG4-mediated maternal mRNA degradation, and may antagonize poly(A) tail shortening in oocytes independently of its involvement in maternal mRNA degradation. Thus, PABPN1L variants could be a genetic marker of female infertility.
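The overlap percentages quoted in the abstract above follow directly from the transcript counts. A minimal Python check, using only the numbers reported there (the variable and function names are illustrative):

```python
# Transcript counts as reported in the abstract.
upregulated_pabpn1l = 2401   # upregulated in Pabpn1l -/- MII oocytes
shared_with_btg4 = 1523      # of those, also upregulated in Btg4 -/- MII oocytes
shared_with_ythdf2 = 53      # of those, also upregulated in Ythdf2 -/- MII oocytes


def percent_overlap(shared: int, total: int) -> float:
    """Return `shared` as a percentage of `total`, rounded to one decimal place."""
    return round(100.0 * shared / total, 1)


btg4_pct = percent_overlap(shared_with_btg4, upregulated_pabpn1l)
ythdf2_pct = percent_overlap(shared_with_ythdf2, upregulated_pabpn1l)
print(f"Btg4 overlap:   {btg4_pct}%")    # 63.4%
print(f"Ythdf2 overlap: {ythdf2_pct}%")  # 2.2%
```

Both values match the figures in the abstract, and the roughly 29-fold difference between them is what motivates linking PABPN1L to the BTG4 pathway rather than to YTHDF2.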


Leveraging high-powered RNA-Seq datasets to improve inference of regulatory activity in single-cell RNA-Seq data

10.1101/553040 ◽ 2019 ◽ Cited By ~ 1

Author(s): Ning Wang ◽ Andrew E. Teschendorff

Keyword(s): Transcription Factors ◽ Single Cell ◽ Cell Fate ◽ Regulatory Networks ◽ Large Scale ◽ Single Cells ◽ Differential Expression Analysis ◽ Dropout Rate ◽ RNA-Seq ◽ Regulatory Activity

Abstract: Inferring the activity of transcription factors in single cells is a key task to improve our understanding of development and complex genetic diseases. This task is, however, challenging due to the relatively large dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can be subsequently obtained. We show that SCIRA can correctly infer regulatory activity of transcription factors affected by high technical dropouts. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell-fate in tissue-development, even for cell-types that only make up 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map activity patterns of key transcription factors during development, and how these are altered in disease.


EpiScanpy: integrated single-cell epigenomic analysis

10.1101/648097 ◽ 2019 ◽ Cited By ~ 4

Author(s): Anna Danese ◽ Maria L. Richter ◽ David S. Fischer ◽ Fabian J. Theis ◽ Maria Colomé-Tatché

Keyword(s): DNA Methylation ◽ Single Cell ◽ Large Scale ◽ Feature Space ◽ RNA-Seq ◽ Computational Framework ◽ Learning Techniques ◽ Multiple Feature ◽ The Many ◽ Cell Data

Abstract: Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels with orthogonal epigenetic information.


Dual RNA-seq reveals large-scale non-conserved genotype x genotype specific genetic reprograming and molecular crosstalk in the mycorrhizal symbiosis

10.1101/393637 ◽ 2018

Author(s): Ivan D. Mateus ◽ Frédéric G. Masclaux ◽ Consolée Aletti ◽ Edward C. Rojas ◽ Romain Savary ◽ ...

Keyword(s): Mycorrhizal Fungi ◽ Large Scale ◽ Molecular Mechanisms ◽ Arbuscular Mycorrhizal ◽ Mycorrhizal Symbiosis ◽ RNA-Seq ◽ Plant Host ◽ Transcriptional Responses ◽ Plant Genes ◽ Fungal Genetic

Abstract: Arbuscular mycorrhizal fungi (AMF) impact plant growth and are a major driver of plant diversity and productivity. We quantified the contribution of intra-specific genetic variability in cassava (Manihot esculenta) and Rhizophagus irregularis to gene reprogramming in symbioses using dual RNA-sequencing. A large number of cassava genes exhibited altered transcriptional responses to the fungus, but transcription of most of these plant genes (72%) responded in a different direction or magnitude depending on the plant genotype. Two AMF isolates displayed large differences in their transcription, but the direction and magnitude of the transcriptional responses for a large number of these genes were also strongly influenced by the genotype of the plant host. This indicates that, unlike the highly conserved plant genes necessary for symbiosis establishment, plant and fungal gene transcriptional responses are not conserved and are greatly influenced by plant and fungal genetic differences, even at the within-species level. The transcriptional variability detected allowed us to identify an extensive gene network showing the interplay in plant-fungal reprogramming in the symbiosis. Key genes illustrated that the two organisms jointly program their cytoskeleton organisation during growth of the fungus inside roots. Our study reveals that plant and fungal genetic variation plays a strong role in shaping the genetic reprogramming in response to symbiosis, indicating considerable genotype x genotype interactions in the mycorrhizal symbiosis. Such variation needs to be considered in order to understand the molecular mechanisms between AMF and their plant hosts in natural communities.
