Optimized splitting of mixed-species RNA sequencing data

Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven valuable for uncovering cellular dynamics during development or in disease models. However, mRNA sequence similarity among species presents a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species; the classified reads are then re-aligned to the individual genomes, generating >97% accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. While non-alignment strategies successfully partitioned reads by species, the more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.
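The classification step described in this abstract (assigning each read to the species with the best alignment against a pooled reference) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the input format of `(read_id, species, score)` tuples, the function name `classify_reads`, and the `min_margin` tie-breaking parameter are all assumptions made for the example.

```python
from collections import defaultdict


def classify_reads(alignments, min_margin=1):
    """Assign each read to the species with the best alignment score.

    alignments: iterable of (read_id, species, score) tuples
                (hypothetical format, e.g. parsed from a pooled-index alignment).
    min_margin: minimum score gap required to call a species (assumed parameter);
                reads below the margin are labeled 'ambiguous'.
    """
    # Keep the best score seen per read and per species.
    best = defaultdict(dict)
    for read_id, species, score in alignments:
        prev = best[read_id].get(species)
        if prev is None or score > prev:
            best[read_id][species] = score

    calls = {}
    for read_id, scores in best.items():
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        # A read aligned to one species only, or with a clear winner, is called.
        if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= min_margin:
            calls[read_id] = ranked[0][0]
        else:
            calls[read_id] = "ambiguous"
    return calls


# Toy input: r1 aligns better to human, r2 only to mouse, r3 ties.
example = [
    ("r1", "human", 98), ("r1", "mouse", 90),
    ("r2", "mouse", 100),
    ("r3", "human", 95), ("r3", "mouse", 95),
]
calls = classify_reads(example)
```

In a workflow like the one the abstract outlines, the reads called for each species would then be re-aligned to that species' genome alone; how ambiguous reads are handled is a design choice not specified here.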

Optimized Splitting of RNA Sequencing Data by Species

10.1101/2021.06.09.447735 ◽

2021 ◽

Author(s):

Xuan Song ◽

Hai Yun Gao ◽

Karl Herrup ◽

Ronald P Hart

Keyword(s):

RNA Sequencing ◽

Optimal Strategies ◽

Sequencing Data ◽

Transcript Quantification ◽

Effective Strategies ◽

Expression Studies ◽

Gene Expression Studies ◽

Reference Index ◽

Optimal Alignments ◽

Sequence Similarities

Gene expression studies using chimeric xenograft transplants or co-culture systems have proven valuable for uncovering cellular dynamics and interactions during development or in disease models. However, mRNA sequence similarity among species presents a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species; the classified reads are then re-aligned to the individual genomes, generating >97% accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. Our evaluation identifies valuable and effective strategies to dissect species composition of RNA sequencing data from mixed populations.
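For the alignment-independent route, a convolutional network generally needs reads as fixed-size numeric matrices. The sketch below shows one common way to prepare such input, one-hot encoding; it is only an illustration, and the function name `one_hot`, the fixed read length, and the choice to encode unknown bases and padding as all-zero rows are assumptions, not details taken from the study.

```python
BASES = "ACGT"


def one_hot(read, length):
    """Encode a read as a `length` x 4 matrix of 0/1 values.

    Reads longer than `length` are truncated; shorter reads are padded
    with all-zero rows. Bases outside A/C/G/T (e.g. N) also map to zeros.
    """
    rows = [[1 if base == b else 0 for b in BASES]
            for base in read.upper()[:length]]
    # Pad with fresh zero rows so every read has the same shape.
    rows.extend([0, 0, 0, 0] for _ in range(length - len(rows)))
    return rows


encoded = one_hot("ACN", 4)
```

Each encoded read could then be labeled by species of origin and fed to a classifier; the network architecture itself is outside the scope of this sketch.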

Comparative assessment of long-read error correction software applied to Nanopore RNA-sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbz058 ◽

2019 ◽

Vol 21 (4) ◽

pp. 1164-1181 ◽

Cited By ~ 9

Author(s):

Leandro Lima ◽

Camille Marchet ◽

Ségolène Caboche ◽

Corinne Da Silva ◽

Benjamin Istace ◽

...

Keyword(s):

Error Correction ◽

RNA Sequencing ◽

Gene Families ◽

Error Rates ◽

Open Reading Frames ◽

Sequencing Data ◽

Isoform Diversity ◽

Long Reads ◽

Long Read ◽

Read Error Correction

Motivation: Nanopore long-read sequencing technology offers promising alternatives to high-throughput short read sequencing, especially in the context of RNA-sequencing. However this technology is currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, open reading frames and creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed and options for the error correction of Nanopore RNA-sequencing long reads remain limited. Results: In this article, we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that not only reports classical error correction metrics but also the effect of correction on gene families, isoform diversity, bias toward the major isoform and splice site detection. We find that long read error correction tools that were originally developed for DNA are also suitable for the correction of Nanopore RNA-sequencing data, especially in terms of increasing base pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error correction tools should be used, depending on the application type. Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser

Comparative assessment of long-read error-correction software applied to RNA-sequencing data

10.1101/476622 ◽

2018 ◽

Cited By ~ 2

Author(s):

Leandro Lima ◽

Camille Marchet ◽

Ségolène Caboche ◽

Corinne Da Silva ◽

Benjamin Istace ◽

...

Keyword(s):

Error Correction ◽

RNA Sequencing ◽

Gene Families ◽

Error Rates ◽

Open Reading Frames ◽

Sequencing Data ◽

Sequencing Technologies ◽

Isoform Diversity ◽

Long Read ◽

Read Error Correction

Motivation: Long-read sequencing technologies offer promising alternatives to high-throughput short read sequencing, especially in the context of RNA-sequencing. However these technologies are currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, open reading frames, and the creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed and options for the error-correction of RNA-sequencing long reads remain limited. Results: In this article, we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that not only reports classical error-correction metrics but also the effect of correction on gene families, isoform diversity, bias towards the major isoform, and splice site detection. We find that long read error-correction tools that were originally developed for DNA are also suitable for the correction of RNA-sequencing data, especially in terms of increasing base-pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error-correction tools should be used, depending on the application type. Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser

Comparative transcriptome profiling of the human and mouse dorsal root ganglia: An RNA-seq-based resource for pain and sensory neuroscience research

10.1101/165431 ◽

2017 ◽

Cited By ~ 1

Author(s):

Pradipta Ray ◽

Andrew Torck ◽

Lilyana Quigley ◽

Andi Wangzhou ◽

Matthew Neiman ◽

...

Keyword(s):

Chronic Pain ◽

RNA Sequencing ◽

Dorsal Root Ganglia ◽

Dorsal Root ◽

Expression Patterns ◽

RNA Seq ◽

Sequencing Data ◽

Mouse Tissues ◽

Human And Mouse ◽

Insight Into

Molecular neurobiological insight into human nervous tissues is needed to generate next generation therapeutics for neurological disorders like chronic pain. We obtained human Dorsal Root Ganglia (DRG) samples from organ donors and performed RNA-sequencing (RNA-seq) to study the human DRG (hDRG) transcriptional landscape, systematically comparing it with publicly available data from a variety of human and orthologous mouse tissues, including mouse DRG (mDRG). We characterized the hDRG transcriptional profile in terms of tissue-restricted gene co-expression patterns and putative transcriptional regulators, and formulated an information-theoretic framework to quantify DRG enrichment. Our analyses reveal an hDRG-enriched protein-coding gene set (~140), some of which have not been described in the context of DRG or pain signaling. A majority of these show conserved enrichment in mDRG, and were mined for known drug-gene product interactions. Comparison of hDRG and tibial nerve transcriptomes suggests pervasive mRNA transport of sensory neuronal genes to axons in adult hDRG, with potential implications for mechanistic insight into chronic pain in patients. Relevant gene families and pathways were also analyzed, including transcription factors (TFs), G-protein coupled receptors (GPCRs) and ion channels. We present our work as an online, searchable repository (http://www.utdallas.edu/bbs/painneurosciencelab/DRGtranscriptome), creating a valuable resource for the community. Our analyses provide insight into DRG biology for guiding development of novel therapeutics, and a blueprint for cross-species transcriptomic analyses. Summary: We generated RNA sequencing data from human DRG samples and comprehensively compared this transcriptome to other human tissues and a matching panel of mouse tissues. Our analysis uncovered functionally enriched genes in the human and mouse DRG with important implications for understanding sensory biology and pain drug discovery.

Leveraging transcript quantification for fast computation of alternative splicing profiles

10.1101/008763 ◽

2014 ◽

Cited By ~ 4

Author(s):

Gael P Alamancos ◽

Amadís Pagès ◽

Juan L Trincado ◽

Nicolás Bellora ◽

Eduardo Eyras

Keyword(s):

Alternative Splicing ◽

RNA Sequencing ◽

De Novo ◽

Computation Time ◽

Sequencing Data ◽

Standard Methods ◽

Transcript Quantification ◽

Cellular Processes ◽

Reconstruction Methods ◽

Transcript Reconstruction

Alternative splicing plays an essential role in many cellular processes and bears major relevance in the understanding of multiple diseases, including cancer. High-throughput RNA sequencing allows genome-wide analyses of splicing across multiple conditions. However, the increasing number of available datasets represents a major challenge in terms of computation time and storage requirements. We describe SUPPA, a computational tool to calculate relative inclusion values of alternative splicing events, exploiting fast transcript quantification. SUPPA accuracy is comparable and sometimes superior to standard methods using simulated as well as real RNA sequencing data compared to experimentally validated events. We assess the variability in terms of the choice of annotation and provide evidence that using complete transcripts rather than more transcripts per gene provides better estimates. Moreover, SUPPA coupled with de novo transcript reconstruction methods does not achieve accuracies as high as using quantification of known transcripts, but remains comparable to existing methods. Finally, we show that SUPPA is more than 1000 times faster than standard methods. Coupled with fast transcript quantification, SUPPA provides inclusion values at a much higher speed than existing methods without compromising accuracy, thereby facilitating the systematic splicing analysis of large datasets with limited computational resources. The software is implemented in Python 2.7 and is available under the MIT license at https://bitbucket.org/regulatorygenomicsupf/suppa
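The idea of deriving relative inclusion values from transcript quantification can be illustrated with a small sketch. It assumes, for illustration only, that an event's inclusion value is the summed abundance of the transcripts that include the event divided by the summed abundance of all transcripts that define it; the function name `psi`, the TPM dictionary, and the transcript identifiers are hypothetical and this is not SUPPA's own code or interface.

```python
def psi(tpm, including, all_transcripts):
    """Relative inclusion of an event from transcript abundances.

    tpm: dict mapping transcript id -> abundance (e.g. TPM).
    including: ids of transcripts that contain the event's inclusion form.
    all_transcripts: ids of all transcripts that define the event
                     (inclusion and exclusion forms).
    Returns None when none of the event's transcripts are expressed.
    """
    total = sum(tpm.get(t, 0.0) for t in all_transcripts)
    if total == 0:
        return None  # inclusion is undefined without expression
    return sum(tpm.get(t, 0.0) for t in including) / total


# Toy quantification: tx1 includes the event, tx2 skips it, tx3 is unrelated.
tpm = {"tx1": 30.0, "tx2": 10.0, "tx3": 60.0}
value = psi(tpm, including=["tx1"], all_transcripts=["tx1", "tx2"])
```

Because the calculation only sums precomputed abundances, its cost is small compared with read-level analysis, which is consistent with the speed advantage the abstract reports.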

HEDGEHOG/GLI Modulates the PRR11-SKA2 Bidirectional Transcription Unit in Lung Squamous Cell Carcinomas

Genes ◽

10.3390/genes12010120 ◽

2021 ◽

Vol 12 (1) ◽

pp. 120

Author(s):

Yiyun Sun ◽

Dandan Xu ◽

Chundong Zhang ◽

Yitao Wang ◽

Lian Zhang ◽

...

Keyword(s):

RNA Sequencing ◽

Squamous Cell ◽

Gene Pair ◽

Gene List ◽

Ectopic Expression ◽

Lung Squamous Cell Carcinoma ◽

Gene Set Enrichment Analysis ◽

The Cancer Genome Atlas ◽

Sequencing Data ◽

Gene Set

We previously demonstrated that proline-rich protein 11 (PRR11) and spindle and kinetochore associated 2 (SKA2) constituted a head-to-head gene pair driven by a prototypical bidirectional promoter. This gene pair synergistically promoted the development of non-small cell lung cancer. However, the signaling pathways leading to the ectopic expression of this gene pair remain obscure. In the present study, we first analyzed the lung squamous cell carcinoma (LSCC) relevant RNA sequencing data from The Cancer Genome Atlas (TCGA) database using the correlation analysis of gene expression and gene set enrichment analysis (GSEA), which revealed that the PRR11-SKA2 correlated gene list highly resembled the Hedgehog (Hh) pathway activation-related gene set. Subsequently, GLI1/2 inhibitor GANT-61 or GLI1/2-siRNA inhibited the Hh pathway of LSCC cells, concomitantly decreasing the expression levels of PRR11 and SKA2. Furthermore, the mRNA expression profile of LSCC cells treated with GANT-61 was detected using RNA sequencing, displaying 397 differentially expressed genes (203 upregulated genes and 194 downregulated genes). Among them, one gene set, including BIRC5, NCAPG, CCNB2, and BUB1, was involved in cell division and interacted with both PRR11 and SKA2. These genes were verified as downregulated genes via RT-PCR, and their high expression significantly correlated with shorter overall survival of LSCC patients. Taken together, our results indicate that GLI1/2 mediates the expression of the PRR11-SKA2-centric gene set that serves as an unfavorable prognostic indicator for LSCC patients, pointing to potential new combinatorial diagnostic and therapeutic strategies in LSCC.

RNA Sequencing Data from Human Intracranial Aneurysm Tissue Reveals a Complex Inflammatory Environment Associated with Rupture

Molecular Diagnosis & Therapy ◽

10.1007/s40291-021-00552-4 ◽

2021 ◽

Author(s):

Vincent M. Tutino ◽

Haley R. Zebraski ◽

Hamidreza Rajabzadeh-Oghaz ◽

Lee Chaves ◽

Adam A. Dmytriw ◽

...

Keyword(s):

Intracranial Aneurysm ◽

RNA Sequencing ◽

Sequencing Data

Mixed Distribution Models Based on Single-Cell RNA Sequencing Data

Interdisciplinary Sciences: Computational Life Sciences ◽

10.1007/s12539-021-00427-6 ◽

2021 ◽

Author(s):

Min Wu ◽

Junhua Xu ◽

Tao Ding ◽

Jie Gao

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Sequencing Data ◽

Distribution Models ◽

Mixed Distribution ◽

Single Cell RNA Sequencing

In-depth transcriptomic analysis of human retina reveals molecular mechanisms underlying diabetic retinopathy

Scientific Reports ◽

10.1038/s41598-021-88698-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Kolja Becker ◽

Holger Klein ◽

Eric Simon ◽

Coralie Viollet ◽

Christian Haslinger ◽

...

Keyword(s):

Diabetic Retinopathy ◽

RNA Sequencing ◽

Molecular Mechanisms ◽

Vision Loss ◽

Ganglion Cells ◽

Expression Profiles ◽

Cell Types ◽

Sequencing Data ◽

Disease Stages ◽

Post Transcriptional Regulation

Diabetic Retinopathy (DR) is among the major global causes of vision loss. With the rise in diabetes prevalence, an increase in DR incidence is expected. Current understanding of both the molecular etiology and pathways involved in the initiation and progression of DR is limited. Via RNA-Sequencing, we analyzed mRNA and miRNA expression profiles of 80 human post-mortem retinal samples from 43 patients diagnosed with various stages of DR. We found differentially expressed transcripts to be predominantly associated with late stage DR and pathways such as hippo and gap junction signaling. A multivariate regression model identified transcripts with progressive changes throughout disease stages, which in turn displayed significant overlap with sphingolipid and cGMP–PKG signaling. Combined analysis of miRNA and mRNA expression further uncovered disease-relevant miRNA/mRNA associations as potential mechanisms of post-transcriptional regulation. Finally, integrating human retinal single cell RNA-Sequencing data revealed a continuous loss of retinal ganglion cells, and Müller cell mediated changes in histidine and β-alanine signaling. While previously considered primarily a vascular disease, attention in DR has shifted to additional mechanisms and cell-types. Our findings offer an unprecedented and unbiased insight into molecular pathways and cell-specific changes in the development of DR, and provide potential avenues for future therapeutic intervention.

COVID-19 Severity Potentially Modulated by Cardiovascular-Disease-Associated Immune Dysregulation

Viruses ◽

10.3390/v13061018 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1018

Author(s):

Abby C. Lee ◽

Grant Castaneda ◽

Wei Tse Li ◽

Chengyu Chen ◽

Neil Shende ◽

...

Keyword(s):

RNA Sequencing ◽

Mononuclear Cells ◽

Human Peripheral Blood ◽

Expression Profiles ◽

Immune Dysregulation ◽

Left Ventricular ◽

Recurrent Event ◽

Healthy Controls ◽

Sequencing Data ◽

Peripheral Blood Mononuclear

Patients with underlying cardiovascular conditions are particularly vulnerable to severe COVID-19. In this project, we aimed to characterize similarities in dysregulated immune pathways between COVID-19 patients and patients with cardiomyopathy, venous thromboembolism (VTE), or coronary artery disease (CAD). We hypothesized that these similarly dysregulated pathways may be critical to how cardiovascular diseases (CVDs) exacerbate COVID-19. To evaluate immune dysregulation in different diseases, we used four separate datasets, including RNA-sequencing data from human left ventricular cardiac muscle samples of patients with dilated or ischemic cardiomyopathy and healthy controls; RNA-sequencing data of whole blood samples from patients with single or recurrent event VTE and healthy controls; RNA-sequencing data of human peripheral blood mononuclear cells (PBMCs) from patients with and without obstructive CAD; and RNA-sequencing data of platelets from COVID-19 subjects and healthy controls. We found similar immune dysregulation profiles between patients with CVDs and COVID-19 patients. Interestingly, cardiomyopathy patients display the most similar immune landscape to COVID-19 patients. Additionally, COVID-19 patients experience greater upregulation of cytokine- and inflammasome-related genes than patients with CVDs. In all, patients with CVDs have a significant overlap of cytokine- and inflammasome-related gene expression profiles with that of COVID-19 patients, possibly explaining their greater vulnerability to severe COVID-19.
