Optimized splitting of mixed-species RNA sequencing data

Author(s):  
Xuan Song ◽  
Hai Yun Gao ◽  
Karl Herrup ◽  
Ronald P. Hart

Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven valuable for uncovering cellular dynamics during development or in disease models. However, the mRNA sequence similarities among species present a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species, which are then re-aligned with individual genomes, generating >97% accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. While non-alignment strategies successfully partitioned reads by species, the more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.
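The alignment-dependent strategy described above (align to a pooled index, then assign each read to the species with the best alignment) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_margin` parameter, the "ambiguous" and "unaligned" bins, and the input layout (a read ID mapped to species and alignment-score pairs) are all assumptions made for illustration.

```python
def classify_read(hits, min_margin=1):
    """Assign one read to a species from its pooled-index alignments.

    hits: list of (species, alignment_score) tuples for a single read.
    min_margin: hypothetical threshold; the best species must beat the
    runner-up by at least this much, otherwise the read is "ambiguous".
    """
    # Keep only the best-scoring alignment per species.
    best = {}
    for species, score in hits:
        if species not in best or score > best[species]:
            best[species] = score
    if not best:
        return "unaligned"
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]
    top_species, top_score = ranked[0]
    second_score = ranked[1][1]
    if top_score - second_score >= min_margin:
        return top_species
    return "ambiguous"


def split_reads(alignments, min_margin=1):
    """Group read IDs by assigned species.

    alignments: dict mapping read_id -> list of (species, score).
    Returns a dict mapping label -> list of read IDs.
    """
    bins = {}
    for read_id, hits in alignments.items():
        label = classify_read(hits, min_margin)
        bins.setdefault(label, []).append(read_id)
    return bins
```

In a workflow of this kind, the reads in each species bin would then be re-aligned to that species' individual genome, as the abstract describes; how ambiguous reads are handled is a design choice the abstract does not specify.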

2021 ◽  
Author(s):  
Xuan Song ◽  
Hai Yun Gao ◽  
Karl Herrup ◽  
Ronald P Hart

Gene expression studies using chimeric xenograft transplants or co-culture systems have proven valuable for uncovering cellular dynamics and interactions during development or in disease models. However, the mRNA sequence similarities among species present a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species, which are then re-aligned with individual genomes, generating >97% accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. Our evaluation identifies valuable and effective strategies to dissect species composition of RNA sequencing data from mixed populations.


2019 ◽  
Vol 21 (4) ◽  
pp. 1164-1181 ◽  
Author(s):  
Leandro Lima ◽  
Camille Marchet ◽  
Ségolène Caboche ◽  
Corinne Da Silva ◽  
Benjamin Istace ◽  
...  

Motivation: Nanopore long-read sequencing technology offers promising alternatives to high-throughput short read sequencing, especially in the context of RNA-sequencing. However, this technology is currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, open reading frames and creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed and options for the error correction of Nanopore RNA-sequencing long reads remain limited. Results: In this article, we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that not only reports classical error correction metrics but also the effect of correction on gene families, isoform diversity, bias toward the major isoform and splice site detection. We find that long read error correction tools that were originally developed for DNA are also suitable for the correction of Nanopore RNA-sequencing data, especially in terms of increasing base pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error correction tools should be used, depending on the application type. Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser


2018 ◽  
Author(s):  
Leandro Lima ◽  
Camille Marchet ◽  
Ségolène Caboche ◽  
Corinne Da Silva ◽  
Benjamin Istace ◽  
...  

Motivation: Long-read sequencing technologies offer promising alternatives to high-throughput short read sequencing, especially in the context of RNA-sequencing. However, these technologies are currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, open reading frames, and the creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed and options for the error-correction of RNA-sequencing long reads remain limited. Results: In this article, we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that not only reports classical error-correction metrics but also the effect of correction on gene families, isoform diversity, bias towards the major isoform, and splice site detection. We find that long read error-correction tools that were originally developed for DNA are also suitable for the correction of RNA-sequencing data, especially in terms of increasing base-pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error-correction tools should be used, depending on the application type. Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser


2017 ◽  
Author(s):  
Pradipta Ray ◽  
Andrew Torck ◽  
Lilyana Quigley ◽  
Andi Wangzhou ◽  
Matthew Neiman ◽  
...  

Molecular neurobiological insight into human nervous tissues is needed to generate next generation therapeutics for neurological disorders like chronic pain. We obtained human Dorsal Root Ganglia (DRG) samples from organ donors and performed RNA-sequencing (RNA-seq) to study the human DRG (hDRG) transcriptional landscape, systematically comparing it with publicly available data from a variety of human and orthologous mouse tissues, including mouse DRG (mDRG). We characterized the hDRG transcriptional profile in terms of tissue-restricted gene co-expression patterns and putative transcriptional regulators, and formulated an information-theoretic framework to quantify DRG enrichment. Our analyses reveal an hDRG-enriched protein-coding gene set (~140), some of which have not been described in the context of DRG or pain signaling. A majority of these show conserved enrichment in mDRG, and were mined for known drug-gene product interactions. Comparison of hDRG and tibial nerve transcriptomes suggests pervasive mRNA transport of sensory neuronal genes to axons in adult hDRG, with potential implications for mechanistic insight into chronic pain in patients. Relevant gene families and pathways were also analyzed, including transcription factors (TFs), G-protein coupled receptors (GPCRs) and ion channels. We present our work as an online, searchable repository (http://www.utdallas.edu/bbs/painneurosciencelab/DRGtranscriptome), creating a valuable resource for the community. Our analyses provide insight into DRG biology for guiding development of novel therapeutics, and a blueprint for cross-species transcriptomic analyses. Summary: We generated RNA sequencing data from human DRG samples and comprehensively compared this transcriptome to other human tissues and a matching panel of mouse tissues. Our analysis uncovered functionally enriched genes in the human and mouse DRG with important implications for understanding sensory biology and pain drug discovery.


2014 ◽  
Author(s):  
Gael P Alamancos ◽  
Amadís Pagès ◽  
Juan L Trincado ◽  
Nicolás Bellora ◽  
Eduardo Eyras

Alternative splicing plays an essential role in many cellular processes and bears major relevance in the understanding of multiple diseases, including cancer. High-throughput RNA sequencing allows genome-wide analyses of splicing across multiple conditions. However, the increasing number of available datasets represents a major challenge in terms of computation time and storage requirements. We describe SUPPA, a computational tool to calculate relative inclusion values of alternative splicing events, exploiting fast transcript quantification. SUPPA's accuracy is comparable, and sometimes superior, to that of standard methods on simulated data as well as on real RNA sequencing data compared against experimentally validated events. We assess the variability in terms of the choice of annotation and provide evidence that using complete transcripts rather than more transcripts per gene provides better estimates. Moreover, SUPPA coupled with de novo transcript reconstruction methods does not achieve accuracies as high as using quantification of known transcripts, but remains comparable to existing methods. Finally, we show that SUPPA is more than 1000 times faster than standard methods. Coupled with fast transcript quantification, SUPPA provides inclusion values at a much higher speed than existing methods without compromising accuracy, thereby facilitating the systematic splicing analysis of large datasets with limited computational resources. The software is implemented in Python 2.7 and is available under the MIT license at https://bitbucket.org/regulatorygenomicsupf/suppa
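The idea of deriving a relative inclusion value from transcript quantification can be sketched as a simple ratio of abundances. This is an illustrative assumption about the general approach, not SUPPA's actual code or interface: the function name `relative_inclusion`, its arguments, and the choice to return `None` when no transcript is expressed are all hypothetical.

```python
def relative_inclusion(tpm, inclusion_transcripts, event_transcripts):
    """Relative inclusion of a splicing event from transcript abundances.

    tpm: dict mapping transcript ID -> abundance (e.g. TPM).
    inclusion_transcripts: IDs of transcripts containing the inclusion form.
    event_transcripts: IDs of all transcripts contributing to the event
        (those with the inclusion form plus those with the alternative form).
    Returns a value in [0, 1], or None if the event is not expressed.
    """
    total = sum(tpm.get(t, 0.0) for t in event_transcripts)
    if total == 0:
        # No expression: the ratio is undefined rather than zero.
        return None
    included = sum(tpm.get(t, 0.0) for t in inclusion_transcripts)
    return included / total
```

Because the calculation only sums precomputed abundances, it needs no read-level processing, which is consistent with the speed advantage the abstract attributes to building on fast transcript quantification.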


Genes ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 120
Author(s):  
Yiyun Sun ◽  
Dandan Xu ◽  
Chundong Zhang ◽  
Yitao Wang ◽  
Lian Zhang ◽  
...  

We previously demonstrated that proline-rich protein 11 (PRR11) and spindle and kinetochore associated 2 (SKA2) constituted a head-to-head gene pair driven by a prototypical bidirectional promoter. This gene pair synergistically promoted the development of non-small cell lung cancer. However, the signaling pathways leading to the ectopic expression of this gene pair remain obscure. In the present study, we first analyzed the lung squamous cell carcinoma (LSCC) relevant RNA sequencing data from The Cancer Genome Atlas (TCGA) database using the correlation analysis of gene expression and gene set enrichment analysis (GSEA), which revealed that the PRR11-SKA2 correlated gene list highly resembled the Hedgehog (Hh) pathway activation-related gene set. Subsequently, GLI1/2 inhibitor GANT-61 or GLI1/2-siRNA inhibited the Hh pathway of LSCC cells, concomitantly decreasing the expression levels of PRR11 and SKA2. Furthermore, the mRNA expression profile of LSCC cells treated with GANT-61 was determined using RNA sequencing, revealing 397 differentially expressed genes (203 upregulated genes and 194 downregulated genes). Among them, one gene set, including BIRC5, NCAPG, CCNB2, and BUB1, was involved in cell division and interacted with both PRR11 and SKA2. These genes were verified as downregulated via RT-PCR, and their high expression significantly correlated with shorter overall survival of LSCC patients. Taken together, our results indicate that GLI1/2 mediates the expression of the PRR11-SKA2-centric gene set that serves as an unfavorable prognostic indicator for LSCC patients, suggesting potential new combinatorial diagnostic and therapeutic strategies in LSCC.


Author(s):  
Vincent M. Tutino ◽  
Haley R. Zebraski ◽  
Hamidreza Rajabzadeh-Oghaz ◽  
Lee Chaves ◽  
Adam A. Dmytriw ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kolja Becker ◽  
Holger Klein ◽  
Eric Simon ◽  
Coralie Viollet ◽  
Christian Haslinger ◽  
...  

Diabetic Retinopathy (DR) is among the major global causes of vision loss. With the rise in diabetes prevalence, an increase in DR incidence is expected. Current understanding of both the molecular etiology and pathways involved in the initiation and progression of DR is limited. Via RNA-Sequencing, we analyzed mRNA and miRNA expression profiles of 80 human post-mortem retinal samples from 43 patients diagnosed with various stages of DR. We found differentially expressed transcripts to be predominantly associated with late stage DR and pathways such as hippo and gap junction signaling. A multivariate regression model identified transcripts with progressive changes throughout disease stages, which in turn displayed significant overlap with sphingolipid and cGMP–PKG signaling. Combined analysis of miRNA and mRNA expression further uncovered disease-relevant miRNA/mRNA associations as potential mechanisms of post-transcriptional regulation. Finally, integrating human retinal single cell RNA-Sequencing data revealed a continuous loss of retinal ganglion cells, and Müller cell mediated changes in histidine and β-alanine signaling. While previously considered primarily a vascular disease, attention in DR has shifted to additional mechanisms and cell-types. Our findings offer an unprecedented and unbiased insight into molecular pathways and cell-specific changes in the development of DR, and provide potential avenues for future therapeutic intervention.


Viruses ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 1018
Author(s):  
Abby C. Lee ◽  
Grant Castaneda ◽  
Wei Tse Li ◽  
Chengyu Chen ◽  
Neil Shende ◽  
...  

Patients with underlying cardiovascular conditions are particularly vulnerable to severe COVID-19. In this project, we aimed to characterize similarities in dysregulated immune pathways between COVID-19 patients and patients with cardiomyopathy, venous thromboembolism (VTE), or coronary artery disease (CAD). We hypothesized that these similarly dysregulated pathways may be critical to how cardiovascular diseases (CVDs) exacerbate COVID-19. To evaluate immune dysregulation in different diseases, we used four separate datasets, including RNA-sequencing data from human left ventricular cardiac muscle samples of patients with dilated or ischemic cardiomyopathy and healthy controls; RNA-sequencing data of whole blood samples from patients with single or recurrent event VTE and healthy controls; RNA-sequencing data of human peripheral blood mononuclear cells (PBMCs) from patients with and without obstructive CAD; and RNA-sequencing data of platelets from COVID-19 subjects and healthy controls. We found similar immune dysregulation profiles between patients with CVDs and COVID-19 patients. Interestingly, cardiomyopathy patients display the most similar immune landscape to COVID-19 patients. Additionally, COVID-19 patients experience greater upregulation of cytokine- and inflammasome-related genes than patients with CVDs. In all, patients with CVDs have a significant overlap of cytokine- and inflammasome-related gene expression profiles with that of COVID-19 patients, possibly explaining their greater vulnerability to severe COVID-19.

