Comparative Analysis of Public RNA-Sequencing Data from Human Intestinal Enteroid (HIEs) Infected with Enteric RNA Viruses Identifies Universal and Virus-Specific Epithelial Responses

Acute gastroenteritis (AGE) has a significant disease burden on society. Noroviruses, rotaviruses, and astroviruses are important viral causes of AGE but are relatively understudied enteric pathogens. Recent developments in novel biomimetic human models of enteric disease are opening new possibilities for studying human-specific host–microbe interactions. Human intestinal enteroids (HIEs), which are epithelium-only intestinal organoids derived from stem cells isolated from human intestinal biopsy tissues, have been successfully used to culture representative norovirus, rotavirus, and astrovirus strains. Previous studies investigated host–virus interactions at the intestinal epithelial interface by individually profiling the epithelial transcriptional response to a member of each virus family by RNA sequencing (RNA-seq). Despite differences in the tissue origin, enteric virus used, and hours post infection at which RNA was collected in each data set, the uniform analysis of publicly available datasets identified a conserved epithelial response to virus infection focused around “type I interferon production” and interferon-stimulated genes. Additionally, transcriptional changes specific to only one or two of the enteric viruses were also identified. This study can guide future explorations into common and unique aspects of the host response to virus infections in the human intestinal epithelium and demonstrates the promise of comparative RNA-seq analysis, even if performed under different experimental conditions, to discover universal and virus-specific genes and pathways responsible for antiviral host defense.
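The comparison described in this abstract, sorting differentially expressed genes into those shared by every virus and those seen for only one or two, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `classify_genes` and the gene labels are hypothetical placeholders, and the input is assumed to be one set of differentially expressed gene identifiers per virus.

```python
from collections import defaultdict


def classify_genes(de_genes_by_virus):
    """Split genes into a universal set and a map of partially shared genes.

    de_genes_by_virus: dict mapping a virus name to the set of genes called
    differentially expressed for that virus (assumed input format).
    """
    membership = defaultdict(set)
    for virus, genes in de_genes_by_virus.items():
        for gene in genes:
            membership[gene].add(virus)

    n_viruses = len(de_genes_by_virus)
    # Genes responding to every virus in the comparison.
    universal = {g for g, vs in membership.items() if len(vs) == n_viruses}
    # Genes responding to only a subset, keyed to the viruses involved.
    partial = {g: frozenset(vs) for g, vs in membership.items() if len(vs) < n_viruses}
    return universal, partial


# Placeholder labels only; these are not results from the study.
de_genes = {
    "norovirus": {"gene_a", "gene_b", "gene_c"},
    "rotavirus": {"gene_a", "gene_b", "gene_d"},
    "astrovirus": {"gene_a", "gene_e"},
}
universal, partial = classify_genes(de_genes)
```

Keeping the set of viruses for each partially shared gene, rather than only a count, makes it possible to separate genes specific to one virus from those shared by two.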


Comparative analysis of public RNA-sequencing data from human intestinal enteroid (HIEs) infected with enteric RNA viruses identifies universal and virus-specific epithelial responses

10.1101/2021.03.30.437726 ◽

2021 ◽

Author(s):

Roberto J Cieza ◽

Jonathan Louis Golob ◽

Justin A Colacino ◽

Christiane E Wobus

Keyword(s):

RNA Sequencing ◽

Transcriptional Response ◽

Enteric Virus ◽

Intestinal Biopsy ◽

Virus Infections ◽

Sequencing Data ◽

Virus Family ◽

Host Microbe Interactions ◽

Host Virus Interactions ◽

Human Intestinal Epithelium

Acute gastroenteritis (AGE) has a significant disease burden on society. Noroviruses, rotaviruses and astroviruses are important viral causes of AGE but are relatively understudied enteric pathogens. Recent developments in novel biomimetic human models of enteric disease are opening new possibilities for studying human-specific host-microbe interactions. Human intestinal enteroids (HIEs), which are epithelium-only intestinal organoids derived from stem cells isolated from human intestinal biopsy tissues, have been successfully used to culture representative norovirus, rotavirus and astrovirus strains. Previous studies investigated host-virus interactions at the intestinal epithelial interface by individually profiling the epithelial transcriptional response to a member of each virus family by RNA sequencing (RNA-seq). We uniformly analyzed these publicly available datasets to identify shared and unique transcriptional changes in the human intestinal epithelium upon human enteric virus infections.


OUTRIDER: A statistical method for detecting aberrantly expressed genes in RNA sequencing data

10.1101/322149 ◽

2018 ◽

Cited By ~ 2

Author(s):

Felix Brechtmann ◽

Agnė Matusevičiūtė ◽

Christian Mertes ◽

Vicente A Yépez ◽

Žiga Avsec ◽

...

Keyword(s):

Gene Expression ◽

RNA Sequencing ◽

Negative Binomial ◽

Statistical Significance ◽

P Value ◽

RNA Seq ◽

Sequencing Data ◽

Data Set ◽

Aberrant Gene Expression ◽

Aberrant Gene

RNA sequencing (RNA-seq) is gaining popularity as a complementary assay to genome sequencing for precisely identifying the molecular causes of rare disorders. A powerful approach is to identify aberrant gene expression levels as potential pathogenic events. However, existing methods for detecting aberrant read counts in RNA-seq data either lack assessments of statistical significance, so that establishing cutoffs is arbitrary, or rely on subjective manual corrections for confounders. Here, we describe OUTRIDER (OUTlier in RNA-seq fInDER), an algorithm developed to address these issues. The algorithm uses an autoencoder to model read count expectations according to the co-variation among genes resulting from technical, environmental, or common genetic variations. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. The model is automatically fitted to achieve the best correction of artificially corrupted data. Precision–recall analyses using simulated outlier read counts demonstrated the importance of combining correction for co-variation and significance-based thresholds. OUTRIDER is open source and includes functions for filtering out genes not expressed in a data set, for identifying outlier samples with too many aberrantly expressed genes, and for the P-value-based detection of aberrant gene expression, with false discovery rate adjustment. Overall, OUTRIDER provides a computationally fast and scalable end-to-end solution for identifying aberrantly expressed genes, suitable for use by rare disease diagnostic platforms.
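The statistical step in this abstract, scoring a read count against a negative binomial expectation and then adjusting for multiple testing, can be sketched in plain Python. This is a simplified illustration and not the OUTRIDER implementation: the expected mean `mu` and dispersion `theta` are assumed to be given (the autoencoder fit is omitted), the function names are hypothetical, and Benjamini-Hochberg is used as one common false discovery rate procedure because the abstract does not say which adjustment is applied.

```python
import math


def nb_log_pmf(k, mu, theta):
    """Log probability of count k under a negative binomial with mean mu and size theta."""
    return (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
            + theta * math.log(theta / (theta + mu))
            + k * math.log(mu / (theta + mu)))


def nb_two_sided_pvalue(k, mu, theta):
    """Two-sided P-value: twice the smaller tail, capped at 1."""
    lower = sum(math.exp(nb_log_pmf(i, mu, theta)) for i in range(k + 1))  # P(X <= k)
    # P(X >= k); computed by subtraction, so far-tail values lose precision.
    upper = max(0.0, 1.0 - lower + math.exp(nb_log_pmf(k, mu, theta)))
    return min(1.0, 2.0 * min(lower, upper))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative counts for one gene across four samples (made-up numbers).
counts = [48, 52, 400, 50]
pvals = [nb_two_sided_pvalue(k, mu=50.0, theta=10.0) for k in counts]
padj = benjamini_hochberg(pvals)
outliers = [i for i, q in enumerate(padj) if q < 0.05]
```

Only the third sample, whose count lies far above the expected mean, falls below the 0.05 threshold after adjustment.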


Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome

Nucleic Acids Research ◽

10.1093/nar/gkt1300 ◽

2013 ◽

Vol 42 (5) ◽

pp. 2820-2832 ◽

Cited By ~ 14

Author(s):

Nicolas Philippe ◽

Elias Bou Samra ◽

Anthony Boureux ◽

Alban Mancheron ◽

Florence Rufflé ◽

...

Keyword(s):

Human Genome ◽

RNA Sequencing ◽

Dynamic Range ◽

Tiling Array ◽

Expression Data ◽

RNA Seq ◽

Sequencing Data ◽

Data Set ◽

Protein Coding ◽

Protein Coding Genes

Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.


Statistical inference of differential RNA-editing sites from RNA-sequencing data by hierarchical modeling

Bioinformatics ◽

10.1093/bioinformatics/btaa066 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2796-2804 ◽

Cited By ~ 3

Author(s):

Stephen S Tran ◽

Qing Zhou ◽

Xinshu Xiao

Keyword(s):

RNA Sequencing ◽

RNA Editing ◽

Type I Error ◽

Hierarchical Modeling ◽

False Positive Rate ◽

Brain Regions ◽

Type I ◽

RNA Seq ◽

Sequencing Data ◽

Technical Features

Motivation: RNA-sequencing (RNA-seq) enables global identification of RNA-editing sites in biological systems and disease. A salient step in many studies is to identify editing sites that statistically associate with treatment (e.g. case versus control) or covary with biological factors, such as age. However, RNA-seq has technical features that incumbent tests (e.g. t-test and linear regression) do not consider, which can lead to false positives and false negatives. Results: In this study, we demonstrate the limitations of currently used tests and introduce the method, RNA-editing tests (REDITs), a suite of tests that employ beta-binomial models to identify differential RNA editing. The tests in REDITs have higher sensitivity than other tests, while also maintaining the type I error (false positive) rate at the nominal level. Applied to the GTEx dataset, we unveil RNA-editing changes associated with age and gender, and differential recoding profiles between brain regions. Availability and implementation: REDITs are implemented as functions in R and freely available for download at https://github.com/gxiaolab/REDITs. The repository also provides a code example for leveraging parallelization using multiple cores.


A Two-Stage Poisson Model for Testing RNA-Seq Data

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1627 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 39

Author(s):

Paul L. Auer ◽

Rebecca W Doerge

Keyword(s):

RNA Sequencing ◽

Statistical Approach ◽

Poisson Model ◽

Real Data ◽

RNA Seq ◽

Sequencing Data ◽

Sequencing Technology ◽

Two Stage ◽

Individual Gene ◽

Unique Nature

RNA sequencing technology is providing data of unprecedented throughput, resolution, and accuracy. Although there are many different computational tools for processing these data, there are a limited number of statistical methods for analyzing them, and even fewer that acknowledge the unique nature of individual gene transcription. We introduce a simple and powerful statistical approach, based on a two-stage Poisson model, for modeling RNA sequencing data and testing for biologically important changes in gene expression. The advantages of this approach are demonstrated through simulations and real data applications.


Bioinformatic Dissecting of TP53 Regulation Pathway Underlying Butyrate-induced Histone Modification in Epigenetic Regulation

Genetics & Epigenetics ◽

10.4137/geg.s14176 ◽

2014 ◽

Vol 6 ◽

pp. GEG.S14176 ◽

Cited By ~ 4

Author(s):

Cong-Jun Li ◽

Robert W. Li

Keyword(s):

Data Mining ◽

RNA Sequencing ◽

Sequencing Data ◽

Data Set ◽

Mechanistic Pathway ◽

Genes Encoding ◽

Pathways Analysis ◽

Butyrate Treatment ◽

Downstream Analysis ◽

TP53 Pathway

Butyrate affects cell proliferation, differentiation, and motility. Butyrate inhibits histone deacetylase (HDAC) activities and induces cell-cycle arrest and apoptosis. TP53 is one of the most active upstream regulators discovered by Ingenuity Pathways Analysis (IPA) in our RNA-sequencing data set. The TP53 signaling pathway plays a key role in many cellular processes. In this report, the TP53 pathway and its involvement in cellular functions modified by butyrate treatment were examined by data mining the RNA-sequencing data using IPA (Ingenuity System®). The TP53 mechanistic pathway targets more than 600 genes. Downstream analysis predicted the activation of the TP53 pathway after butyrate treatment. The data mining also revealed that nine transcription factors are downstream regulators in TP53 signaling pathways. The analysis results also indicated that butyrate not only inhibits HDAC activities, but also regulates genes encoding the HDAC enzymes through modification of histones and the epigenomic landscape.


SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data

BioMed Research International ◽

10.1155/2015/780519 ◽

2015 ◽

Vol 2015 ◽

pp. 1-5 ◽

Cited By ~ 2

Author(s):

Yuxiang Tan ◽

Yann Tambouret ◽

Stefano Monti

Keyword(s):

Sample Size ◽

RNA Sequencing ◽

High Throughput Sequencing ◽

Performance Metrics ◽

Simulated Data ◽

Real Data ◽

RNA Seq ◽

Sequencing Data ◽

Detection Algorithms ◽

Fusion Detection

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
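The accuracy measures named in this abstract, precision and recall, follow directly once the true fusions in a simulated dataset are known. The sketch below is a minimal illustration and not part of SimFuse: the function name is hypothetical, and fusions are assumed to be represented as pairs of placeholder gene labels.

```python
def precision_recall(predicted, truth):
    """Precision and recall of predicted fusion calls against a known truth set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Placeholder fusion pairs; with simulated data the truth set is known exactly.
simulated_truth = {("g1", "g2"), ("g3", "g4"), ("g5", "g6"), ("g7", "g8"), ("g9", "g10")}
detector_calls = {("g1", "g2"), ("g3", "g4"), ("g5", "g6"), ("g11", "g12")}
precision, recall = precision_recall(detector_calls, simulated_truth)
```

Here three of the four calls are correct and three of the five simulated fusions are recovered, giving a precision of 0.75 and a recall of 0.6.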


SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Large Scale ◽

Random Projection ◽

RNA Seq ◽

Sequencing Data ◽

Computational Framework ◽

Human Blood Cells ◽

Single Cell RNA Sequencing ◽

Data Volume

Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory required by the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.


Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽

2017 ◽

Cited By ~ 8

Author(s):

Luke Zappia ◽

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Real Data ◽

Cell Types ◽

RNA Seq ◽

Sequencing Data ◽

Sequencing Technologies ◽

Simulation Based ◽

Single Cell RNA Sequencing ◽

Multiple Cell

As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.
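The gamma-Poisson idea mentioned in this abstract can be illustrated with a short Python sketch: draw a mean expression level for each gene from a gamma distribution, then draw each cell's count from a Poisson distribution with that mean. This is not the Splatter package or the full Splat model, which is an R/Bioconductor tool; the function names, parameter values and the single-population setup are assumptions made for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's method (suitable for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n_genes, n_cells, shape, scale, seed=0):
    """Simulate a genes-by-cells count matrix from a gamma-Poisson hierarchy."""
    rng = random.Random(seed)
    # One gamma-distributed mean per gene (expected value shape * scale).
    gene_means = [rng.gammavariate(shape, scale) for _ in range(n_genes)]
    # Poisson counts around each gene's mean, one per cell.
    return [[sample_poisson(mu, rng) for _ in range(n_cells)] for mu in gene_means]


counts = simulate_counts(n_genes=5, n_cells=8, shape=2.0, scale=5.0, seed=1)
```

Fixing the seed makes the simulated matrix reproducible, in keeping with the abstract's emphasis on reproducible simulation.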


SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis

10.1101/021915 ◽

2015 ◽

Author(s):

Benjamin K Johnson ◽

Matthew B Scholz ◽

Tracy K Teal ◽

Robert B Abramovitch

Keyword(s):

Gene Expression ◽

Differential Gene Expression ◽

Quality Analysis ◽

RNA Seq ◽

Sequencing Data ◽

Data Set ◽

Bacterial RNA ◽

Analysis Workflow ◽

Differential Gene ◽

Reference Counting

Summary: SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. The workflow is implemented in Python for file management and sequential execution of each analysis step and is available for Mac OS X, Microsoft Windows, and Linux. To promote the use of SPARTA as a teaching platform, a web-based tutorial is available explaining how RNA-seq data are processed and analyzed by the software. Availability and Implementation: Tutorial and workflow can be found at sparta.readthedocs.org. Teaching materials are located at sparta-teaching.readthedocs.org. Source code can be downloaded at www.github.com/abramovitchMSU/, implemented in Python and supported on Mac OS X, Linux, and MS Windows. Contact: Robert B. Abramovitch. Supplemental Information: Supplementary data are available online.
