Deciphering Novel SARS CoV-2 specific Disease Pathways from RNA Sequencing Data of COVID-19 Infected A549 Cells and Potential Therapeutics using Next Generation Knowledge Discovery Platforms

Abstract Background: The coronavirus (CoV) disease identified in Wuhan, China in 2019 (COVID-19) was chiefly characterized by atypical pneumonia and severe acute respiratory syndrome (SARS) and is caused by SARS CoV-2, which belongs to the family Coronaviridae. COVID-19 symptoms range from a mild cold to more severe illnesses such as SARS, thrombosis, stroke, and organ failure, and in some patients the disease is fatal. Deciphering the underlying disease mechanisms is pivotal for identifying and developing COVID-19-specific drugs that treat the disease effectively and prevent human-to-human transmission, disease complications, and mortality. Methodology: Next-generation RNA sequencing (RNA Seq) data generated on an Illumina Next Seq 500 from SARS CoV-infected and mock-treated A549 cells were obtained from the Gene Expression Omnibus (GEO; GSE147507), and quality control (QC) was evaluated using the CLC Genomics Workbench 20.0 (Qiagen, USA) before the RNA Seq analysis. Differentially expressed genes (DEGs) were imported into BioJupies to decipher COVID-19-induced biological, molecular, and cellular processes and pathways, and to identify small molecules, derived from chemical synthesis or natural sources, that mimic or reverse COVID-19-specific gene signatures. In addition, we used iPathwayGuide (Advaita Bioinformatics, USA) to identify COVID-19-specific pathways; biological, molecular, and cellular processes; and “druggable” candidates for future therapy. Results: Of a total of 9665 DEGs obtained from BioJupies analysis of the RNA Seq reads of the SARS CoV-infected and mock-treated A549 cells, 141 DEGs were identified based on a p-value cutoff of 0.05 and a fold-change cutoff of 1.5. Conclusion: The present study demonstrates a novel approach that uses next-generation knowledge discovery platforms to discover specific drugs for the amelioration of COVID-19-related disease pathologies.
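The thresholding step described in the Results (a p-value cutoff of 0.05 and a fold-change cutoff of 1.5) can be illustrated with a minimal sketch. The abstract does not say whether the p-value is adjusted or whether the fold-change cutoff applies in both directions; the code below assumes a raw p-value and a symmetric cutoff on the log2 scale. The function name `filter_degs` and the toy gene table are hypothetical.

```python
import math

def filter_degs(genes, p_cutoff=0.05, fc_cutoff=1.5):
    """Return gene names passing both cutoffs.

    genes: iterable of (name, log2_fold_change, p_value) tuples.
    Assumption: the fold-change cutoff is symmetric, so a gene passes
    if it is up- or down-regulated by at least `fc_cutoff`-fold.
    """
    log_fc_cutoff = math.log2(fc_cutoff)
    selected = []
    for name, log2_fc, p_value in genes:
        if p_value < p_cutoff and abs(log2_fc) >= log_fc_cutoff:
            selected.append(name)
    return selected

# Hypothetical toy table: (gene, log2 fold change, p-value)
toy = [
    ("GENE_A", 1.20, 0.001),   # passes both cutoffs
    ("GENE_B", 0.30, 0.001),   # fold change too small
    ("GENE_C", -0.90, 0.020),  # down-regulated, passes both cutoffs
    ("GENE_D", 2.00, 0.300),   # p-value too large
]
print(filter_degs(toy))
```

Working on the log2 scale keeps the cutoff symmetric: a 1.5-fold increase and a 1.5-fold decrease are treated the same way.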


OUTRIDER: A statistical method for detecting aberrantly expressed genes in RNA sequencing data

10.1101/322149 ◽

2018 ◽

Cited By ~ 2

Author(s):

Felix Brechtmann ◽

Agnė Matusevičiūtė ◽

Christian Mertes ◽

Vicente A Yépez ◽

Žiga Avsec ◽

...

Keyword(s):

Gene Expression ◽

Rna Sequencing ◽

Negative Binomial ◽

Statistical Significance ◽

P Value ◽

Rna Seq ◽

Sequencing Data ◽

Data Set ◽

Aberrant Gene Expression ◽

Aberrant Gene

Abstract RNA sequencing (RNA-seq) is gaining popularity as a complementary assay to genome sequencing for precisely identifying the molecular causes of rare disorders. A powerful approach is to identify aberrant gene expression levels as potential pathogenic events. However, existing methods for detecting aberrant read counts in RNA-seq data either lack assessments of statistical significance, so that establishing cutoffs is arbitrary, or rely on subjective manual corrections for confounders. Here, we describe OUTRIDER (OUTlier in RNA-seq fInDER), an algorithm developed to address these issues. The algorithm uses an autoencoder to model read count expectations according to the co-variation among genes resulting from technical, environmental, or common genetic variations. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. The model is automatically fitted to achieve the best correction of artificially corrupted data. Precision–recall analyses using simulated outlier read counts demonstrated the importance of combining correction for co-variation and significance-based thresholds. OUTRIDER is open source and includes functions for filtering out genes not expressed in a data set, for identifying outlier samples with too many aberrantly expressed genes, and for the P-value-based detection of aberrant gene expression, with false discovery rate adjustment. Overall, OUTRIDER provides a computationally fast and scalable end-to-end solution for identifying aberrantly expressed genes, suitable for use by rare disease diagnostic platforms.
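The significance step described in this abstract (read counts tested against a negative binomial distribution with a gene-specific dispersion, followed by false discovery rate adjustment) can be sketched as follows. This is not the OUTRIDER implementation: the autoencoder is omitted and the expected count `mu` is simply assumed to be given, the two-sided p-value rule is one common convention, and Benjamini-Hochberg is used only as an example because the abstract does not name the adjustment procedure. All function names are hypothetical.

```python
import math

def nb_log_pmf(k, mu, theta):
    """Log-probability of count k under a negative binomial with mean mu and dispersion (size) theta."""
    return (math.lgamma(k + theta) - math.lgamma(k + 1) - math.lgamma(theta)
            + theta * math.log(theta / (theta + mu))
            + k * math.log(mu / (theta + mu)))

def nb_cdf(k, mu, theta):
    """P(X <= k), summed term by term; adequate for small illustrative counts."""
    if k < 0:
        return 0.0
    return min(1.0, sum(math.exp(nb_log_pmf(i, mu, theta)) for i in range(k + 1)))

def two_sided_p(k, mu, theta):
    """Two-sided p-value: twice the smaller tail probability, capped at 1."""
    lower = nb_cdf(k, mu, theta)            # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, mu, theta)  # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts for one gene across four samples, with an assumed
# expected count of 100 and dispersion 20 for every sample.
counts = [98, 105, 3, 110]
p_values = [two_sided_p(k, mu=100.0, theta=20.0) for k in counts]
for k, p_adj in zip(counts, benjamini_hochberg(p_values)):
    print(k, p_adj < 0.05)
```

In this toy example only the count of 3 deviates strongly from its expectation, so it is the only one flagged after adjustment.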


Genotype-free individual genome reconstruction of Multiparental Population Models by RNA sequencing data

10.1101/2020.10.11.335323 ◽

2020 ◽

Author(s):

Kwangbom Choi ◽

Hao He ◽

Daniel M. Gatti ◽

Vivek M. Philip ◽

Narayanan Raghupathy ◽

...

Keyword(s):

Gene Expression ◽

Rna Sequencing ◽

Gene Expression Regulation ◽

Agricultural Research ◽

Model Systems ◽

Specific Gene ◽

Rna Seq ◽

Sequencing Data ◽

Individual Genome ◽

Genome Reconstruction

Abstract Multi-parent populations (MPPs), genetically segregating model systems derived from two or more inbred founder strains, are widely used in biomedical and agricultural research. Gene expression profiling by direct RNA sequencing (RNA-Seq) is commonly applied to MPPs to investigate gene expression regulation and to identify candidate genes. In genetically diverse populations, including most MPPs, quantification of gene expression is improved when the RNA-Seq reads are aligned to individualized transcriptomes that incorporate known polymorphic loci. However, the process of constructing and analyzing individual genomes can be computationally demanding and error prone. We propose a new approach, genome reconstruction by RNA-Seq (GBRS), that relies on simultaneous alignment of RNA-Seq reads to the founder strain transcriptomes. GBRS can reconstruct the diploid genome of each individual and quantify both total and allele-specific gene expression. We demonstrate that GBRS performs as well as methods that rely on high-density genotyping arrays to reconstruct the founder haplotype mosaic of MPP individuals. Using GBRS in addition to other genotyping methods provides quality control for detecting sample mix-ups and improves power to detect expression quantitative trait loci. GBRS software is freely available at https://github.com/churchill-lab/gbrs.


FC 011 KIDNEYNETWORK: USING KIDNEY DERIVED GENE EXPRESSION DATA TO PREDICT AND PRIORITIZE NOVEL GENES INVOLVED IN KIDNEY DISEASE

Nephrology Dialysis Transplantation ◽

10.1093/ndt/gfab131.001 ◽

2021 ◽

Vol 36 (Supplement_1) ◽

Author(s):

Floranne Boulogne ◽

Laura Claus ◽

Henry Wiersma ◽

Roy Oelen ◽

Floor Schukking ◽

...

Keyword(s):

Gene Expression ◽

Kidney Disease ◽

Candidate Gene ◽

Exome Sequencing ◽

Rna Sequencing ◽

Expression Patterns ◽

Genetic Diagnosis ◽

Specific Gene ◽

Sequencing Data ◽

Exome Sequencing Data

Abstract Background and Aims Genetic testing in patients with suspected hereditary kidney disease does not always reveal the genetic cause for the patient's disorder. Potentially pathogenic variants can reside in genes that are not known to be involved in kidney disease, which makes it difficult to prioritize and interpret the relevance of these variants. As such, there is a clear need for methods that predict the phenotypic consequences of gene expression in a way that is as unbiased as possible. To help identify candidate genes we have developed KidneyNetwork, in which tissue-specific expression is utilized to predict kidney-specific gene functions. Method We combined gene co-expression in 878 publicly available kidney RNA-sequencing samples with the co-expression of a multi-tissue RNA-sequencing dataset of 31,499 samples to build KidneyNetwork. The expression patterns were used to predict which genes have a kidney-related function, and which (disease) phenotypes might be caused when these genes are mutated. By integrating the information from the HPO database, in which known phenotypic consequences of disease genes are annotated, with the gene co-expression network we obtained prediction scores for each gene per HPO term. As proof of principle, we applied KidneyNetwork to prioritize variants in exome-sequencing data from 13 kidney disease patients without a genetic diagnosis. Results We assessed the prediction performance of KidneyNetwork by comparing it to GeneNetwork, a multi-tissue co-expression network we previously developed. In KidneyNetwork, we observe a significantly improved prediction accuracy of kidney-related HPO-terms, as well as an increase in the total number of significantly predicted kidney-related HPO-terms (figure 1). To examine its clinical utility, we applied KidneyNetwork to 13 patients with a suspected hereditary kidney disease without a genetic diagnosis. Based on the HPO terms “Renal cyst” and “Hepatic cysts”, combined with a list of potentially damaging variants in one of the undiagnosed patients with mild ADPKD/PCLD, we identified ALG6 as a new candidate gene. ALG6 bears a high resemblance to other genes implicated in this phenotype in recent years. Through the 100,000 Genomes Project and collaborators we identified three additional patients with kidney and/or liver cysts carrying a suspected deleterious variant in ALG6. Conclusion We present KidneyNetwork, a kidney specific co-expression network that accurately predicts what genes have kidney-specific functions and may result in kidney disease. Gene-phenotype associations of genes unknown for kidney-related phenotypes can be predicted by KidneyNetwork. We show the added value of KidneyNetwork by applying it to exome sequencing data of kidney disease patients without a molecular diagnosis and consequently we propose ALG6 as a promising candidate gene. KidneyNetwork can be applied to clinically unsolved kidney disease cases, but it can also be used by researchers to gain insight into individual genes to better understand kidney physiology and pathophysiology. Acknowledgments This research was made possible through access to the data and findings generated by the 100,000 Genomes Project; http://www.genomicsengland.co.uk.


Importance of experimental information (metadata) for archived sequence data: case of specific gene bias due to lag time between sample harvest and RNA protection in RNA sequencing

PeerJ ◽

10.7717/peerj.11875 ◽

2021 ◽

Vol 9 ◽

pp. e11875

Author(s):

Tomoko Matsuda

Keyword(s):

Gene Expression ◽

Rna Sequencing ◽

Time Course ◽

Sequence Data ◽

Specific Gene ◽

Time Interval ◽

Short Time Interval ◽

Rna Seq ◽

Lysis Buffer ◽

Rna Protection

Large volumes of high-throughput sequencing data have been submitted to the Sequence Read Archive (SRA). The lack of experimental metadata associated with the data makes reuse and understanding data quality very difficult. In the case of RNA sequencing (RNA-Seq), which reveals the presence and quantity of RNA in a biological sample at any moment, it is necessary to consider that gene expression responds over a short time interval (several seconds to a few minutes) in many organisms. Therefore, to isolate RNA that accurately reflects the transcriptome at the point of harvest, raw biological samples should be processed by freezing in liquid nitrogen, immersing in RNA stabilization reagent or lysing and homogenizing in RNA lysis buffer containing guanidine thiocyanate as soon as possible. As the number of samples handled simultaneously increases, the time until the RNA is protected can increase. Here, to evaluate the effect of different lag times in RNA protection on RNA-Seq data, we harvested CHO-S cells after 3, 5, 6, and 7 days of cultivation, added RNA lysis buffer in a time course of 15, 30, 45, and 60 min after harvest, and conducted RNA-Seq. These RNA samples showed high RNA integrity number (RIN) values indicating non-degraded RNA, and sequence data from libraries prepared with these RNA samples was of high quality according to FastQC. We observed that, at the same cultivation day, global trends of gene expression were similar across the time course of addition of RNA lysis buffer; however, the expression of some genes was significantly different between the time-course samples of the same cultivation day; most of these differentially expressed genes were related to apoptosis. We conclude that the time lag between sample harvest and RNA protection influences gene expression of specific genes. It is, therefore, necessary to know not only RIN values of RNA and the quality of the sequence data but also how the experiment was performed when acquiring RNA-Seq data from the database.


rCASC: reproducible classification analysis of single-cell sequencing data

GigaScience ◽

10.1093/gigascience/giz105 ◽

2019 ◽

Vol 8 (9) ◽

Cited By ~ 5

Author(s):

Luca Alessandrì ◽

Francesca Cordero ◽

Marco Beccuti ◽

Maddalena Arigoni ◽

Martina Olivero ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cellular Heterogeneity ◽

Cell Subpopulation ◽

Integrated Analysis ◽

Specific Gene ◽

Sequencing Data ◽

Single Cell Sequencing ◽

Cell Stability ◽

Reproducible Analysis

Abstract Background Single-cell RNA sequencing is essential for investigating cellular heterogeneity and highlighting cell subpopulation-specific signatures. Single-cell sequencing applications have spread from conventional RNA sequencing to epigenomics, e.g., ATAC-seq. Many related algorithms and tools have been developed, but few computational workflows provide analysis flexibility while also achieving functional (i.e., information about the data and the tools used are saved as metadata) and computational reproducibility (i.e., a real image of the computational environment used to generate the data is stored) through a user-friendly environment. Findings rCASC is a modular workflow providing an integrated analysis environment (from count generation to cell subpopulation identification) exploiting Docker containerization to achieve both functional and computational reproducibility in data analysis. Hence, rCASC provides preprocessing tools to remove low-quality cells and/or specific bias, e.g., cell cycle. Subpopulation discovery can instead be achieved using different clustering techniques based on different distance metrics. Cluster quality is then estimated through the new metric "cell stability score" (CSS), which describes the stability of a cell in a cluster as a consequence of a perturbation induced by removing a random set of cells from the cell population. CSS provides better cluster robustness information than the silhouette metric. Moreover, rCASC's tools can identify cluster-specific gene signatures. Conclusions rCASC is a modular workflow with new features that could help researchers define cell subpopulations and detect subpopulation-specific markers. It uses Docker for ease of installation and to achieve a computation-reproducible analysis. A Java GUI is provided to welcome users without computational skills in R.
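The idea behind the cell stability score, as described in this abstract (how stable a cell's cluster membership is when a random set of cells is removed), can be illustrated with a toy sketch. This is not rCASC's implementation or its exact CSS definition: the one-dimensional gap-based clustering, the Jaccard comparison of cluster mates, the 0.8 threshold, and all function names are assumptions made purely for illustration.

```python
import random

def cluster_by_gap(cells, gap=1.0):
    """Toy clustering: sort cells by value and start a new cluster at every gap larger than `gap`."""
    labels = {}
    label = 0
    previous = None
    for cell_id, value in sorted(cells.items(), key=lambda kv: kv[1]):
        if previous is not None and value - previous > gap:
            label += 1
        labels[cell_id] = label
        previous = value
    return labels

def stability_scores(cells, n_rounds=50, drop_fraction=0.1, threshold=0.8, seed=0):
    """Fraction of perturbation rounds in which each cell keeps (mostly) the same cluster mates."""
    rng = random.Random(seed)
    reference = cluster_by_gap(cells)
    ids = list(cells)
    n_drop = max(1, int(len(ids) * drop_fraction))
    kept_rounds = {c: 0 for c in ids}
    stable_rounds = {c: 0 for c in ids}
    for _ in range(n_rounds):
        # Perturb the data set by removing a random set of cells, then re-cluster.
        dropped = set(rng.sample(ids, n_drop))
        subset = {c: v for c, v in cells.items() if c not in dropped}
        perturbed = cluster_by_gap(subset)
        for c in subset:
            ref_mates = {o for o in subset if reference[o] == reference[c]}
            new_mates = {o for o in subset if perturbed[o] == perturbed[c]}
            jaccard = len(ref_mates & new_mates) / len(ref_mates | new_mates)
            kept_rounds[c] += 1
            if jaccard >= threshold:
                stable_rounds[c] += 1
    return {c: (stable_rounds[c] / kept_rounds[c] if kept_rounds[c] else 0.0) for c in ids}

# Hypothetical data: two well-separated groups of five cells each.
toy_cells = {f"cell{i}": 0.1 * i for i in range(5)}
toy_cells.update({f"cell{i + 5}": 10.0 + 0.1 * i for i in range(5)})
print(stability_scores(toy_cells))
```

With well-separated groups, removing a few cells never changes who clusters with whom, so every cell scores 1.0; cells sitting between clusters would score lower.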


A Two-Stage Poisson Model for Testing RNA-Seq Data

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1627 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 39

Author(s):

Paul L. Auer ◽

Rebecca W Doerge

Keyword(s):

Rna Sequencing ◽

Statistical Approach ◽

Poisson Model ◽

Real Data ◽

Rna Seq ◽

Sequencing Data ◽

Sequencing Technology ◽

Two Stage ◽

Individual Gene ◽

Unique Nature

RNA sequencing technology is providing data of unprecedented throughput, resolution, and accuracy. Although there are many different computational tools for processing these data, there are a limited number of statistical methods for analyzing them, and even fewer that acknowledge the unique nature of individual gene transcription. We introduce a simple and powerful statistical approach, based on a two-stage Poisson model, for modeling RNA sequencing data and testing for biologically important changes in gene expression. The advantages of this approach are demonstrated through simulations and real data applications.


SimFuse: A Novel Fusion Simulator for RNA Sequencing (RNA-Seq) Data

BioMed Research International ◽

10.1155/2015/780519 ◽

2015 ◽

Vol 2015 ◽

pp. 1-5 ◽

Cited By ~ 2

Author(s):

Yuxiang Tan ◽

Yann Tambouret ◽

Stefano Monti

Keyword(s):

Sample Size ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Performance Metrics ◽

Simulated Data ◽

Real Data ◽

Rna Seq ◽

Sequencing Data ◽

Detection Algorithms ◽

Fusion Detection

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
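The accuracy measures named in this abstract, precision and recall, follow directly once the simulated fusions are known. The sketch below shows how they might be computed by comparing a detector's calls with a simulated truth set; it does not use SimFuse itself, and the function name and the fusion labels are hypothetical.

```python
def precision_recall(called, truth):
    """Precision and recall of called fusions against a known (simulated) truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical fusion labels: four simulated fusions, three calls, two of them correct.
simulated = {"GENE1--GENE2", "GENE3--GENE4", "GENE5--GENE6", "GENE7--GENE8"}
detected = {"GENE1--GENE2", "GENE3--GENE4", "GENE9--GENE10"}
precision, recall = precision_recall(detected, simulated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Repeating this calculation on datasets with different numbers of supporting reads would give the kind of supporting-read-specific performance profile the abstract describes.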


Methods for analyzing next-generation sequencing data III. From setting a Linux environment to manipulating Lactobacillus RNA-seq data

Japanese Journal of Lactic Acid Bacteria ◽

10.4109/jslab.26.32 ◽

2015 ◽

Vol 26 (1) ◽

pp. 32-41

Author(s):

Jianqiang Sun ◽

Aya Miura ◽

Kentaro Shimizu ◽

Koji Kadota

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Rna Seq ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing


SSCC: a novel computational framework for rapid and accurate clustering large single cell RNA-seq data

10.1101/344242 ◽

2018 ◽

Cited By ~ 2

Author(s):

Xianwen Ren ◽

Liangtao Zheng ◽

Zemin Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Large Scale ◽

Random Projection ◽

Rna Seq ◽

Sequencing Data ◽

Computational Framework ◽

Human Blood Cells ◽

Single Cell Rna Sequencing ◽

Data Volume

ABSTRACT Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose a new clustering framework based on random projection and feature construction for large-scale single-cell RNA sequencing data, which greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while using only 66% of the memory required by the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold, depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
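Random projection, one of the two ingredients named in this abstract, reduces the number of features per cell by multiplying the expression matrix with a random matrix. The sketch below shows a generic Gaussian random projection in plain Python; it is an illustration of the general technique under assumed parameters, not the sscClust code, and the function name and toy matrix are hypothetical.

```python
import math
import random

def random_projection(rows, out_dim, seed=0):
    """Project each row (one cell's expression vector) onto `out_dim` random Gaussian directions."""
    rng = random.Random(seed)
    in_dim = len(rows[0])
    # Scaling by 1/sqrt(out_dim) keeps squared distances unchanged in expectation.
    scale = 1.0 / math.sqrt(out_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)]
              for _ in range(in_dim)]
    return [[sum(row[i] * matrix[i][j] for i in range(in_dim))
             for j in range(out_dim)]
            for row in rows]

# Hypothetical expression matrix: 3 cells x 6 genes, reduced to 2 dimensions.
expression = [
    [5.0, 0.0, 1.0, 0.0, 2.0, 0.0],
    [5.0, 0.0, 1.0, 0.0, 2.0, 0.0],
    [0.0, 4.0, 0.0, 3.0, 0.0, 1.0],
]
projected = random_projection(expression, out_dim=2)
print(projected)
```

Because the projection is linear, cells with identical profiles stay identical after projection, and a clustering algorithm can then run on the smaller matrix.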


P027 Epithelial cells of patients with ulcerative colitis do not show an increased sensitivity after microbiota stimulation compared to non-IBD controls

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjab076.156 ◽

2021 ◽

Vol 15 (Supplement_1) ◽

pp. S142-S143

Author(s):

K Arnauts ◽

C Lapierre ◽

B Verstockt ◽

S Verstockt ◽

P Sudhakar ◽

...

Keyword(s):

Epithelial Cell ◽

Epithelial Cells ◽

Cell Count ◽

Rna Sequencing ◽

Principal Component ◽

Cell Permeability ◽

Patient Specific ◽

P Value ◽

Sequencing Data ◽

Different Response

Abstract Background Alterations in the intestinal microbiota play a pivotal role in the pathogenesis of Inflammatory Bowel Diseases (IBD). Although there is a lot of interest in restoring dysbiosis, the effects of microbial alterations are not fully understood. In addition, it is known that epithelial cells from IBD patients maintain intrinsic defects [1]. For that reason, our aim was to unravel if epithelial cells of UC patients are more sensitive towards microbiota stimulation, compared to non-IBD controls. Methods Intestinal organoids of UC patients (n=8) and non-IBD controls (n=8) were grown as monolayers on Transwell inserts. Upon confluency (evaluated by transepithelial electrical resistance (TEER)), monolayers were stimulated for 24 hours with TNF-α (100 ng/ml), IL-1β (20 ng/ml) and Flagellin (1 µg/ml) to mimic inflammation. Fresh fecal samples of a selected donor (n=1, high microbial cell count and presence of selected phyla [2]) and UC patients (n=3, endoscopic sub-mayo ≥2) were filtered and stored in 0.9% NaCl. Monolayers were stimulated for 6 hours with 3 × 10⁸ microbial cells (cell count by Flow Cytometry). RNA sequencing was performed by Truseq for Illumina. Differentially expressed genes (DEG) were studied by DESeq2 (FDR <0.05). Results Although TEER measurements indicated a higher epithelial cell permeability upon UC microbiota stimulation in UC patients compared to non-IBD controls (p=0.038; Mann-Whitney; Figure 1), we could not confirm this distinct response based on RNA sequencing data at principal component analysis (PCA). Several epithelial barrier genes were significantly upregulated between UC and non-IBD epithelium at nominal p-value, while only CLDN1 and 18 were significant for FDR <0.05 (Figure 2). Clustering on PCA was driven by microbial treatment and not by epithelial origin (Figure 3). Inflamed monolayers of UC patients showed different baseline characteristics (129 DEG; e.g. HLA-G, MUC2, CLDN1, IL23A, PARP8; Figure 4A), but did not propagate in a different response upon microbiota exposure compared to non-IBD controls. Treatment with microbiota of UC patients (23 DEG; e.g. PARP9, TGFBI, ANXA13) or the selected donor (58 DEG; e.g. CCL5, CLDN18, TGFBI) only induced minor differences between epithelial cell types (Figure 4B). Conclusion We observed no different response in epithelial cells of UC patients towards microbiota stimulation compared to non-IBD epithelial cells on transcriptomic level. Further validation on barrier integrity is needed. We observed no indications that microbial treatment would be less beneficial to UC patients, based on the epithelial cell response. Addition of (patient specific) immune cells will contribute to unraveling host-microbiota interactions in IBD patients.
