Calculating Sample Size Estimates for RNA Sequencing Data

The performance evaluation of fusion detection algorithms from high-throughput sequencing data crucially relies on the availability of data with known positive and negative cases of gene rearrangements. The use of simulated data circumvents some shortcomings of real data by generation of an unlimited number of true and false positive events, and the consequent robust estimation of accuracy measures, such as precision and recall. Although a few simulated fusion datasets from RNA Sequencing (RNA-Seq) are available, they are of limited sample size. This makes it difficult to systematically evaluate the performance of RNA-Seq based fusion-detection algorithms. Here, we present SimFuse to address this problem. SimFuse utilizes real sequencing data as the fusions’ background to closely approximate the distribution of reads from a real sequencing library and uses a reference genome as the template from which to simulate fusions’ supporting reads. To assess the supporting read-specific performance, SimFuse generates multiple datasets with various numbers of fusion supporting reads. Compared to an extant simulated dataset, SimFuse gives users control over the supporting read features and the sample size of the simulated library, based on which the performance metrics needed for the validation and comparison of alternative fusion-detection algorithms can be rigorously estimated.
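The accuracy measures mentioned above can be illustrated with a minimal sketch. It assumes each fusion is represented as a pair of gene names and that a call counts as correct only if it exactly matches a simulated fusion. The gene labels and the `precision_recall` helper are hypothetical and are not part of SimFuse.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for predicted calls against a known truth set."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    # Guard against empty sets so the ratios stay defined.
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical simulated fusions (the known positives).
simulated = {
    ("GENE_A", "GENE_B"),
    ("GENE_C", "GENE_D"),
    ("GENE_E", "GENE_F"),
    ("GENE_G", "GENE_H"),
}
# Hypothetical calls from a fusion-detection tool.
called = {
    ("GENE_A", "GENE_B"),
    ("GENE_C", "GENE_D"),
    ("GENE_X", "GENE_Y"),
}

precision, recall = precision_recall(called, simulated)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Repeating this calculation on datasets with different numbers of supporting reads would give the kind of supporting-read-specific comparison the abstract describes.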

Assessing Study Reproducibility through M2RI: A Novel Approach for Large-scale High-throughput Association Studies

10.1101/2020.08.18.253740 ◽

2020 ◽

Author(s):

Zeyu Jiao ◽

Yinglei Lai ◽

Jujiao Kang ◽

Weikang Gong ◽

Liang Ma ◽

...

Keyword(s):

Sample Size ◽

Rna Sequencing ◽

High Throughput ◽

Large Scale ◽

Association Studies ◽

Structural Mri ◽

Data Sets ◽

Sequencing Data ◽

Novel Approach ◽

Magnetic Resonance Imaging Mri

Abstract High-throughput technologies, such as magnetic resonance imaging (MRI) and DNA/RNA sequencing (DNA-seq/RNA-seq), have been increasingly used in large-scale association studies. With these technologies, important biomedical research findings have been generated. The reproducibility of these findings, especially from structural MRI (sMRI) and functional MRI (fMRI) association studies, has recently been questioned. There is an urgent demand for a reliable overall reproducibility assessment for large-scale high-throughput association studies. It is also desirable to understand the relationship between study reproducibility and sample size in an experimental design. In this study, we developed a novel approach, the mixture model reproducibility index (M2RI), for assessing the study reproducibility of large-scale association studies. With M2RI, we performed study reproducibility analysis for several recent large sMRI/fMRI data sets. The advantages of our approach and the sample size requirements for different phenotypes were clearly demonstrated, especially when compared to the Dice coefficient (DC). We applied M2RI to compare two MRI or RNA sequencing data sets. The reproducibility assessment results were consistent with our expectations. In summary, M2RI is a novel and useful approach for assessing study reproducibility, calculating sample sizes and evaluating the similarity between two closely related studies.
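For context on the baseline named in this abstract, here is a minimal sketch of the Dice coefficient, DC = 2|A ∩ B| / (|A| + |B|), applied to the sets of features declared significant in two studies. The feature names are hypothetical, and returning 1.0 when both sets are empty is an assumed convention. This illustrates only the DC; it is not an implementation of M2RI.

```python
def dice_coefficient(a, b):
    """Dice coefficient between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical significant features reported by two studies.
study_1 = {"feature_1", "feature_2", "feature_3", "feature_4"}
study_2 = {"feature_3", "feature_4", "feature_5"}

dc = dice_coefficient(study_1, study_2)
print(f"Dice coefficient: {dc:.3f}")
```

Because the DC depends on which features pass a significance threshold, its value can shift with sample size and with the threshold chosen.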

HEDGEHOG/GLI Modulates the PRR11-SKA2 Bidirectional Transcription Unit in Lung Squamous Cell Carcinomas

Genes ◽

10.3390/genes12010120 ◽

2021 ◽

Vol 12 (1) ◽

pp. 120

Author(s):

Yiyun Sun ◽

Dandan Xu ◽

Chundong Zhang ◽

Yitao Wang ◽

Lian Zhang ◽

...

Keyword(s):

Rna Sequencing ◽

Squamous Cell ◽

Gene Pair ◽

Gene List ◽

Ectopic Expression ◽

Lung Squamous Cell Carcinoma ◽

Gene Set Enrichment Analysis ◽

The Cancer Genome Atlas ◽

Sequencing Data ◽

Gene Set

We previously demonstrated that proline-rich protein 11 (PRR11) and spindle and kinetochore associated 2 (SKA2) constitute a head-to-head gene pair driven by a prototypical bidirectional promoter. This gene pair synergistically promotes the development of non-small cell lung cancer. However, the signaling pathways leading to the ectopic expression of this gene pair remain obscure. In the present study, we first analyzed lung squamous cell carcinoma (LSCC) RNA sequencing data from The Cancer Genome Atlas (TCGA) database using correlation analysis of gene expression and gene set enrichment analysis (GSEA), which revealed that the PRR11-SKA2 correlated gene list closely resembled the Hedgehog (Hh) pathway activation-related gene set. Subsequently, the GLI1/2 inhibitor GANT-61 or GLI1/2-siRNA inhibited the Hh pathway in LSCC cells, concomitantly decreasing the expression levels of PRR11 and SKA2. Furthermore, the mRNA expression profile of LSCC cells treated with GANT-61 was determined using RNA sequencing, which identified 397 differentially expressed genes (203 upregulated and 194 downregulated). Among them, one gene set, including BIRC5, NCAPG, CCNB2, and BUB1, was involved in cell division and interacted with both PRR11 and SKA2. These genes were verified as downregulated via RT-PCR, and their high expression correlated significantly with shorter overall survival of LSCC patients. Taken together, our results indicate that GLI1/2 mediates the expression of the PRR11-SKA2-centric gene set, which serves as an unfavorable prognostic indicator for LSCC patients and points to potential new combinatorial diagnostic and therapeutic strategies in LSCC.

RNA Sequencing Data from Human Intracranial Aneurysm Tissue Reveals a Complex Inflammatory Environment Associated with Rupture

Molecular Diagnosis & Therapy ◽

10.1007/s40291-021-00552-4 ◽

2021 ◽

Author(s):

Vincent M. Tutino ◽

Haley R. Zebraski ◽

Hamidreza Rajabzadeh-Oghaz ◽

Lee Chaves ◽

Adam A. Dmytriw ◽

...

Keyword(s):

Intracranial Aneurysm ◽

Rna Sequencing ◽

Sequencing Data

Mixed Distribution Models Based on Single-Cell RNA Sequencing Data

Interdisciplinary Sciences Computational Life Sciences ◽

10.1007/s12539-021-00427-6 ◽

2021 ◽

Author(s):

Min Wu ◽

Junhua Xu ◽

Tao Ding ◽

Jie Gao

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Sequencing Data ◽

Distribution Models ◽

Mixed Distribution ◽

Single Cell Rna Sequencing

In-depth transcriptomic analysis of human retina reveals molecular mechanisms underlying diabetic retinopathy

Scientific Reports ◽

10.1038/s41598-021-88698-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Kolja Becker ◽

Holger Klein ◽

Eric Simon ◽

Coralie Viollet ◽

Christian Haslinger ◽

...

Keyword(s):

Diabetic Retinopathy ◽

Rna Sequencing ◽

Molecular Mechanisms ◽

Vision Loss ◽

Ganglion Cells ◽

Expression Profiles ◽

Cell Types ◽

Sequencing Data ◽

Disease Stages ◽

Post Transcriptional Regulation

Abstract Diabetic Retinopathy (DR) is among the major global causes of vision loss. With the rise in diabetes prevalence, an increase in DR incidence is expected. Current understanding of both the molecular etiology and the pathways involved in the initiation and progression of DR is limited. Via RNA-Sequencing, we analyzed mRNA and miRNA expression profiles of 80 human post-mortem retinal samples from 43 patients diagnosed with various stages of DR. We found differentially expressed transcripts to be predominantly associated with late-stage DR and pathways such as hippo and gap junction signaling. A multivariate regression model identified transcripts with progressive changes throughout disease stages, which in turn displayed significant overlap with sphingolipid and cGMP–PKG signaling. Combined analysis of miRNA and mRNA expression further uncovered disease-relevant miRNA/mRNA associations as potential mechanisms of post-transcriptional regulation. Finally, integrating human retinal single cell RNA-Sequencing data revealed a continuous loss of retinal ganglion cells, and Müller cell mediated changes in histidine and β-alanine signaling. While previously considered primarily a vascular disease, attention in DR has shifted to additional mechanisms and cell types. Our findings offer an unprecedented and unbiased insight into molecular pathways and cell-specific changes in the development of DR, and provide potential avenues for future therapeutic intervention.

COVID-19 Severity Potentially Modulated by Cardiovascular-Disease-Associated Immune Dysregulation

Viruses ◽

10.3390/v13061018 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1018

Author(s):

Abby C. Lee ◽

Grant Castaneda ◽

Wei Tse Li ◽

Chengyu Chen ◽

Neil Shende ◽

...

Keyword(s):

Rna Sequencing ◽

Mononuclear Cells ◽

Human Peripheral Blood ◽

Expression Profiles ◽

Immune Dysregulation ◽

Left Ventricular ◽

Recurrent Event ◽

Healthy Controls ◽

Sequencing Data ◽

Peripheral Blood Mononuclear

Patients with underlying cardiovascular conditions are particularly vulnerable to severe COVID-19. In this project, we aimed to characterize similarities in dysregulated immune pathways between COVID-19 patients and patients with cardiomyopathy, venous thromboembolism (VTE), or coronary artery disease (CAD). We hypothesized that these similarly dysregulated pathways may be critical to how cardiovascular diseases (CVDs) exacerbate COVID-19. To evaluate immune dysregulation in different diseases, we used four separate datasets: RNA-sequencing data from human left ventricular cardiac muscle samples of patients with dilated or ischemic cardiomyopathy and healthy controls; RNA-sequencing data of whole blood samples from patients with single or recurrent event VTE and healthy controls; RNA-sequencing data of human peripheral blood mononuclear cells (PBMCs) from patients with and without obstructive CAD; and RNA-sequencing data of platelets from COVID-19 subjects and healthy controls. We found similar immune dysregulation profiles between patients with CVDs and COVID-19 patients. Interestingly, cardiomyopathy patients display the immune landscape most similar to that of COVID-19 patients. Additionally, COVID-19 patients experience greater upregulation of cytokine- and inflammasome-related genes than patients with CVDs. In all, the cytokine- and inflammasome-related gene expression profiles of patients with CVDs overlap significantly with those of COVID-19 patients, possibly explaining their greater vulnerability to severe COVID-19.

IMMU-27. SINGLE CELL RNA-SEQUENCING IDENTIFIES NOVEL BONE MARROW DERIVED MYELOID CELLS IN GLIOBLASTOMA ASSOCIATED WITH TUMOR AGGRESSION

Neuro-Oncology ◽

10.1093/neuonc/noaa215.457 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii110-ii110

Author(s):

Christina Jackson ◽

Christopher Cherry ◽

Sadhana Bom ◽

Hao Zhang ◽

John Choi ◽

...

Keyword(s):

Bone Marrow ◽

Single Cell ◽

Tumor Cells ◽

Rna Sequencing ◽

Metabolic Pathways ◽

Myeloid Cells ◽

Tumor Grade ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Two Populations

Abstract BACKGROUND Glioma associated myeloid cells (GAMs) can be induced to adopt an immunosuppressive phenotype that can lead to inhibition of anti-tumor responses in glioblastoma (GBM). Understanding the composition and phenotypes of GAMs is essential to modulating the myeloid compartment as a therapeutic adjunct to improve anti-tumor immune response. METHODS We performed single-cell RNA-sequencing (sc-RNAseq) of 435,400 myeloid and tumor cells to identify transcriptomic and phenotypic differences in GAMs across glioma grades. We further correlated the heterogeneity of the GAM landscape with tumor cell transcriptomics to investigate interactions between GAMs and tumor cells. RESULTS sc-RNAseq revealed a diverse landscape of myeloid-lineage cells in gliomas, with an increasing preponderance of bone marrow derived myeloid cells (BMDMs) with increasing tumor grade. We identified two populations of BMDMs unique to GBMs: Mac-1 and Mac-2. Mac-1 demonstrates upregulation of an immature myeloid gene signature and altered metabolic pathways. Mac-2 is characterized by expression of the scavenger receptor MARCO. Pseudotime and RNA velocity analysis revealed the ability of Mac-1 to transition and differentiate to Mac-2 and other GAM subtypes. We further found that the presence of these two populations of BMDMs is associated with the presence of tumor cells with stem cell and mesenchymal features. Bulk RNA-sequencing data demonstrate that gene signatures of these populations are associated with worse survival in GBM. CONCLUSION We used sc-RNAseq to identify a novel population of immature BMDMs that is associated with higher glioma grades. This population exhibited altered metabolic pathways and stem-like potential to differentiate into other GAM populations, including GAMs with upregulation of immunosuppressive pathways. Our results elucidate unique interactions between BMDMs and GBM tumor cells that potentially drive GBM progression and the more aggressive mesenchymal subtype. Our discovery of these novel BMDMs has implications for new therapeutic targets to improve the efficacy of immune-based therapies in GBM.

Differential expression of TgMIC1 in isolates of Chinese 1 Toxoplasma with different virulence

Parasites & Vectors ◽

10.1186/s13071-021-04752-z ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Yang Wang ◽

Chengjian Han ◽

Rongsheng Zhou ◽

Jinjin Zhu ◽

Famin Zhang ◽

...

Keyword(s):

Differential Expression ◽

Rna Sequencing ◽

Polyclonal Antibody ◽

Protein Level ◽

Polyclonal Antibodies ◽

Mrna Levels ◽

Quantitative Real Time Pcr ◽

Sequencing Data ◽

Predominant Genotype ◽

High Virulence

Abstract Background The predominant genotype of Toxoplasma in China is the Chinese 1 (ToxoDB#9) lineage. TgCtwh3 and TgCtwh6 are two representative strains of Chinese 1, exhibiting high and low virulence to mice, respectively. Little is known regarding the virulence mechanism of this non-classical genotype. Our previous RNA sequencing data revealed differential mRNA levels of TgMIC1 in TgCtwh3 and TgCtwh6. We aimed to further confirm the differential expression of TgMIC1 and its significance in this atypical genotype. Methods Quantitative real-time PCR was used to verify the RNA sequencing data; then, polyclonal antibodies against TgMIC1 were prepared and identified. Moreover, the invasion and proliferation of the parasite in HFF cells were observed with and without TgMIC1 polyclonal antibody treatment. Results The data showed that the protein level of TgMIC1 was significantly higher in the high-virulence strain TgCtwh3 than in the low-virulence strain TgCtwh6 and that the invasion and proliferation of TgCtwh3 were inhibited by TgMIC1 polyclonal antibody. Conclusion Differential expression of TgMIC1 in TgCtwh3 and TgCtwh6 may explain, at least partly, the virulence mechanism of this atypical genotype.

Software Benchmark—Classification Tree Algorithms for Cell Atlases Annotation Using Single-Cell RNA-Sequencing Data

Microbiology Research ◽

10.3390/microbiolres12020022 ◽

2021 ◽

Vol 12 (2) ◽

pp. 317-334

Author(s):

Omar Alaqeeli ◽

Li Xing ◽

Xuekui Zhang

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Classification Tree ◽

Area Under The Curve ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing ◽

Tree Algorithms ◽

R Packages

The classification tree is a widely used machine learning method. It has multiple implementations as R packages: rpart, ctree, evtree, tree, and C5.0. The details of these implementations are not the same, and hence their performance differs from one application to another. We are interested in their performance in the classification of cells using single-cell RNA-sequencing data. In this paper, we conducted a benchmark study using 22 single-cell RNA-sequencing data sets. Using cross-validation, we compared the packages' prediction performance based on their Precision, Recall, F1-score, and Area Under the Curve (AUC). We also compared the Complexity and Run-time of these R packages. Our study shows that rpart and evtree have the best Precision; evtree is the best in Recall, F1-score, and AUC; C5.0 prefers more complex trees; and tree is consistently much faster than the others, although its complexity is often higher.
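As an illustration of one of the metrics listed above, the following minimal sketch computes a per-class F1-score and its macro average from true and predicted cell-type labels. The labels and the `f1_per_class` helper are hypothetical. The benchmark itself used R packages, so this only shows how the metric is defined and does not reproduce the study's pipeline.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN) for each label."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 is 0.0 when a label has no TP, FP, or FN.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical cell-type annotations for six cells.
y_true = ["T", "T", "B", "B", "NK", "NK"]
y_pred = ["T", "B", "B", "B", "NK", "T"]

scores = f1_per_class(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)
print(scores, round(macro_f1, 3))
```

In a cross-validation setting, this calculation would be applied to the held-out fold of each split and the results averaged across folds.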
