Enabling cross-study analysis of RNA-Sequencing data

2017 ◽  
Author(s):  
Qingguo Wang ◽  
Joshua Armenia ◽  
Chao Zhang ◽  
Alexander V. Penson ◽  
Ed Reznik ◽  
...  

Abstract Driven by recent advances in next-generation sequencing (NGS) technologies and an urgent need to decode complex human diseases, a multitude of recent large-scale studies have produced an unprecedented volume of whole transcriptome sequencing (RNA-seq) data. While these data offer new opportunities to identify the mechanisms underlying disease, the comparison of data from different sources poses a great challenge, due to differences in sample and data processing. Here, we present a pipeline that processes and unifies RNA-seq data from different studies, which includes uniform realignment and gene expression quantification as well as batch effect removal. We find that uniform alignment and quantification are not sufficient when combining RNA-seq data from different sources and that the removal of other batch effects is essential to facilitate data comparison. We have processed data from the Genotype Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA) and have successfully corrected for study-specific biases, enabling comparative analysis across studies. The normalized data are available for download via GitHub (at https://github.com/mskcc/RNAseqDB).

2018 ◽  
Author(s):  
Eric Olivier Audemard ◽  
Patrick Gendron ◽  
Vincent-Philippe Lavallée ◽  
Josée Hébert ◽  
Guy Sauvageau ◽  
...  

Abstract Mutations identified in each Acute Myeloid Leukemia (AML) patient are useful for prognosis and for selecting targeted therapies. Detection of such mutations by the analysis of Next-Generation Sequencing (NGS) data requires a computationally intensive read mapping step and the application of several variant calling methods. Targeted mutation identification drastically shifts the usual tradeoff between accuracy and performance by concentrating all computations over a small portion of sequence space. Here, we present km, an efficient approach leveraging k-mer decomposition of reads to identify targeted mutations. Our approach is versatile, as it can detect single-base mutations, several types of insertions and deletions, as well as fusions. We used two independent AML cohorts (The Cancer Genome Atlas and Leucegene) to show that mutation detection by km is fast, accurate and mainly limited by sequencing depth. Therefore, km allows fast diagnostics to be established from NGS data, and could be suitable for clinical applications.
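The idea of k-mer decomposition mentioned in this abstract can be illustrated with a minimal sketch. This is not the km implementation: the function names (`kmers`, `count_kmers`, `min_support`), the toy reads, and the choice of k = 5 are all assumptions made for illustration. The sketch counts every k-mer in a set of reads and then asks how well a targeted sequence (reference or mutant) is supported, without mapping any read.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count k-mers across all reads (hypothetical helper, not km's API)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def min_support(target, counts, k):
    """Lowest read k-mer count along the target; 0 if any k-mer is absent.

    The target must be at least k bases long.
    """
    return min(counts[kmer] for kmer in kmers(target, k))


# Toy data (invented): two reads carry a T>G change, one matches the reference.
reads = ["ACGTACGGTA", "CGTACGGTAC", "ACGTACGTTA"]
counts = count_kmers(reads, k=5)

reference = "ACGTACGTTA"
mutant = "ACGTACGGTA"
print("reference support:", min_support(reference, counts, 5))
print("mutant support:", min_support(mutant, counts, 5))
```

Because support is bounded by how many reads cover the target, a sketch like this also shows why such detection would be limited mainly by sequencing depth.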


Open Medicine ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. 459-463
Author(s):  
Arash Hooshmand

Abstract A new logistic regression-based method to distinguish between cancerous and noncancerous RNA genomic data is developed and tested with 100% precision on 595 healthy and cancerous prostate samples. A logistic regression system is developed and trained using whole-exome sequencing data at a high level, i.e., normalized quantification of RNAs obtained from 495 prostate cancer samples from The Cancer Genome Atlas and 100 healthy samples from the Genotype-Tissue Expression project. We show that both the sensitivity and the specificity of the method in the classification of cancerous and noncancerous cells are 100%.


2017 ◽  
Author(s):  
Bo-Hyun You ◽  
Sang-Ho Yoon ◽  
Jin-Wu Nam

Abstract The advent of high-throughput RNA-sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete, partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising ninety-nine billion RNA-seq reads from the ENCODE, human BodyMap projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about eighty-nine billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes.


2019 ◽  
Author(s):  
Swati Venkat ◽  
Arwen A. Tisdale ◽  
Johann R. Schwarz ◽  
Abdulrahman A. Alahmari ◽  
H. Carlo Maurer ◽  
...  

Abstract Alternative polyadenylation (APA) is a gene regulatory process that dictates mRNA 3’-UTR length, resulting in changes in mRNA stability and localization. APA is frequently disrupted in cancer and promotes tumorigenesis through altered expression of oncogenes and tumor suppressors. Pan-cancer analyses have revealed common APA events across the tumor landscape; however, little is known about tumor type-specific alterations that may uncover novel events and vulnerabilities. Here we integrate RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project and The Cancer Genome Atlas (TCGA) to comprehensively analyze APA events in 148 pancreatic ductal adenocarcinomas (PDAs). We report widespread, recurrent and functionally relevant 3’-UTR alterations associated with gene expression changes of known and newly identified PDA growth-promoting genes and experimentally validate the effects of these APA events on expression. We find enrichment for APA events in genes associated with known PDA pathways, loss of tumor-suppressive miRNA binding sites, and increased heterogeneity in 3’-UTR forms of metabolic genes. Survival analyses reveal a subset of 3’-UTR alterations that independently characterize a poor prognostic cohort among PDA patients. Finally, we identify and validate the casein kinase CK1α as an APA-regulated therapeutic target in PDA. Knockdown or pharmacological inhibition of CK1α attenuates PDA cell proliferation and clonogenic growth. Our single-cancer analysis reveals APA as an underappreciated driver of pro-tumorigenic gene expression in PDA via the loss of miRNA regulation.


Cancers ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 2672
Author(s):  
Jaideep Chakladar ◽  
Selena Z. Kuo ◽  
Grant Castaneda ◽  
Wei Tse Li ◽  
Aditi Gnanasekar ◽  
...  

An intra-pancreatic microbiota was recently discovered in several prominent studies. Since pancreatic adenocarcinoma (PAAD) is one of the most lethal cancers worldwide, and the intratumor microbiome was found to be a significant contributor to carcinogenesis in other cancers, this study aims to characterize the PAAD microbiome and elucidate how it may be associated with PAAD prognosis. We further explored the association between the intra-pancreatic microbiome and smoking and gender, which are both risk factors for PAAD. RNA-sequencing data from The Cancer Genome Atlas (TCGA) were used to infer microbial abundance, which was correlated to clinical variables and to cancer and immune-associated gene expression, to determine how microbes may contribute to cancer progression. We discovered that the presence of several bacteria species within PAAD tumors is linked to metastasis and immune suppression. This is the first large-scale study to report microbiome-immune correlations in human pancreatic cancer samples. Furthermore, we found that the increased prevalence and poorer prognosis of PAAD in males and smokers are linked to the presence of potentially cancer-promoting or immune-inhibiting microbes. Further study into the roles of these microbes in PAAD is imperative for understanding how a pro-tumor microenvironment may be treated to limit cancer progression.


2021 ◽  
Author(s):  
Ram Ayyala ◽  
Junghyun Jung ◽  
Sergey Knyazev ◽  
Serghei Mangul

Although precise identification of the human leukocyte antigen (HLA) allele is crucial for various clinical and research applications, HLA typing remains challenging due to the high polymorphism of the HLA loci. With Next-Generation Sequencing (NGS) data becoming widely accessible, many computational tools have been developed to predict HLA types from RNA sequencing (RNA-seq) data. However, there is a lack of comprehensive and systematic benchmarking of RNA-seq HLA callers using large-scale and realistic gold standards. To address this limitation, we rigorously compared the performance of 12 HLA callers over 50,000 HLA tasks, including searching 30 pairwise combinations of HLA callers and reference in over 1,500 samples. In each case, we measured accuracy, defined as the percentage of correctly predicted alleles (at two- and four-digit resolution), based on six gold standard datasets spanning 650 RNA-seq samples. To determine the influence of read length over the HLA region on the prediction quality of each tool, we considered read lengths in the range 37-126 bp, which were available in our gold standard datasets. Moreover, using the Genotype-Tissue Expression (GTEx) v8 data, we calculated the concordance of the same HLA type across different tissues from the same individual to evaluate how well the HLA callers maintain consistent results across tissues. This study offers crucial information for researchers regarding appropriate choices of methods for an HLA analysis.
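The accuracy metric described in this abstract (percentage of correctly predicted alleles at two- and four-digit resolution) can be sketched as follows. This is an illustrative reading, not the authors' benchmarking code: the function names, the per-sample dictionary layout, and the toy alleles are assumptions. Two-digit resolution is taken as the first field of the allele name and four-digit resolution as the first two fields.

```python
def truncate_allele(allele, fields):
    """Keep the first `fields` colon-separated fields of an HLA allele name.

    With fields=2, 'A*02:01:01' becomes 'A*02:01' (four-digit resolution);
    with fields=1 it becomes 'A*02' (two-digit resolution).
    """
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def accuracy(predicted, truth, fields):
    """Percentage of gold-standard alleles matched at the given resolution.

    predicted, truth: dicts mapping a sample ID to a list of allele names
    (hypothetical layout chosen for this sketch).
    """
    correct = total = 0
    for sample, gold in truth.items():
        remaining = [truncate_allele(a, fields) for a in predicted.get(sample, [])]
        for allele in gold:
            total += 1
            key = truncate_allele(allele, fields)
            if key in remaining:
                remaining.remove(key)  # each predicted allele can match only once
                correct += 1
    return 100.0 * correct / total if total else 0.0


# Toy data (invented for illustration).
truth = {"s1": ["A*02:01", "A*24:02"], "s2": ["A*01:01", "A*03:01"]}
predicted = {"s1": ["A*02:01:01", "A*24:03"], "s2": ["A*01:01", "A*03:01"]}

print("two-digit accuracy:", accuracy(predicted, truth, fields=1))
print("four-digit accuracy:", accuracy(predicted, truth, fields=2))
```

Removing a matched prediction from `remaining` prevents a homozygous call from being credited twice against a heterozygous gold standard, which is one reasonable design choice among several.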


2021 ◽  
Author(s):  
Ioannis Kavakiotis ◽  
Athanasios Alexiou ◽  
Spyros Tastsoglou ◽  
Ioannis S Vlachos ◽  
Artemis G Hatzigeorgiou

Abstract microRNAs (miRNAs) are short (∼23 nt) single-stranded non-coding RNAs that act as potent post-transcriptional gene expression regulators. Information about miRNA expression and distribution across cell types and tissues is crucial to the understanding of their function and for their translational use as biomarkers or therapeutic targets. DIANA-miTED is the most comprehensive and systematic collection of miRNA expression values derived from the analysis of 15 183 raw human small RNA-Seq (sRNA-Seq) datasets from the Sequence Read Archive (SRA) and The Cancer Genome Atlas (TCGA). Because metadata quality maximizes the utility of expression atlases, we manually curated SRA and TCGA-derived information to deliver a comprehensive and standardized set, incorporating in total 199 tissues, 82 anatomical sublocations, 267 cell lines and 261 diseases. miTED offers rich instant visualizations of the expression and sample distributions of requested data across variables, as well as study-wide diagrams and graphs enabling efficient content exploration. Queries also generate links towards state-of-the-art miRNA functional resources, making miTED an ideal starting point for expression retrieval, exploration, comparison, and downstream analysis, without requiring bioinformatics support or expertise. DIANA-miTED is freely available at http://www.microrna.gr/mited.


Immunotherapy ◽  
2021 ◽  
Author(s):  
Ying Ni ◽  
Ahmed Soliman ◽  
Amy Joehlin-Price ◽  
Fadi Abdul-Karim ◽  
Peter G Rose ◽  
...  

Aims: We investigated immunogenomic signatures and correlated them with survival in ovarian cancer (OV) and endometrial cancer (EC). Materials & Method: We used whole transcriptome sequencing data from uterine serous cancer and The Cancer Genome Atlas data of OV and EC (n = 719). Gene expression score was calculated. Population abundance of immune cells was estimated. Results: TGF-β, myeloid cells, IFN-γ, T cells, B cells and endothelial cells predicted overall survival, whereas CD47, neutrophils and endothelial cells predicted progression-free survival. In multivariate analyses, TGF-β, CD47 and monocytic cells predicted survival in EC with high levels of microsatellite instability (MSI-H), whereas high IFN-γ trended toward improved survival in the MSI-S EC. High IFN-γ/low TGF-β and high IFN-γ/low CD47 signatures predicted longer overall survival. Low TGF-β/low CD47 signature predicted longer overall survival only in the MSI-H EC. Conclusion: Our data support the role of immune markers in predicting survival in OV/EC.


Cancers ◽  
2019 ◽  
Vol 11 (9) ◽  
pp. 1273
Author(s):  
Wei Tse Li ◽  
Angela E. Zou ◽  
Christine O. Honda ◽  
Hao Zheng ◽  
Xiao Qi Wang ◽  
...  

Immunotherapy has emerged in recent years as arguably the most effective treatment for advanced hepatocellular carcinoma (HCC), but the failure of a large percentage of patients to respond to immunotherapy remains the ultimate obstacle to successful treatment. Etiology-associated dysregulation of immune-associated (IA) genes may be central to the development of this differential clinical response. We identified immune-associated genes potentially dysregulated by alcohol or viral hepatitis B in HCC and validated alcohol-induced dysregulations in vitro while using large-scale RNA-sequencing data from The Cancer Genome Atlas (TCGA). Thirty-four clinically relevant dysregulated IA genes were identified. We profiled the correlation of all genomic alterations in HCC patients to IA gene expression while using the information theory-based algorithm REVEALER to investigate the molecular mechanism for their dysregulation and explore the possibility of genome-based patient stratification. We also studied gene expression regulators and identified multiple microRNAs that were implicated in HCC pathogenesis and can potentially regulate these IA genes’ expression. Our study identified potential key pathways, including the IL-7 signaling pathway and the TNFRSF4 (OX40)-NF-κB pathway, to target in immunotherapy treatments and presents microRNAs as promising therapeutic targets for dysregulated IA genes because of their extensive regulatory roles in the cancer immune landscape.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Suleyman Vural ◽  
Lun-Ching Chang ◽  
Laura M. Yee ◽  
Dmitriy Sonkin

Abstract TP53 is one of the most frequently altered genes in cancer; it can be inactivated by a number of different mechanisms. NM_000546.6 (ENST00000269305.9) is by far the predominant TP53 isoform; however, a few other alternative isoforms have been described as being expressed at much lower levels. To better understand patterns of TP53 alternative isoform expression in cancer and normal samples, we performed an analysis of TP53 isoforms based on exon-exon junction reads, using RNA-seq data from The Cancer Genome Atlas (TCGA), Cancer Cell Line Encyclopedia (CCLE), and Genotype-Tissue Expression (GTEx) project. TP53 C-terminal alternative isoforms have abolished or severely decreased tumor suppressor activity, and therefore an increase in the fraction of TP53 C-terminal alternative isoforms may be expected in tumors with wild type TP53. Despite this expectation, we observed no substantial increase in the fraction of TP53 C-terminal alternative isoforms in TCGA tumors and CCLE cancer cell lines with wild type TP53, likely indicating that TP53 C-terminal alternative isoform expression cannot be reliably selected for during tumor progression.

