Comprehensive Analysis of Large-Scale Transcriptomes from Multiple Cancer Types

Various abnormalities of transcriptional regulation revealed by RNA sequencing (RNA-seq) have been reported in cancers. However, strategies to integrate multi-modal information from RNA-seq, which would help uncover more disease mechanisms, are still limited. Here, we present PipeOne, a cross-platform one-stop analysis workflow for large-scale transcriptome data. It was developed based on Nextflow, a reproducible workflow management system. PipeOne is composed of three modules: data processing and feature matrix construction, disease feature prioritization, and disease subtyping. It first integrates eight different tools to extract different types of information from RNA-seq data, and then uses a random forest algorithm to study and stratify patients according to evidence from multi-modal information. Its application in five cancers (colon, liver, kidney, stomach, and thyroid; total samples n = 2024) identified various dysregulated key features (such as PVT1 expression and ABI3BP alternative splicing) and pathways (especially liver and kidney dysfunction) shared by multiple cancers. Furthermore, we demonstrated clinically relevant patient subtypes in four of the five cancers, with most subtypes characterized by distinct driver somatic mutations, such as TP53, TTN, BRAF, HRAS, MET, KMT2D, and KMT2C mutations. Importantly, these subtyping results frequently reflected contributions from dysregulated biological processes, such as ribosome biogenesis, RNA binding, and mitochondrial functions. PipeOne is efficient and accurate in studying different cancer types to reveal the specific and cross-cancer contributing factors of each cancer. It could easily be applied to other diseases and is available at GitHub.

Gene Expression Imputation with Generative Adversarial Imputation Nets

2020. DOI: 10.1101/2020.06.09.141689

Author(s): Ramon Viñas, Tiago Azevedo, Eric R. Gamazon, Pietro Liò

Keyword(s): Gene Expression, Large Scale, Biological Significance, Predictive Performance, Cost Effective, RNA-Seq, Comprehensive Collection, Genomic Studies, Biological Discovery, Cancer Types

Abstract: A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, with important implications for biological discovery and clinical application. To address this challenge, we present GAIN-GTEx, a method for gene expression imputation based on Generative Adversarial Imputation Networks. To increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We compare our model to several standard and state-of-the-art imputation methods and show that GAIN-GTEx is significantly superior in terms of predictive performance and runtime. Furthermore, our results indicate strong generalisation on RNA-Seq data from 3 cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.
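GAIN-GTEx builds on Generative Adversarial Imputation Nets (GAIN). As a hedged illustration of the GAIN data-handling scheme, and not GAIN-GTEx's actual code, the sketch below builds the mask, hint and generator-input matrices used in the GAIN formulation and scores imputation only on hidden entries at several missingness levels. The generator and discriminator networks are omitted: per-gene means of the observed entries stand in for a trained generator, the data are synthetic, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers sketching the GAIN input pipeline; not GAIN-GTEx's code.


def make_mask(shape, missing_rate, rng):
    """Mask M: 1.0 = observed, 0.0 = missing (entries hidden uniformly at random)."""
    return (rng.random(shape) >= missing_rate).astype(float)


def hint_matrix(mask, hint_rate, rng):
    """Hint H = B*M + 0.5*(1 - B) with B ~ Bernoulli(hint_rate).

    Where B = 1 the discriminator is told the true mask value; elsewhere it sees 0.5.
    """
    b = (rng.random(mask.shape) < hint_rate).astype(float)
    return b * mask + 0.5 * (1.0 - b)


def generator_input(x, mask, rng, noise_scale=0.01):
    """Keep observed values; fill missing entries with uniform noise in [0, noise_scale)."""
    z = rng.uniform(0.0, noise_scale, size=x.shape)
    return mask * x + (1.0 - mask) * z


def combine(x, mask, g_out):
    """Final imputation: observed entries from x, missing entries from the generator output."""
    return mask * x + (1.0 - mask) * g_out


def masked_mse(x_true, x_hat, mask):
    """Mean squared error computed only over the entries that were hidden."""
    missing = 1.0 - mask
    return float(np.sum(missing * (x_true - x_hat) ** 2) / max(missing.sum(), 1.0))


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 30))  # synthetic samples x genes, not real expression data

for rate in (0.2, 0.5, 0.8):
    m = make_mask(x.shape, rate, rng)
    h = hint_matrix(m, hint_rate=0.9, rng=rng)  # would be passed to the discriminator
    g_in = generator_input(x, m, rng)           # would be passed to the generator with m
    # Placeholder for a trained generator: per-gene mean of the observed entries.
    col_means = (m * x).sum(axis=0) / np.maximum(m.sum(axis=0), 1.0)
    x_hat = combine(x, m, np.broadcast_to(col_means, x.shape))
    print(f"missing rate {rate:.1f}: masked MSE = {masked_mse(x, x_hat, m):.3f}")
```

Scoring only the hidden entries mirrors an evaluation "across varying levels of missingness"; with the mean placeholder the error should sit near 1, the variance of the synthetic data, which is the baseline a trained generator would need to beat.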

TISCH: a comprehensive web resource enabling interactive single-cell transcriptome visualization of tumor microenvironment

Nucleic Acids Research, 2020, Vol 49 (D1), pp. D1420-D1430. DOI: 10.1093/nar/gkaa1020

Author(s): Dongqing Sun, Jin Wang, Ya Han, Xin Dong, Jun Ge, ...

Keyword(s): Gene Expression, Tumor Microenvironment, Single Cell, Large Scale, Differential Expression Analysis, Enrichment Analysis, Functional Enrichment, Multiple Cancer, Web Resource, Cancer Types

Abstract: Cancer immunotherapy targeting co-inhibitory pathways by checkpoint blockade shows remarkable efficacy in a variety of cancer types. However, only a minority of patients respond to treatment due to the stochastic heterogeneity of the tumor microenvironment (TME). Recent advances in single-cell RNA-seq technologies have enabled comprehensive characterization of immune system heterogeneity in tumors but pose computational challenges for integrating and utilizing the massive published datasets to inform immunotherapy. Here, we present Tumor Immune Single Cell Hub (TISCH, http://tisch.comp-genomics.org), a large-scale curated database that integrates single-cell transcriptomic profiles of nearly 2 million cells from 76 high-quality tumor datasets across 27 cancer types. All the data were uniformly processed with a standardized workflow, including quality control, batch effect removal, clustering, cell-type annotation, malignant cell classification, differential expression analysis and functional enrichment analysis. TISCH provides interactive gene expression visualization across multiple datasets at the single-cell or cluster level, allowing systematic comparison between different cell types, patients, tissue origins, treatment and response groups, and even different cancer types. In summary, TISCH provides a user-friendly interface for systematically visualizing, searching and downloading gene expression atlases of the TME from multiple cancer types, enabling fast, flexible and comprehensive exploration of the TME.

First Identification of RNA-Binding Proteins That Regulate Alternative Exons in the Dystrophin Gene

International Journal of Molecular Sciences, 2020, Vol 21 (20), pp. 7803. DOI: 10.3390/ijms21207803

Author(s): Julie Miro, Anne-Laure Bougé, Eva Murauer, Emmanuelle Beyne, Dylan Da Cunha, ...

Keyword(s): Large Scale, Binding Proteins, Molecular Mechanisms, RNA Binding, RNA Binding Proteins, Dystrophin Gene, RNA-Seq, Spatio Temporal, DMD Gene, Shed Light

The Duchenne muscular dystrophy (DMD) gene has a complex expression pattern regulated by multiple tissue-specific promoters and by alternative splicing (AS) of the resulting transcripts. Here, we used an RNAi-based approach coupled with DMD-targeted RNA-seq to identify RNA-binding proteins (RBPs) that regulate splicing of its skeletal muscle isoform (Dp427m) in a human muscle cell line. A total of 16 RBPs, comprising the major regulators of muscle-specific splicing events, were tested. We show that distinct combinations of RBPs maintain the correct inclusion, in Dp427m, of exons that undergo spatio-temporal AS in other dystrophin isoforms. In particular, our findings revealed the complex networks of RBPs contributing to the splicing of the two short DMD exons 71 and 78; inclusion of exon 78 in the adult Dp427m isoform is crucial for muscle function. Among the RBPs tested, QKI and DDX5/DDX17 proteins are important determinants of DMD exon inclusion. This is the first large-scale study to determine which RBPs act on the physiological splicing of the DMD gene. Our data shed light on molecular mechanisms contributing to the expression of the different dystrophin isoforms, which could be influenced by a change in the function or expression level of the identified RBPs.

RASflow: An RNA-Seq Analysis Workflow with Snakemake

2019. DOI: 10.1101/839191

Author(s): Xiaokang Zhang, Inge Jonassen

Keyword(s): Gene Expression, Management System, Workflow Management, Model Organisms, Gene Transcript, RNA-Seq, Public Data, Wide Range, Analysis Workflow, Programming Skills

Abstract

Background: With the cost of DNA sequencing decreasing, increasing amounts of RNA-Seq data are being generated, giving novel insight into gene expression and regulation. Prior to analysis of gene expression, RNA-Seq data have to be processed through a number of steps that result in a quantification of the expression of each gene or transcript in each of the analyzed samples. A number of workflows are available to help researchers perform these steps on their own data, or on public data to take advantage of novel software or reference data in data re-analysis. However, many of the existing workflows are limited to specific types of studies. We therefore aimed to develop a maximally general workflow that is applicable to a wide range of data and analysis approaches while supporting research on both model and non-model organisms. Furthermore, we aimed to make the workflow usable by researchers with limited programming skills.

Results: Utilizing the workflow management system Snakemake and the package management system Conda, we have developed a modular, flexible and user-friendly RNA-Seq analysis pipeline: RNA-Seq Analysis Snakemake Workflow (RASflow). Utilizing Snakemake and Conda alleviates challenges with library dependencies and version conflicts and also supports reproducibility. To serve a wide variety of applications, RASflow supports mapping of reads to both genomic and transcriptomic assemblies. RASflow has a broad range of potential users: it can be applied by researchers interested in any organism and, since it requires no programming skills, by researchers with different backgrounds. RASflow is an open source tool; source code, documentation, tutorials and example data sets can be found on GitHub: https://github.com/zhxiaokang/RASflow

Conclusions: RASflow is a simple and reliable RNA-Seq analysis workflow that packages a complete RNA-Seq analysis.

Pan-cancer analysis reveals complex tumor-specific alternative polyadenylation

2017. DOI: 10.1101/160960

Author(s): Zhuyi Xue, René L Warren, Ewan A Gibb, Daniel MacMillan, Johnathan Wong, ...

Keyword(s): Alternative Polyadenylation, Length Change, The Cancer Genome Atlas, Cancer Type, RNA-Seq, Multiple Cancer, Tissue Samples, Cancer Genome Atlas, Specific Alternative, Cancer Types

Abstract: Alternative polyadenylation (APA) of 3’ untranslated regions (3’ UTRs) has been implicated in cancer development. Earlier reports on APA in cancer primarily focused on 3’ UTR length modifications, and the conventional wisdom is that tumor cells preferentially express transcripts with shorter 3’ UTRs. Here, we analyzed the APA patterns of 114 genes, a select list of oncogenes and tumor suppressors, in 9,939 tumor and 729 normal tissue samples across 33 cancer types using RNA-Seq data from The Cancer Genome Atlas, and found that the APA regulatory machinery is much more complicated than previously thought. We report 77 cases (gene-cancer type pairs) of differential 3’ UTR cleavage patterns between normal and tumor tissues, involving 33 genes in 13 cancer types. For 15 genes, the tumor-specific cleavage patterns recur across multiple cancer types. While the cleavage patterns in certain genes indicate apparent trends of 3’ UTR shortening in tumor samples, over half of the 77 cases imply 3’ UTR length-change trends in cancer that are more complex than simple shortening or lengthening. This work extends the current understanding of APA regulation in cancer and demonstrates how large volumes of RNA-seq data generated for characterizing cancer cohorts can be mined to investigate this process.

Histone demethylase UTX regulates glioblastoma progression through affecting periostin expression

2021. DOI: 10.1101/2021.12.17.473115

Author(s): Yan Luan, Yingfei Liu, Jingwen Xue, Ke Wang, Kaige Ma, ...

Keyword(s): Target Genes, Interaction Analysis, Xenograft Model, Migration And Invasion, RNA-Seq, Multiple Cancer, Protein Protein Interaction, Histone H3K27, Cancer Types, Extracellular Environment

The histone H3K27 demethylase UTX participates in regulating multiple cancer types. However, less is known about UTX function in glioblastoma (GBM). This study aims to define the effect of UTX on GBM. GEPIA2 database analysis showed that UTX expression was significantly increased in GBM and inversely correlated with survival. UTX knockdown inhibited GBM cell proliferation, migration, and invasion while promoting apoptosis. Moreover, UTX knockdown hampered tumor growth in the heterotopic xenograft model. RNA-seq combined with qRT-PCR and ChIP-qPCR was used to identify the target genes. The results showed that the UTX-mediated genes were strongly associated with tumor progression and the extracellular environment. Protein-protein interaction analysis suggested that periostin (POSTN) interacted with most of the other UTX-mediated genes. POSTN supplementation abolished the effect of UTX knockdown in GBM cells. Furthermore, silencing UTX had a similar antitumor effect in patient-derived glioblastoma stem-like cells, while UTX functions were partially restored after exposure to POSTN. Our findings may provide new insight into gliomagenesis and glioma progression, suggesting a promising therapeutic strategy for GBM treatment.

Large-scale integration of microarray data reveals genes and pathways common to multiple cancer types

International Journal of Cancer, 2011, Vol 128 (12), pp. 2881-2891. DOI: 10.1002/ijc.25854. Cited by ~23

Author(s): Noor B. Dawany, Will N. Dampier, Aydin Tozeren

Keyword(s): Microarray Data, Large Scale, Multiple Cancer, Large Scale Integration, Cancer Types, Scale Integration

DNA-dependent protein synthesis exhibited by cancer shed particulates

2017. DOI: 10.1101/121186

Author(s): Vijay K. Ulaganathan, Axel Ullrich

Keyword(s): Cell Lines, Cancer Cell, Ribosome Biogenesis, Protein Biosynthesis, Protein Translation, Cancer Cell Lines, Biologically Active, Multiple Cancer, Dependent Protein, Cancer Types

Abstract: Genetic heterogeneity in tumours is a bona fide hallmark applicable to all cancer types (Burrell et al, 2013). Furthermore, deregulated ribosome biogenesis and elevated protein biosynthesis have been consistently associated with multiple cancer types (Ruggero, 2012; Ruggero & Pandolfi, 2003). We observed that, under culture conditions, almost all cancer cell types actively shed significant amounts of particulates compared with non-malignant cell lines, requiring frequent changes of culture medium. We therefore asked whether particulates shed by cancer cells might still retain biological activity associated with protein biosynthesis. Here, we report DNA-dependent protein biosynthetic activity exhibited by the cell-free particulates shed by cancer cell lines. Using a pulsed isotope labelling approach, we confirmed the cell-free protein translation activity exhibited by particulates shed by various cancer cell lines. Interestingly, this bioactivity was largely dependent on temperature, pH and 3’-DNA elements. Our results demonstrate that cancer-shed particulates are biologically active and may potentially drive expression of tissue non-specific promoters in distant organs.

Large-scale RNA-Seq Transcriptome Analysis of 4043 Cancers and 548 Normal Tissue Controls across 12 TCGA Cancer Types

Scientific Reports, 2015, Vol 5 (1). DOI: 10.1038/srep13413. Cited by ~63

Author(s): Li Peng, Xiu Wu Bian, Di Kang Li, Chuan Xu, Guang Ming Wang, ...

Keyword(s): Transcriptome Analysis, Normal Tissue, Large Scale, RNA-Seq, Cancer Types

TIMER2.0 for analysis of tumor-infiltrating immune cells

Nucleic Acids Research, 2020, Vol 48 (W1), pp. W509-W514. DOI: 10.1093/nar/gkaa407. Cited by ~39

Author(s): Taiwen Li, Jingxin Fu, Zexian Zeng, David Cohen, Jing Li, ...

Keyword(s): Immune Cells, Large Scale, Immune Cell, The Cancer Genome Atlas, Multiple Cancer, Public Data, Cancer Genome Atlas, Cancer Types, Transcriptomic Changes, Web Platform

Abstract: Tumor progression and the efficacy of immunotherapy are strongly influenced by the composition and abundance of immune cells in the tumor microenvironment. Due to the limitations of direct measurement methods, computational algorithms are often used to infer immune cell composition from bulk tumor transcriptome profiles. These estimated tumor immune infiltrate populations have been associated with genomic and transcriptomic changes in the tumors, providing insight into tumor–immune interactions. However, such investigations on large-scale public data remain challenging. To lower the barriers to analyzing complex tumor–immune interactions, we significantly improved our previous web platform TIMER. Instead of relying on a single algorithm, TIMER2.0 (http://timer.cistrome.org/) provides more robust estimation of immune infiltration levels for The Cancer Genome Atlas (TCGA) or user-provided tumor profiles using six state-of-the-art algorithms. TIMER2.0 provides four modules for investigating the associations between immune infiltrates and genetic or clinical features, and four modules for exploring cancer-related associations in the TCGA cohorts. Each module can generate a functional heatmap table, enabling the user to easily identify significant associations in multiple cancer types simultaneously. Overall, the TIMER2.0 web server provides comprehensive analysis and visualization functions for tumor-infiltrating immune cells.
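The abstract does not detail the six estimation algorithms TIMER2.0 combines. One common idea behind inferring immune composition from bulk profiles is to model a bulk expression vector as a non-negative mixture of reference cell-type profiles. The sketch below is an assumption for illustration, not any of TIMER2.0's algorithms: it fits that mixture by non-negative least squares using projected gradient descent and normalizes the weights to relative abundances. The signature matrix, cell-type labels and function names are hypothetical, and the data are synthetic.

```python
import numpy as np


def nnls_projected_gradient(S, b, n_iter=5000):
    """Minimize 0.5 * ||S @ w - b||^2 subject to w >= 0 via projected gradient descent."""
    # Step size 1/L, where L = largest eigenvalue of S^T S is the Lipschitz
    # constant of the gradient S^T (S w - b).
    step = 1.0 / np.linalg.eigvalsh(S.T @ S).max()
    w = np.zeros(S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ w - b)
        w = np.maximum(w - step * grad, 0.0)  # project back onto w >= 0
    return w


def estimate_fractions(signature, bulk):
    """Relative cell-type abundances for one bulk sample.

    signature: genes x cell types reference expression (hypothetical input)
    bulk:      expression of the same genes in the bulk tumor sample
    """
    w = nnls_projected_gradient(signature, bulk)
    total = w.sum()
    return w / total if total > 0 else w


rng = np.random.default_rng(1)
cell_types = ["B cell", "CD8 T cell", "Macrophage", "Neutrophil"]  # illustrative labels
signature = rng.uniform(0.0, 10.0, size=(50, len(cell_types)))    # synthetic signature matrix
true_fractions = np.array([0.1, 0.4, 0.3, 0.2])
bulk = signature @ true_fractions  # noiseless synthetic bulk profile

est = estimate_fractions(signature, bulk)
for name, frac in zip(cell_types, est):
    print(f"{name}: {frac:.3f}")
```

Because this synthetic mixture is noiseless and built from the same signature, the estimate should recover `true_fractions`. Real bulk tumors add noise, unmodelled cell types and variable tumor purity, which is consistent with the abstract's point that combining several algorithms gives more robust estimates than relying on one.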
