WebMeV: a Cloud Platform for Analyzing and Visualizing Cancer Genomic Data

2017 ◽  
Author(s):  
Yaoyu E. Wang ◽  
Lev Kuznetsov ◽  
Antony Partensky ◽  
Jalil Farid ◽  
John Quackenbush

Abstract Although large, complex genomic data sets are increasingly easy to generate, and the number of publicly available data sets in cancer and other diseases is growing rapidly, the lack of intuitive, easy-to-use analysis tools remains a barrier to the effective use of such data. WebMeV (https://mev.tm4.org) is an open-source, web-based tool that gives users access to sophisticated tools for analyzing RNA-Seq and other data through an interface designed to democratize data access. WebMeV combines cloud-based technologies with a simple user interface, allowing users either to access large public data sets, such as those from The Cancer Genome Atlas (TCGA), or to upload their own. The interface allows users to visualize data and to apply advanced data mining methods to explore the data and draw biologically meaningful conclusions. We provide an overview of WebMeV and demonstrate two simple use cases that illustrate the value of putting data analysis in the hands of those seeking to explore the underlying biology of the systems being studied.

2018 ◽  
Author(s):  
Myoung-Eun Han ◽  
Tae Sik Goh ◽  
Dae Cheon Jeong ◽  
Chi-Seung Lee ◽  
Ji-Young Kim ◽  
...  

BACKGROUND Prognostic genes or gene signatures have been widely used to predict patients’ survival and to aid the choice of therapeutic options. Although a few web-based survival analysis tools for identifying them have been developed, they provide only limited information. OBJECTIVE To overcome the limitations of previous web-based tools and provide comprehensive survival analysis, we developed GIANT, an online resource for identifying pan-cancer prognostic biomarkers from The Cancer Genome Atlas (TCGA). METHODS We implemented survival analysis in R, based on RNA-seq data from TCGA (n=10,320). To perform survival analyses, we excluded patients and genes with insufficient information (survival status, tumor stage, age, gender, cancer type, blast count, and histologic grade). GIANT applies appropriate cross-validation and survival analysis methods to provide three analysis services: survival analysis by single gene, by cancer type, and by variable signature. RESULTS GIANT can perform comprehensive survival analysis to identify prognostic genes or gene signatures while reflecting tumor heterogeneity. Combining RNA-seq data, clinical data, and pathway databases, it derives gene/variable signatures with grouped variable selection methods (least absolute shrinkage and selection operator, Elastic Net regularization, Network-Regularized high-dimensional Cox-regression) that have better discriminatory power than single genes. Users can also find the prognostic value of a gene and the statistically significant genes in a specific cancer. All results are presented as Kaplan-Meier curves with median/optimal cutoff values, the C-index, and the area under the curve (AUC) at t years. Moreover, users can easily obtain results in the form of graphs and tables. CONCLUSIONS GIANT makes it possible to perform integrated survival analysis easily while overcoming the limitations of previous online tools. It will help scientists who are less familiar with computational methods to perform comprehensive survival analysis of database data.
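As an illustration of the Kaplan-Meier output described above, the following minimal Python sketch splits patients at the median expression of a gene and computes the product-limit survival estimate for a group. It assumes simple lists of expression values, follow-up times, and event flags (1 = death, 0 = censored); the function names are hypothetical, and this is not GIANT's R implementation, which additionally offers optimal cutoffs, the C-index, and time-dependent AUC.

```python
from statistics import median

def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve

def median_split(expression, times, events):
    """Split patients into high/low expression groups at the median value (hypothetical helper)."""
    cut = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cut]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cut]
    return cut, high, low
```

With toy inputs such as times [1, 2, 2, 3] and events [1, 0, 1, 1], `kaplan_meier` returns survival estimates of 0.75, approximately 0.5, and 0.0 at times 1, 2, and 3; the patient censored at time 2 leaves the risk set without contributing a death. Each group returned by `median_split` can be passed on as `kaplan_meier(*zip(*high))`.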


2019 ◽  
pp. 1-9 ◽  
Author(s):  
Arunima Shilpi ◽  
Manoj Kandpal ◽  
Yanrong Ji ◽  
Brandon L. Seagle ◽  
Shohreh Shahabi ◽  
...  

PURPOSE Molecular cancer subtyping is an important tool in predicting prognosis and developing novel precision medicine approaches. We developed a novel platform-independent gene expression–based classification system for molecular subtyping of patients with high-grade serous ovarian carcinoma (HGSOC). METHODS Unprocessed exon array (569 tumor and nine normal) and RNA sequencing (RNA-seq; 376 tumor) HGSOC data sets, with clinical annotations, were downloaded from the Genomic Data Commons portal. Sample clustering was performed by non-negative matrix factorization using isoform-level expression estimates. The association between the subtypes and overall survival was evaluated with a Cox proportional hazards regression model after adjusting for the covariates. A novel classification system was developed for HGSOC molecular subtyping. Robustness and generalizability of the gene signatures were validated using independent microarray and RNA-seq data sets. RESULTS Sample clustering recaptured the four known The Cancer Genome Atlas molecular subtypes but switched the subtype for 22% of the cases, which resulted in significant (P = .006) survival differences among the refined subgroups. After adjusting for covariate effects, the mesenchymal subgroup was found to be at an increased hazard for death compared with the immunoreactive subgroup. Both gene- and isoform-level signatures achieved more than 92% prediction accuracy when tested on independent samples profiled on the exon array platform. When the classifier was applied to RNA-seq data, the subtyping calls agreed with the predictions made from exon array data for 95% of the 279 samples profiled by both platforms. CONCLUSION Isoform-level expression analysis successfully stratifies patients with HGSOC into groups with differing prognosis and has led to the development of robust, platform-independent gene signatures for HGSOC molecular subtyping. The association of the refined The Cancer Genome Atlas HGSOC subtypes with overall survival, independent of covariates, enhances the clinical annotation of the HGSOC cohort.
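Sample clustering in this study relied on non-negative matrix factorization (NMF). The sketch below shows the standard Lee-Seung multiplicative-update form of NMF in plain Python: a non-negative features-by-samples matrix V is factored into W (metagenes) and H (per-sample coefficients), and each sample is assigned to the metagene with the largest coefficient. It is a generic illustration on a hypothetical toy matrix, with hypothetical function names, and not the authors' pipeline; practical subtyping often repeats the factorization from many random starts and assesses cluster stability, which is not shown here.

```python
import random

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factor non-negative V (features x samples) as W (features x k) times H (k x samples).

    Lee-Seung multiplicative updates keep W and H non-negative and, up to the
    small eps guard against division by zero, do not increase the squared error.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def assign_clusters(h):
    """Assign each sample (column of H) to the metagene with the largest coefficient."""
    return [max(range(len(h)), key=lambda r: h[r][j]) for j in range(len(h[0]))]
```

On a toy matrix with two blocks of co-expressed features, such as `[[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]`, a rank-2 factorization is expected to place the first two samples in one cluster and the last two in the other.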


2018 ◽  
Vol 19 (10) ◽  
pp. 3250 ◽  
Author(s):  
Anna Sorrentino ◽  
Antonio Federico ◽  
Monica Rienzo ◽  
Patrizia Gazzerro ◽  
Maurizio Bifulco ◽  
...  

The PR/SET domain gene family (PRDM) encodes 19 different transcription factors that share a subtype of the SET domain [Su(var)3-9, enhancer-of-zeste and trithorax] known as the PRDF1-RIZ (PR) homology domain. This domain, with its potential methyltransferase activity, is followed by a variable number of zinc-finger motifs, which likely mediate protein–protein, protein–RNA, or protein–DNA interactions. Intriguingly, almost all PRDM family members express different isoforms, which likely play opposite roles in oncogenesis. Notably, several studies have described alterations in most family members in malignancies. Here, to obtain a pan-cancer overview of the genomic and transcriptomic alterations of PRDM genes, we reanalyzed the public exome- and RNA-Seq datasets available at The Cancer Genome Atlas portal. Overall, PRDM2, PRDM3/MECOM, PRDM9, PRDM16 and ZFPM2/FOG2 were the most frequently mutated genes, with pan-cancer frequencies of protein-affecting mutations above 1%. Moreover, the mutation frequencies of these genes were heterogeneous across tumors, with some cancer types reaching about 20% mutated samples for a specific PRDM gene. Of note, ZFPM1/FOG1 mutations occurred in 50% of adrenocortical carcinoma patients and were localized in a hotspot region. These findings, together with OncodriveCLUST results, suggest that ZFPM1/FOG1 could be considered a putative cancer driver gene in this malignancy. Finally, transcriptome analysis of RNA-Seq data from paired samples revealed that the transcription of PRDMs was significantly altered in several tumors. Specifically, PRDM12 and PRDM13 were largely overexpressed in many cancers, whereas PRDM16 and ZFPM2/FOG2 were often downregulated. Some of these findings were also confirmed by real-time PCR on primary tumors.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 319
Author(s):  
Erin K. Wagner ◽  
Satyajeet Raje ◽  
Liz Amos ◽  
Jessica Kurata ◽  
Abhijit S. Badve ◽  
...  

Data sharing is critical for advancing genomic research: reusing and combining existing data reduces the need to collect new data and promotes reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level, cancer-related genotype-phenotype data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that allows researchers to discover relevant genomic data in dbGaP by matching TCGA metadata. The result is an easy-to-use tool that connects these two data sources.


2017 ◽  
Vol 2017 ◽  
pp. 1-7 ◽  
Author(s):  
Chao-Yu Pan ◽  
Wei-Ting Kuo ◽  
Chien-Yuan Chiu ◽  
Wen-chang Lin

MicroRNAs (miRNAs) play important roles in human cancers. In previous studies, we demonstrated that both the 5p-arm and the 3p-arm of mature miRNAs could be expressed from the same precursor, and we further interrogated 5p-arm and 3p-arm miRNA expression with a comprehensive arm feature annotation list. To help biologists visualize differential 5p-arm and 3p-arm miRNA expression patterns, we used a user-friendly mobile App to display The Cancer Genome Atlas (TCGA) miRNA-Seq expression information. We collected over 4,500 miRNA-Seq datasets from 15 TCGA cancer types and processed them with the 5p-arm and 3p-arm annotation analysis pipeline. For display in the RNA-Seq Viewer App, the annotated 5p-arm and 3p-arm miRNA expression information and miRNA gene loci information were converted into SQLite tables. In this application, for any given miRNA gene, the 5p-arm miRNA is shown above the chromosome ideogram and the 3p-arm miRNA below it. Users can then easily interrogate differentially expressed 5p-arm/3p-arm miRNAs on their mobile devices. This study demonstrates the feasibility and utility of the RNA-Seq Viewer App for miRNA-Seq data, in addition to mRNA-Seq data visualization.
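The abstract states only that arm-level expression and gene loci were converted into SQLite tables. The sketch below is a hypothetical illustration, using Python's built-in sqlite3 module, of how two such tables might be laid out and queried to compare 5p-arm and 3p-arm expression for one miRNA in one cancer type; the table names, column names, and the miRNA and cancer-type labels are placeholders, not the App's actual schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE mirna_locus (
    mirna TEXT PRIMARY KEY,
    chrom TEXT,
    start_pos INTEGER,
    end_pos INTEGER,
    strand TEXT
);
CREATE TABLE arm_expression (
    mirna TEXT REFERENCES mirna_locus(mirna),
    cancer_type TEXT,
    sample TEXT,
    arm TEXT CHECK (arm IN ('5p', '3p')),
    reads INTEGER
);
"""

def build_db(loci_rows, arm_rows):
    """Create an in-memory database and load loci and arm-level expression rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO mirna_locus VALUES (?, ?, ?, ?, ?)", loci_rows)
    conn.executemany("INSERT INTO arm_expression VALUES (?, ?, ?, ?, ?)", arm_rows)
    conn.commit()
    return conn

def arm_totals(conn, mirna, cancer_type):
    """Sum 5p-arm and 3p-arm reads for one miRNA within one cancer type."""
    rows = conn.execute(
        "SELECT arm, SUM(reads) FROM arm_expression "
        "WHERE mirna = ? AND cancer_type = ? GROUP BY arm",
        (mirna, cancer_type),
    ).fetchall()
    totals = {"5p": 0, "3p": 0}  # arms with no rows default to zero
    totals.update(dict(rows))
    return totals
```

Keeping loci and expression in separate tables lets a viewer draw the ideogram from the locus table once and then look up arm totals per cancer type on demand.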


2015 ◽  
Vol 10 ◽  
pp. BMI.S25132 ◽  
Author(s):  
Jun-ichi Satoh ◽  
Yoshihiro Kino ◽  
Shumpei Niida

Background Alzheimer's disease (AD) is the most common cause of dementia, and no curative therapy is currently available. Establishing sensitive, non-invasive biomarkers that enable early diagnosis of AD is crucial for the effective administration of disease-modifying drugs. MicroRNAs (miRNAs) mediate posttranscriptional repression of numerous target genes. Aberrant regulation of miRNA expression is implicated in AD pathogenesis, and circulating miRNAs serve as potential biomarkers for AD. However, analyzing the numerous AD-specific miRNAs derived from small RNA-sequencing (RNA-Seq) data is often laborious. Methods To identify circulating miRNA biomarkers for AD, we reanalyzed a publicly available small RNA-Seq dataset, composed of blood samples from 48 AD patients and 22 normal control (NC) subjects, with a simple web-based miRNA data analysis pipeline that combines omiRas and DIANA miRPath. Results Using omiRas, we identified 27 miRNAs expressed differentially between the two groups, including upregulation in AD of miR-26b-3p, miR-28-3p, miR-30c-5p, miR-30d-5p, miR-148b-5p, miR-151a-3p, miR-186-5p, miR-425-5p, miR-550a-5p, miR-1468, miR-4781-3p, miR-5001-3p, and miR-6513-3p and downregulation in AD of let-7a-5p, let-7e-5p, let-7f-5p, let-7g-5p, miR-15a-5p, miR-17-3p, miR-29b-3p, miR-98-5p, miR-144-5p, miR-148a-3p, miR-502-3p, miR-660-5p, miR-1294, and miR-3200-3p. DIANA miRPath indicated that miRNA-regulated pathways potentially downregulated in AD are linked with neuronal synaptic functions, while those upregulated in AD are implicated in cell survival and cellular communication. Conclusions The simple web-based miRNA data analysis pipeline helps us readily identify candidate miRNA biomarkers and pathways of AD from complex small RNA-Seq data.


2019 ◽  
Vol 39 (9) ◽  
Author(s):  
Claire Lailler ◽  
Christophe Louandre ◽  
Mony Chenda Morisse ◽  
Thomas Lhossein ◽  
Corinne Godin ◽  
...  

Abstract The tumor microenvironment is an important determinant of glioblastoma (GBM) progression and response to treatment. How oncogenic signaling in GBM cells modulates the composition and activation of the tumor microenvironment is unclear. We aimed to explore the potential local immunoregulatory function of ERK1/2 signaling in GBM. Using proteomic and transcriptomic (RNA-seq) data available for GBM tumors from The Cancer Genome Atlas (TCGA), we show that GBM with high levels of phosphorylated ERK1/2 have increased infiltration of tumor-associated macrophages (TAM) with a non-inflammatory M2 polarization. Using three human GBM cell lines in culture, we confirmed the existence of ERK1/2-dependent regulation of the production of the macrophage chemoattractant CCL2/MCP1. In contrast with this positive regulation of TAM recruitment, we found no evidence of a direct effect of ERK1/2 signaling on two other important aspects of TAM regulation by GBM cells: (1) the expression of the immune checkpoint ligands PD-L1 and PD-L2, which are expressed at high mRNA levels in GBM compared with other solid tumors; (2) the production of the tumor metabolite lactate, which was recently reported to dampen tumor immunity by interacting with the receptor GPR65 present on the surface of TAM. Taken together, our observations suggest that ERK1/2 signaling regulates the recruitment of TAM in the GBM microenvironment. These findings highlight some potentially important particularities of the immune microenvironment in GBM and could explain the recent observation that GBM with activated ERK1/2 signaling may respond better to anti-PD1 therapeutics.


2019 ◽  
Vol 35 (21) ◽  
pp. 4469-4471 ◽  
Author(s):  
Kristoffer Vitting-Seerup ◽  
Albin Sandelin

Abstract Summary Alternative splicing is an important mechanism in health and disease. Recent work highlights the importance of investigating genome-wide changes in splicing patterns and their functional consequences. Current computational methods support such analysis only on a gene-by-gene basis. We therefore extended the IsoformSwitchAnalyzeR R library to enable analysis of genome-wide changes in specific types of alternative splicing and of the predicted functional consequences of the resulting isoform switches. As a case study, we analyzed RNA-seq data from The Cancer Genome Atlas and found systematic changes in alternative splicing and in the consequences of the associated isoform switches. Availability and implementation Windows, Linux and Mac OS: http://bioconductor.org/packages/IsoformSwitchAnalyzeR. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Eric Olivier Audemard ◽  
Patrick Gendron ◽  
Vincent-Philippe Lavallée ◽  
Josée Hébert ◽  
Guy Sauvageau ◽  
...  

Abstract Mutations identified in each acute myeloid leukemia (AML) patient are useful for prognosis and for selecting targeted therapies. Detecting such mutations by analyzing next-generation sequencing (NGS) data requires a computationally intensive read-mapping step and the application of several variant-calling methods. Targeted mutation identification drastically shifts the usual trade-off between accuracy and performance by concentrating all computations on a small portion of sequence space. Here, we present km, an efficient approach that leverages k-mer decomposition of reads to identify targeted mutations. Our approach is versatile: it can detect single-base mutations, several types of insertions and deletions, and fusions. Using two independent AML cohorts (The Cancer Genome Atlas and Leucegene), we show that mutation detection by km is fast, accurate, and mainly limited by sequencing depth. Therefore, km enables fast diagnostics from NGS data and could be suitable for clinical applications.
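The k-mer principle behind this approach can be illustrated with a much simplified Python sketch: count the k-mers in the reads, then score a reference sequence and a mutated alternative by the lowest k-mer count along each, since a sequence is only as well supported as its weakest k-mer. The sequences, reads, and function names below are hypothetical, the small k of 5 is only for readability, and the sketch assumes the candidate mutation is already known, whereas the abstract describes km as detecting substitutions, insertions, deletions, and fusions; treat it as a toy model of the idea, not km's algorithm.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def path_support(sequence, counts, k):
    """Lowest k-mer count along a sequence: a path is only as supported as its weakest k-mer."""
    if len(sequence) < k:
        raise ValueError("sequence is shorter than k")
    return min(counts[sequence[i:i + k]] for i in range(len(sequence) - k + 1))

def call_variant(reference, alternative, reads, k=5):
    """Compare k-mer support for a reference and a known alternative sequence.

    Returns (reference support, alternative support, estimated variant allele fraction).
    """
    counts = kmer_counts(reads, k)
    ref_support = path_support(reference, counts, k)
    alt_support = path_support(alternative, counts, k)
    total = ref_support + alt_support
    vaf = alt_support / total if total else 0.0
    return ref_support, alt_support, vaf
```

For example, with the reference `ACGTACGGTCA`, the single-base substitution `ACGTATGGTCA`, three reads matching the reference, and one read carrying the substitution, `call_variant` returns `(3, 1, 0.25)`: k-mers shared by both sequences at the ends have higher counts and do not affect either minimum. Because support is a read count, low sequencing depth directly limits what can be detected.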


2019 ◽  
Author(s):  
Sophia C. Tintori ◽  
Patrick Golden ◽  
Bob Goldstein

Abstract As the scientific community becomes increasingly interested in data sharing, there is a growing need for tools that facilitate querying public data. Mining RNA-seq datasets, for example, has value for many biomedical researchers, yet it is often effectively inaccessible to experts who are not genomicists, even when the raw data are available. Here we present DrEdGE (dredge.bio.unc.edu), a free web-based tool that facilitates data sharing between genomicists and their colleagues. The DrEdGE software guides genomicists in easily creating interactive online data visualizations, which colleagues can then explore and query according to their own conditions to discover genes, samples, or patterns of interest. We demonstrate DrEdGE’s features with three example websites generated from publicly available datasets: human neuronal tissue, mouse embryonic tissue, and a C. elegans embryonic series. DrEdGE increases the utility of large genomics datasets by removing the technical obstacles that prevent interested parties from exploring the data independently.

