TCGA2STAT: simple TCGA data access for integrated statistical analysis in R

2015 ◽  
Vol 32 (6) ◽  
pp. 952-954 ◽  
Author(s):  
Ying-Wooi Wan ◽  
Genevera I. Allen ◽  
Zhandong Liu

Abstract Motivation: Massive amounts of high-throughput genomics data profiled from tumor samples were made publicly available by the Cancer Genome Atlas (TCGA). Results: We have developed an open source software package, TCGA2STAT, to obtain the TCGA data, wrangle it, and pre-process it into a format ready for multivariate and integrated statistical analysis in the R environment. In a user-friendly format with one single function call, our package downloads and fully processes the desired TCGA data to be seamlessly integrated into a computational analysis pipeline. No further technical or biological knowledge is needed to utilize our software, thus making TCGA data easily accessible to data scientists without specific domain knowledge. Availability and implementation: TCGA2STAT is available from https://cran.r-project.org/web/packages/TCGA2STAT/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.

2017 ◽  
Vol 2017 ◽  
pp. 1-7 ◽  
Author(s):  
Chao-Yu Pan ◽  
Wei-Ting Kuo ◽  
Chien-Yuan Chiu ◽  
Wen-chang Lin

MicroRNAs (miRNAs) play important roles in human cancers. In previous studies, we have demonstrated that both the 5p-arm and 3p-arm of mature miRNAs could be expressed from the same precursor, and we further interrogated the 5p-arm and 3p-arm miRNA expression with a comprehensive arm feature annotation list. To help biologists visualize the differential 5p-arm and 3p-arm miRNA expression patterns, we utilized a user-friendly mobile App to display The Cancer Genome Atlas (TCGA) miRNA-Seq expression information. We have collected over 4,500 miRNA-Seq datasets from 15 TCGA cancer types and further processed them with the 5p-arm and 3p-arm annotation analysis pipeline. In order to be displayed with the RNA-Seq Viewer App, annotated 5p-arm and 3p-arm miRNA expression information and miRNA gene loci information were converted into SQLite tables. In this distinct application, for any given miRNA gene, the 5p-arm miRNA is illustrated on the top of the chromosome ideogram and the 3p-arm miRNA is illustrated on the bottom of the chromosome ideogram. Users can then easily interrogate the differentially 5p-arm/3p-arm expressed miRNAs with their mobile devices. This study demonstrates the feasibility and utility of the RNA-Seq Viewer App in addition to mRNA-Seq data visualization.
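
The conversion of arm-level expression into SQLite tables can be pictured with a minimal sketch. The abstract does not describe the schema, so the table name `arm_expression`, its columns, and the helper functions below are all hypothetical; this is an illustration of the general idea, not the authors' pipeline or the app's actual storage format.

```python
import sqlite3


def build_arm_table(rows):
    """Create an in-memory SQLite table of per-arm miRNA expression.

    The schema is an assumption made for illustration only.
    Each row is (mirna_gene, arm, cancer_type, expression).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE arm_expression ("
        "mirna_gene TEXT NOT NULL, "
        "arm TEXT NOT NULL CHECK (arm IN ('5p', '3p')), "
        "cancer_type TEXT NOT NULL, "
        "expression REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO arm_expression VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def arm_expression(conn, mirna_gene, cancer_type):
    """Return {arm: expression} for one miRNA gene in one cancer type."""
    cur = conn.execute(
        "SELECT arm, expression FROM arm_expression "
        "WHERE mirna_gene = ? AND cancer_type = ?",
        (mirna_gene, cancer_type),
    )
    return dict(cur.fetchall())


# Placeholder values, not real TCGA measurements.
example_rows = [
    ("mir-example", "5p", "TYPE_A", 10.0),
    ("mir-example", "3p", "TYPE_A", 2.5),
]
```

A viewer could then call `arm_expression(conn, "mir-example", "TYPE_A")` to fetch both arms of one gene and draw them on opposite sides of the ideogram.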


2020 ◽  
Vol 36 (9) ◽  
pp. 2943-2945 ◽  
Author(s):  
Francisco Madrid-Gambin ◽  
Sergio Oller-Moreno ◽  
Luis Fernandez ◽  
Simona Bartova ◽  
Maria Pilar Giner ◽  
...  

Abstract Summary Nuclear magnetic resonance (NMR)-based metabolomics is widely used to obtain metabolic fingerprints of biological systems. While targeted workflows require previous knowledge of metabolites, prior to statistical analysis, untargeted approaches remain a challenge. Computational tools dealing with fully untargeted NMR-based metabolomics are still scarce or not user-friendly. Therefore, we developed AlpsNMR (Automated spectraL Processing System for NMR), an R package that provides automated and efficient signal processing for untargeted NMR metabolomics. AlpsNMR includes spectra loading, metadata handling, automated outlier detection, spectra alignment and peak-picking, integration and normalization. The resulting output can be used for further statistical analysis. AlpsNMR proved effective in detecting metabolite changes in a test case. The tool allows less experienced users to easily implement this workflow from spectra to a ready-to-use dataset in their routines. Availability and implementation The AlpsNMR R package and tutorial are freely available to download from http://github.com/sipss/AlpsNMR under the MIT license. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (21) ◽  
pp. 4469-4471 ◽  
Author(s):  
Kristoffer Vitting-Seerup ◽  
Albin Sandelin

Abstract Summary Alternative splicing is an important mechanism involved in health and disease. Recent work highlights the importance of investigating genome-wide changes in splicing patterns and the subsequent functional consequences. Current computational methods only support such analysis on a gene-by-gene basis. Therefore, we extended the IsoformSwitchAnalyzeR R library to enable analysis of genome-wide changes in specific types of alternative splicing and predicted functional consequences of the resulting isoform switches. As a case study, we analyzed RNA-seq data from The Cancer Genome Atlas and found systematic changes in alternative splicing and the consequences of the associated isoform switches. Availability and implementation Windows, Linux and Mac OS: http://bioconductor.org/packages/IsoformSwitchAnalyzeR. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Yin Li ◽  
Di Ge ◽  
Chunlai Lu

Abstract Background Data mining of The Cancer Genome Atlas (TCGA) data has significantly facilitated cancer genome research and provided unprecedented opportunities for cancer researchers. However, existing web applications for DNA methylation analysis do not adequately address the needs of experimental biologists, and many additional functions are often required. Results To facilitate DNA methylation analysis, we present the SMART (Shiny Methylation Analysis Resource Tool) App, a user-friendly and easy-to-use web application for comprehensively analyzing the DNA methylation data of the TCGA project. The SMART App integrates multi-omics and clinical data with DNA methylation and provides key interactive and customized functions including CpG visualization, pan-cancer methylation profile, differential methylation analysis, correlation analysis and survival analysis for users to analyze the DNA methylation in diverse cancer types in a multi-dimensional manner. Conclusion The SMART App serves as a new approach for users, especially wet-bench scientists with no programming background, to analyze the scientific big data and facilitate data mining. The SMART App is available at http://www.bioinfo-zs.com/smartapp.


2020 ◽  
Vol 36 (12) ◽  
pp. 3944-3946 ◽  
Author(s):  
Shanyu Chen ◽  
Xiaoyu He ◽  
Ruilin Li ◽  
Xiaohong Duan ◽  
Beifang Niu

Abstract Motivation HotSpot3D is a widely used software tool for identifying mutation hotspots on the 3D structures of proteins. To further assist users, we developed a new HotSpot3D web server to make this software more versatile, convenient and interactive. Results The HotSpot3D web server performs data pre-processing, clustering, visualization and log-viewing in one stop. Users can interactively explore each cluster and easily re-visualize the mutational clusters within browsers. We also provide a database that allows users to search and visualize proximal mutations from 33 cancers in the Cancer Genome Atlas. Availability and implementation http://niulab.scgrid.cn/HotSpot3D/. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (1) ◽  
pp. 241-249 ◽  
Author(s):  
Rudolf Schill ◽  
Stefan Solbrig ◽  
Tilo Wettig ◽  
Rainer Spang

Abstract Motivation Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. Results Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. Availability and implementation Implementation and data are available at https://github.com/RudiSchill/MHN. Supplementary information Supplementary data are available at Bioinformatics online.
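
The rate model described in this abstract (a spontaneous rate per event, scaled multiplicatively by the events that have already occurred) can be sketched in a few lines. The matrix layout below is an assumption for illustration: `theta[i][i]` holds the spontaneous rate of event `i`, and `theta[i][j]` holds the multiplicative effect of event `j` on the rate of event `i`. The function name and the example values are hypothetical and are not taken from the authors' implementation.

```python
def event_rate(theta, state, i):
    """Rate at which event i occurs next, given which events already occurred.

    theta: square list of lists (layout assumed for illustration);
           theta[i][i] is the spontaneous rate of event i,
           theta[i][j] is the multiplicative effect of event j on event i.
    state: list of 0/1 flags, state[j] == 1 if event j has already occurred.
    """
    if state[i]:
        # An event that has already occurred cannot occur again.
        return 0.0
    rate = theta[i][i]
    for j, present in enumerate(state):
        if present and j != i:
            # Factors above 1 promote event i; factors below 1 inhibit it.
            rate *= theta[i][j]
    return rate


# Made-up two-event example: event 1 doubles the rate of event 0,
# while event 0 leaves the rate of event 1 unchanged.
example_theta = [
    [0.5, 2.0],
    [1.0, 0.2],
]
```

Because each off-diagonal entry is independent, event `j` may promote event `i` while event `i` inhibits event `j`, which is the kind of cyclic interaction an acyclic model cannot express.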


2019 ◽  
Author(s):  
Bowen Liu ◽  
Xiaofei Yang ◽  
Tingjie Wang ◽  
Jiadong Lin ◽  
Yongyong Kang ◽  
...  

Abstract Motivation Tumor purity is a fundamental property of each cancer sample and affects downstream investigations. Current tumor purity estimation methods either require a matched normal sample or report moderately high tumor purity even on normal samples. It is critical to develop a novel computational approach to estimate tumor purity with sufficient precision based on tumor-only samples. Results In this study, we developed MEpurity, a beta mixture model-based algorithm, to estimate the tumor purity based on tumor-only Illumina Infinium 450k methylation microarray data. We applied MEpurity to both The Cancer Genome Atlas (TCGA) cancer data and cancer cell line data, demonstrating that MEpurity reports low tumor purity on normal samples and results on tumor samples comparable with other state-of-the-art methods. Availability and implementation MEpurity is a C++ program which is available at https://github.com/xjtu-omics/MEpurity. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (17) ◽  
pp. 3140-3142 ◽  
Author(s):  
Marc Streit ◽  
Samuel Gratzl ◽  
Holger Stitz ◽  
Andreas Wernitznig ◽  
Thomas Zichner ◽  
...  

Abstract Summary Ordino is a web-based analysis tool for cancer genomics that allows users to flexibly rank, filter and explore genes, cell lines and tissue samples based on pre-loaded data, including The Cancer Genome Atlas, the Cancer Cell Line Encyclopedia and manually uploaded information. Interactive tabular data visualization that facilitates the user-driven prioritization process forms a core component of Ordino. Detail views of selected items complement the exploration. Findings can be stored, shared and reproduced via the integrated session management. Availability and implementation Ordino is publicly available at https://ordino.caleydoapp.org. The source code is released at https://github.com/Caleydo/ordino under the Mozilla Public License 2.0. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (16) ◽  
pp. 4530-4531 ◽  
Author(s):  
Charles E Vejnar ◽  
Antonio J Giraldez

Abstract Summary Experimental laboratory management and data-driven science require centralized software for sharing information, such as lab collections or genomic sequencing datasets. Although database servers such as PostgreSQL can store such information with multiple-user access, they lack user-friendly graphical and programmatic interfaces for easy data access and inputting. We developed LabxDB, a versatile open-source solution for organizing and sharing structured data. We provide several out-of-the-box databases for deployment in the cloud including simple mutant or plasmid collections and purchase-tracking databases. We also developed a high-throughput sequencing (HTS) database, LabxDB seq, dedicated to storage of hierarchical sample annotations. Scientists can import their own or publicly available HTS data into LabxDB seq to manage them from production to publication. Using LabxDB’s programmatic access (REST API), annotations can be easily integrated into bioinformatics pipelines. LabxDB is modular, offering a flexible framework that scientists can leverage to build new database interfaces adapted to their needs. Availability and implementation LabxDB is available at https://gitlab.com/vejnar/labxdb and https://labxdb.vejnar.org for documentation. LabxDB is licensed under the terms of the Mozilla Public License 2.0. Supplementary information Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Andrea Rau ◽  
Michael Flister ◽  
Hallgeir Rui ◽  
Paul L. Auer

The Cancer Genome Atlas (TCGA) has greatly advanced cancer research by generating, curating, and publicly releasing deeply measured molecular data from thousands of tumor samples. In particular, gene expression measures, both within and across cancer types, have been used to determine the genes and proteins that are active in tumor cells. To more thoroughly investigate the behavior of gene expression in TCGA tumor samples, we introduce a statistical framework for partitioning the variation in gene expression due to a variety of molecular variables including somatic mutations, transcription factors (TFs), microRNAs, copy number alterations, methylation, and germ-line genetic variation. As proof-of-principle, we identify and validate specific TFs that influence the expression of PTPN14 in breast cancer cells. We provide a freely available, user-friendly, browseable interactive web-based application for exploring the results of our transcriptome-wide analyses across 17 different cancers in TCGA at http://ls-shiny-prod.uwm.edu/edge_in_tcga.

