Author Correction: The NCI Genomic Data Commons

2021 ◽  
Author(s):  
Allison P. Heath ◽  
Vincent Ferretti ◽  
Stuti Agrawal ◽  
Maksim An ◽  
James C. Angelakos ◽  
...  
Keyword(s):  
2019 ◽  
Author(s):  
Zhenyu Zhang ◽  
Kyle Hernandez ◽  
Jeremiah Savage ◽  
Shenglai Li ◽  
Dan Miller ◽  
...  

Abstract: The goal of the National Cancer Institute (NCI) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types, from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).


2017 ◽  
Vol 77 (21) ◽  
pp. e15-e18 ◽  
Author(s):  
Shane Wilson ◽  
Michael Fitzsimons ◽  
Martin Ferguson ◽  
Allison Heath ◽  
Mark Jensen ◽  
...  
Keyword(s):  

2021 ◽  
Author(s):  
Allison P. Heath ◽  
Vincent Ferretti ◽  
Stuti Agrawal ◽  
Maksim An ◽  
James C. Angelakos ◽  
...  
Keyword(s):  

2017 ◽  
Author(s):  
Martin T. Morgan ◽  
Sean R. Davis

Abstract: The National Cancer Institute (NCI) Genomic Data Commons (Grossman et al. 2016, https://gdc.cancer.gov/) provides the cancer research community with an open and unified repository for sharing and accessing data across numerous cancer studies and projects via a high-performance data transfer and query infrastructure. The Bioconductor project (Huber et al. 2015) is an open source and open development software project built on the R statistical programming environment (R Core Team 2016). A major goal of the Bioconductor project is to facilitate the use, analysis, and comprehension of genomic data. The GenomicDataCommons Bioconductor package provides basic infrastructure for querying, accessing, and mining genomic datasets available from the GDC. We expect that the Bioconductor developer and bioinformatics communities will build on the GenomicDataCommons package to add higher-level functionality and expose cancer genomics data to the many state-of-the-art bioinformatics methods available in Bioconductor.

Availability: https://bioconductor.org/packages/GenomicDataCommons and https://github.com/seandavi/GenomicDataCommons.


Cancer ◽  
2016 ◽  
Vol 122 (18) ◽  
pp. 2777-2778 ◽  
Author(s):  
Carrie Printz

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Zhenyu Zhang ◽  
Kyle Hernandez ◽  
Jeremiah Savage ◽  
Shenglai Li ◽  
Dan Miller ◽  
...  

Abstract: The goal of the National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types, from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).


2017 ◽  
Author(s):  
Ruidong Li ◽  
Han Qu ◽  
Shibo Wang ◽  
Julong Wei ◽  
Le Zhang ◽  
...  

Abstract: The large-scale multidimensional omics data in the Genomic Data Commons (GDC) provide opportunities to investigate the crosstalk among different RNA species and their regulatory mechanisms in cancers. Easy-to-use bioinformatics pipelines are needed to facilitate such studies. We have developed a user-friendly R/Bioconductor package, named GDCRNATools, to facilitate downloading, organizing, and analyzing RNA data in the GDC, with an emphasis on deciphering the lncRNA-mRNA related competing endogenous RNA (ceRNA) regulatory network in cancers. Many widely used bioinformatics tools and databases are utilized in our package. Users can easily pack preferred downstream analysis pipelines or integrate their own pipelines into the workflow. Interactive shiny web apps built into GDCRNATools greatly improve visualization of results from the analysis.

Availability: GDCRNATools is an R/Bioconductor package that is freely available at https://github.com/Jialab-UCR/GDCRNATools


Cell Systems ◽  
2019 ◽  
Vol 9 (1) ◽  
pp. 24-34.e10 ◽  
Author(s):  
Galen F. Gao ◽  
Joel S. Parker ◽  
Sheila M. Reynolds ◽  
Tiago C. Silva ◽  
Liang-Bo Wang ◽  
...  
