Reprocessing of RNA-sequencing samples from publicly-available data to yield normalized and comparable expression results.

2019 ◽  
Author(s):  
William C. Wright ◽  
Taosheng Chen

Abstract Here we obtained RNA-sequencing data from the publicly available Pan-Cancer analysis project performed by The Cancer Genome Atlas (TCGA). Data within this project were generated with the same experimental processing and analyzed downstream by the UCSC Toil recompute project. We reprocessed the resulting gene count files in batch to obtain normalized expression, a step critical for proper and comparable interpretation. We describe the linear modeling and normalization protocol and provide an example of plotting the results for a gene of interest. We perform the entire protocol using freely available packages within the R framework.

2015 ◽  
Vol 31 (22) ◽  
pp. 3666-3672 ◽  
Author(s):  
Mumtahena Rahman ◽  
Laurie K. Jackson ◽  
W. Evan Johnson ◽  
Dean Y. Li ◽  
Andrea H. Bild ◽  
...  

2018 ◽  
Vol 17 (2) ◽  
pp. 476-487 ◽  
Author(s):  
Fengju Chen ◽  
Yiqun Zhang ◽  
Sooryanarayana Varambally ◽  
Chad J. Creighton

2018 ◽  
Vol 19 (10) ◽  
pp. 3250 ◽  
Author(s):  
Anna Sorrentino ◽  
Antonio Federico ◽  
Monica Rienzo ◽  
Patrizia Gazzerro ◽  
Maurizio Bifulco ◽  
...  

The PR/SET domain gene family (PRDM) encodes 19 different transcription factors that share a subtype of the SET domain [Su(var)3-9, enhancer-of-zeste and trithorax] known as the PRDF1-RIZ (PR) homology domain. This domain, with its potential methyltransferase activity, is followed by a variable number of zinc-finger motifs, which likely mediate protein–protein, protein–RNA, or protein–DNA interactions. Intriguingly, almost all PRDM family members express different isoforms, which likely play opposite roles in oncogenesis. Remarkably, several studies have described alterations in most of the family members in malignancies. Here, to obtain a pan-cancer overview of the genomic and transcriptomic alterations of PRDM genes, we reanalyzed the Exome- and RNA-Seq public datasets available at The Cancer Genome Atlas portal. Overall, PRDM2, PRDM3/MECOM, PRDM9, PRDM16 and ZFPM2/FOG2 were the most mutated genes, with pan-cancer frequencies of protein-affecting mutations higher than 1%. Moreover, we observed heterogeneity in the mutation frequencies of these genes across tumors, with some cancer types reaching about 20% of mutated samples for a specific PRDM gene. Of note, ZFPM1/FOG1 mutations occurred in 50% of adrenocortical carcinoma patients and were localized in a hotspot region. These findings, together with OncodriveCLUST results, suggest that ZFPM1/FOG1 could putatively be considered a cancer driver gene in this malignancy. Finally, transcriptome analysis from RNA-Seq data of paired samples revealed that transcription of PRDMs was significantly altered in several tumors. Specifically, PRDM12 and PRDM13 were largely overexpressed in many cancers whereas PRDM16 and ZFPM2/FOG2 were often downregulated. Some of these findings were also confirmed by real-time PCR on primary tumors.


2015 ◽  
Vol 6 (1) ◽  
Author(s):  
Rehan Akbani ◽  
Patrick Kwok Shing Ng ◽  
Henrica M.J. Werner ◽  
Maria Shahmoradgoli ◽  
Fan Zhang ◽  
...  

Author(s):  
Xudong Tang ◽  
Mengyan Zhang ◽  
Liang Sun ◽  
Fengyan Xu ◽  
Xin Peng ◽  
...  

Long non-coding RNAs (lncRNAs) play key roles in tumors and function not only as important molecular markers for cancer prognosis, but also as molecular characteristics at the pan-cancer level. Because pancreatic cancer has a poor prognosis, accurate prognostic assessment is a key issue in developing treatment plans for it. Here we analyzed pancreatic cancer data from The Cancer Genome Atlas and The Genotype Tissue Expression database using Cox regression and lasso regression, both on the two databases combined and on The Cancer Genome Atlas database alone (Cancer Genome Atlas Research Network et al., 2013). A prognostic risk score model with significant correlation with pancreatic cancer survival was constructed, and two lncRNAs were investigated. Additional analysis of 33 cancers using the two lncRNAs showed that lncRNA TSPOAP1-AS1 was a prognostic marker of seven cancers, among which pancreatic cancer was the most significant, and lncRNA MIR600HG was a prognostic marker of ovarian cancer and pancreatic cancer. LncRNA TSPOAP1-AS1 is associated with clinical stage and tumor mutation burden of some cancers as well as a strong degree of immune infiltration in many cancers, while a strong correlation between lncRNA MIR600HG and microsatellite instability was observed in several cancers. The results of this study help further our understanding of the different functions of lncRNAs in cancer and may aid in the clinical application of lncRNAs as prognostic factors for cancer.


2018 ◽  
Author(s):  
Seo Jeong Shin ◽  
Seng Chan You ◽  
Yu Rang Park ◽  
Jin Roh ◽  
Jang-Hee Kim ◽  
...  

BACKGROUND Clinical sequencing data should be shared in order to achieve the sufficient scale and diversity required to provide strong evidence for improving patient care. A distributed research network allows researchers to share this evidence rather than the patient-level data across centers, thereby avoiding privacy issues. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) used in distributed research networks has low coverage of sequencing data and does not reflect the latest trends of precision medicine. OBJECTIVE The aim of this study was to develop and evaluate the feasibility of a genomic CDM (G-CDM), as an extension of the OMOP-CDM, for application of genomic data in clinical practice. METHODS Existing genomic data models and sequencing reports were reviewed to extend the OMOP-CDM to cover genomic data. The Human Genome Organisation Gene Nomenclature Committee and Human Genome Variation Society nomenclature were adopted to standardize the terminology in the model. Sequencing data of 114 and 1060 patients with lung cancer were obtained from the Ajou University School of Medicine database of Ajou University Hospital and The Cancer Genome Atlas, respectively, which were transformed to a format appropriate for the G-CDM. The data were compared with respect to gene name, variant type, and actionable mutations. RESULTS The G-CDM was extended into four tables linked to tables of the OMOP-CDM. Upon comparison with The Cancer Genome Atlas data, a clinically actionable mutation, p.Leu858Arg, in the EGFR gene was 6.64 times more frequent in the Ajou University School of Medicine database, while the p.Gly12Xaa mutation in the KRAS gene was 2.02 times more frequent in The Cancer Genome Atlas dataset. The data-exploring tool GeneProfiler was further developed to conduct descriptive analyses automatically using the G-CDM, which provides the proportions of genes, variant types, and actionable mutations. GeneProfiler also allows for querying the specific gene name and Human Genome Variation Society nomenclature to calculate the proportion of patients with a given mutation. CONCLUSIONS We developed the G-CDM for effective integration of genomic data with standardized clinical data, allowing for data sharing across institutes. The feasibility of the G-CDM was validated by assessing the differences in data characteristics between two different genomic databases through the proposed data-exploring tool GeneProfiler. The G-CDM may facilitate analyses of interoperating clinical and genomic datasets across multiple institutions, minimizing privacy issues and enabling researchers to better understand the characteristics of patients and promote personalized medicine in clinical practice.


2018 ◽  
pp. 1-19 ◽  
Author(s):  
Lawrence N. Kwong ◽  
Mariana Petaccia De Macedo ◽  
Lauren Haydu ◽  
Aron Y. Joon ◽  
Michael T. Tetzlaff ◽  
...  

Purpose Initiatives such as The Cancer Genome Atlas and International Cancer Genome Consortium have generated high-quality, multiplatform molecular data from thousands of frozen tumor samples. Although these initiatives have provided invaluable insight into cancer biology, a tremendous potential resource remains largely untapped in formalin-fixed, paraffin-embedded (FFPE) samples that are more readily available but which can present technical challenges because of crosslinking of fragile molecules such as RNA. Materials and Methods We extracted RNA from FFPE primary melanomas and assessed two gene expression platforms—genome-wide RNA sequencing and targeted NanoString—for their ability to generate coherent biologic signals. To do so, we generated an improved approach to quantifying gene expression pathways. We refined pathway scores through correlation-guided gene subsetting. We also make comparisons to The Cancer Genome Atlas and other publicly available melanoma datasets. Results The comparison of the gene expression patterns to each other, to established biologic modules, and to clinical and immunohistochemical data confirmed the fidelity of biologic signals from both platforms using FFPE samples to known biology. Moreover, correlations with patient outcome data were consistent with previous frozen-tissue–based studies. Conclusion FFPE samples from previously difficult-to-access cancer types, such as small primary melanomas, represent a valuable and previously unexploited source of analyte for RNA sequencing and NanoString platforms. This work provides an important step toward the use of such platforms to unlock novel molecular underpinnings and inform future biologically driven clinical decisions.


2013 ◽  
Vol 45 (10) ◽  
pp. 1113-1120 ◽  
Author(s):  
John N Weinstein ◽  
Eric A Collisson ◽  
Gordon B Mills ◽  
Kenna R Mills Shaw ◽  
...  
