Diagnostic and prognostic roles of IRAK1 in hepatocellular carcinoma tissues: an analysis of immunohistochemistry and RNA-sequencing data from The Cancer Genome Atlas

2017 ◽  
Vol 10 ◽  
pp. 1711-1723 ◽  
Author(s):  
Zhi-hua Ye ◽  
Li Gao ◽  
Dong-yue Wen ◽  
Yun He ◽  
Yu-yan Pang ◽  
...  
2019 ◽  
Author(s):  
William C. Wright ◽  
Taosheng Chen

Abstract Here we obtained RNA-sequencing data from the publicly available Pan-Cancer analysis project performed by The Cancer Genome Atlas (TCGA). Data within this project were processed with the same experimental procedures and analyzed downstream by the UCSC Toil recompute project. We reprocessed the resulting gene count files in batch to obtain normalized expression, a step critical for correct and comparable interpretation. We describe the linear modeling and normalization protocol and provide an example of plotting the results for a gene of interest. The entire protocol uses freely available packages within the R framework.
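
Normalization matters because raw counts scale with sequencing depth. The sketch below shows this by computing log2 counts-per-million (CPM) for each sample so that libraries of different depth become comparable. It is an illustrative Python sketch under stated assumptions, not the R-based linear modeling protocol described above. The function name `log2_cpm`, the fixed pseudocount, and the dictionary input format are hypothetical choices.

```python
import math

def log2_cpm(counts, prior_count=1.0):
    """Hypothetical helper: log2 counts-per-million for a gene x sample table.

    counts: dict mapping gene name -> list of raw counts, one entry per sample,
            with samples in the same order for every gene.
    prior_count: fixed pseudocount added to each count so zero counts stay finite.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Library size for each sample = column sum over all genes.
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [
            math.log2((counts[g][j] + prior_count) / lib_sizes[j] * 1e6)
            for j in range(n_samples)
        ]
        for g in genes
    }

# Toy example: sample 2 was sequenced twice as deeply as sample 1.
toy = {"GENE_A": [10, 20], "GENE_B": [90, 180]}
print(log2_cpm(toy, prior_count=0))
```

With `prior_count=0`, GENE_A receives the same value in both samples because it makes up the same share of each library, even though its raw counts differ twofold. Methods such as edgeR's TMM additionally adjust for composition effects, which this sketch ignores.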


2015 ◽  
Vol 31 (22) ◽  
pp. 3666-3672 ◽  
Author(s):  
Mumtahena Rahman ◽  
Laurie K. Jackson ◽  
W. Evan Johnson ◽  
Dean Y. Li ◽  
Andrea H. Bild ◽  
...  

2018 ◽  
Author(s):  
Seo Jeong Shin ◽  
Seng Chan You ◽  
Yu Rang Park ◽  
Jin Roh ◽  
Jang-Hee Kim ◽  
...  

BACKGROUND Clinical sequencing data should be shared to achieve the scale and diversity required to provide strong evidence for improving patient care. A distributed research network allows researchers to share this evidence, rather than patient-level data, across centers, thereby avoiding privacy issues. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) used in distributed research networks has low coverage of sequencing data and does not reflect the latest trends in precision medicine. OBJECTIVE The aim of this study was to develop and evaluate the feasibility of a genomic CDM (G-CDM), as an extension of the OMOP-CDM, for the application of genomic data in clinical practice. METHODS Existing genomic data models and sequencing reports were reviewed to extend the OMOP-CDM to cover genomic data. The Human Genome Organisation Gene Nomenclature Committee and Human Genome Variation Society nomenclatures were adopted to standardize the terminology in the model. Sequencing data from 114 and 1060 patients with lung cancer were obtained from the Ajou University School of Medicine database of Ajou University Hospital and from The Cancer Genome Atlas, respectively, and were transformed into a format appropriate for the G-CDM. The two datasets were compared with respect to gene name, variant type, and actionable mutations. RESULTS The G-CDM extension consists of four tables linked to tables of the OMOP-CDM. Compared with The Cancer Genome Atlas data, the clinically actionable mutation p.Leu858Arg in the EGFR gene was 6.64 times more frequent in the Ajou University School of Medicine database, while the p.Gly12Xaa mutation in the KRAS gene was 2.02 times more frequent in The Cancer Genome Atlas dataset. The data-exploring tool GeneProfiler was further developed to conduct descriptive analyses automatically using the G-CDM; it reports the proportions of genes, variant types, and actionable mutations. 
GeneProfiler also supports queries by gene name and Human Genome Variation Society nomenclature to calculate the proportion of patients with a given mutation. CONCLUSIONS We developed the G-CDM for effective integration of genomic data with standardized clinical data, allowing data to be shared across institutes. The feasibility of the G-CDM was validated by using GeneProfiler to assess the differences in data characteristics between two genomic databases. The G-CDM may facilitate analyses of interoperable clinical and genomic datasets across multiple institutions while minimizing privacy issues, enabling researchers to better understand patient characteristics and promoting personalized medicine in clinical practice.
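
The proportion query that GeneProfiler answers reduces to counting distinct patients. The fold differences reported above (for example, 6.64 times for EGFR p.Leu858Arg) are then ratios of two such proportions. The Python sketch below is hypothetical. `VariantRecord`, `mutation_proportion`, and `frequency_ratio` are illustrative names, not GeneProfiler's API or the G-CDM table schema. Exact string matching on HGVS notation is also an assumed simplification, so a wildcard such as p.Gly12Xaa would need extra handling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariantRecord:
    """Hypothetical, simplified variant row (not the actual G-CDM schema)."""
    patient_id: str
    gene: str          # HGNC gene symbol, e.g. "EGFR"
    hgvs_protein: str  # HGVS protein-level notation, e.g. "p.Leu858Arg"

def mutation_proportion(records, gene, hgvs_protein, n_patients):
    """Fraction of a cohort of n_patients carrying a given mutation.

    Each patient is counted once even if several records match, and the
    denominator is the whole cohort, including patients with no variants.
    """
    carriers = {
        r.patient_id
        for r in records
        if r.gene == gene and r.hgvs_protein == hgvs_protein
    }
    return len(carriers) / n_patients

def frequency_ratio(proportion_a, proportion_b):
    """How many times more frequent a mutation is in cohort A than in cohort B."""
    return proportion_a / proportion_b

# Toy cohort of 4 patients; p1 has a duplicated record.
cohort = [
    VariantRecord("p1", "EGFR", "p.Leu858Arg"),
    VariantRecord("p1", "EGFR", "p.Leu858Arg"),
    VariantRecord("p2", "KRAS", "p.Gly12Cys"),
]
print(mutation_proportion(cohort, "EGFR", "p.Leu858Arg", n_patients=4))  # → 0.25
```

Counting distinct patient IDs rather than records keeps duplicated variant calls from inflating the proportion.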


2018 ◽  
pp. 1-19 ◽  
Author(s):  
Lawrence N. Kwong ◽  
Mariana Petaccia De Macedo ◽  
Lauren Haydu ◽  
Aron Y. Joon ◽  
Michael T. Tetzlaff ◽  
...  

Purpose Initiatives such as The Cancer Genome Atlas and the International Cancer Genome Consortium have generated high-quality, multiplatform molecular data from thousands of frozen tumor samples. Although these initiatives have provided invaluable insight into cancer biology, formalin-fixed, paraffin-embedded (FFPE) samples remain a tremendous but largely untapped resource. They are more readily available than frozen samples but can present technical challenges because fragile molecules such as RNA become crosslinked. Materials and Methods We extracted RNA from FFPE primary melanomas and assessed two gene expression platforms, genome-wide RNA sequencing and targeted NanoString, for their ability to generate coherent biologic signals. To do so, we developed an improved approach to quantifying gene expression pathways, refining pathway scores through correlation-guided gene subsetting. We also compared our results with The Cancer Genome Atlas and other publicly available melanoma datasets. Results Comparing the gene expression patterns with each other, with established biologic modules, and with clinical and immunohistochemical data confirmed that the biologic signals obtained from FFPE samples on both platforms are faithful to known biology. Moreover, correlations with patient outcome data were consistent with previous frozen-tissue–based studies. Conclusion FFPE samples from previously difficult-to-access cancer types, such as small primary melanomas, represent a valuable and previously unexploited source of analyte for RNA sequencing and NanoString platforms. This work provides an important step toward using such platforms to uncover novel molecular underpinnings and inform future biologically driven clinical decisions.


2011 ◽  
Vol 35 (8) ◽  
pp. 1732-1737 ◽  
Author(s):  
N. Thao T. Nguyen ◽  
Ron T. Cotton ◽  
Theresa R. Harring ◽  
Jacfranz J. Guiteau ◽  
Marie-Claude Gingras ◽  
...  

2015 ◽  
Vol 89 (17) ◽  
pp. 8967-8973 ◽  
Author(s):  
Majid Kazemian ◽  
Min Ren ◽  
Jian-Xin Lin ◽  
Wei Liao ◽  
Rosanne Spolski ◽  
...  

ABSTRACT Viruses are causally associated with a number of human malignancies. In this study, we sought to identify new virus-cancer associations by searching RNA sequencing data sets from >2,000 patients, encompassing 21 cancers from The Cancer Genome Atlas (TCGA), for the presence of viral sequences. In agreement with previous studies, we found human papillomavirus 16 (HPV16) and HPV18 in oropharyngeal cancer and hepatitis B and C viruses in liver cancer. Unexpectedly, however, we found HPV38, a cutaneous form of HPV associated with skin cancer, in 32 of 168 samples from endometrial cancer. In 12 of the HPV38-positive (HPV38+) samples, we observed at least one paired read that mapped to both the human and HPV38 genomes. This indicates viral integration into the host DNA, something not previously demonstrated for HPV38. The expression levels of HPV38 transcripts were relatively low. Moreover, all 32 HPV38+ samples belonged to the same experimental batch of 40 samples, whereas none of the other 128 endometrial carcinoma samples were HPV38+, raising doubts about the significance of the HPV38 association. The HPV38+ samples also contained the same 10 novel single nucleotide variations (SNVs). This led us to hypothesize that one patient was infected with this new isolate of HPV38, which was integrated into his/her genome and may have cross-contaminated other TCGA samples within batch 228. Based on our analysis, we propose guidelines for examining the batch effect, virus expression level, and SNVs as part of next-generation sequencing (NGS) data analysis when evaluating the significance of viral/pathogen sequences in clinical samples. IMPORTANCE High-throughput RNA sequencing (RNA-Seq), followed by computational analysis, has vastly accelerated the identification of viral and other pathogenic sequences in clinical samples, but cross-contamination during sample processing remains a major problem that can lead to erroneous conclusions. 
We found sequences of HPV38, a virus not previously associated with endometrial cancer, specifically in RNA-Seq samples from endometrial cancer patients in TCGA. However, multiple lines of evidence suggest possible cross-contamination of these samples, which were processed together in the same batch. Despite this potential cross-contamination, our data indicate that we have detected a new isolate of HPV38 that appears to be integrated into the human genome. We also provide general guidelines for the computational detection and interpretation of pathogen-disease associations.
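
The batch argument can be quantified with a one-sided hypergeometric (Fisher's exact) test using only the counts given in the abstract: 32 positives among 168 endometrial samples, all falling in one 40-sample batch. The sketch below assumes that, absent contamination, positives would be distributed at random across samples. It is not the authors' analysis, and `batch_enrichment_p` is a hypothetical name.

```python
from math import comb

def batch_enrichment_p(pos_in_batch, batch_size, pos_total, n_total):
    """One-sided hypergeometric (Fisher's exact) p-value for batch enrichment.

    Probability that at least pos_in_batch of pos_total positive samples fall
    inside a batch of batch_size, if positives were placed at random among
    n_total samples.
    """
    upper = min(batch_size, pos_total)
    hits = sum(
        comb(batch_size, k) * comb(n_total - batch_size, pos_total - k)
        for k in range(pos_in_batch, upper + 1)
    )
    return hits / comb(n_total, pos_total)

# Counts from the abstract: 32 HPV38+ of 168 samples, all in one 40-sample batch.
print(batch_enrichment_p(pos_in_batch=32, batch_size=40, pos_total=32, n_total=168))
```

The batch holds 40 of 168 samples, so under random placement any single positive sample has roughly a 24% chance of landing in it. Finding all 32 positives there is therefore extremely improbable, which is the quantitative core of the contamination concern.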


2020 ◽  
Author(s):  
Juliet Luft ◽  
Robert S. Young ◽  
Alison M. Meynert ◽  
Martin S. Taylor

Abstract Background The loss of genetic diversity in segments of a genome (loss of heterozygosity, LOH) is a common occurrence in many types of cancer. We analysed patterns of preferential allelic retention during LOH in approximately 10,000 cancer samples from The Cancer Genome Atlas (TCGA). Our aim was to systematically identify genetic polymorphisms currently segregating in the human population that are preferentially selected for or against during cancer development. Results Experimental batch effects and cross-sample contamination were found to be substantial confounders in this widely used and well-studied dataset. To mitigate these, we developed a generally applicable classifier (GenomeArtiFinder) to quantify contamination and other abnormalities. We provide these results as a resource to aid further analysis of TCGA whole exome sequencing data. In total, 1,678 pairs of samples (14.7%) were found to be contaminated or affected by systematic experimental error. After filtering, our analysis of LOH revealed an overall trend towards biased retention of cancer-associated risk alleles previously identified by genome-wide association studies. Analysis of predicted damaging germline variants identified highly significant oncogenic selection for recessive tumour suppressor alleles, which are enriched in biological pathways involved in genome maintenance and stability. Conclusions Our results identified predicted damaging germline variants in genes responsible for the repair of DNA strand breaks and homologous repair as the most common targets of allele-biased LOH. This suggests a ratchet-like process in which heterozygous germline mutations in these genes reduce the efficacy of DNA double-strand break repair. That, in turn, increases the likelihood of a second hit at the locus that removes the wild-type allele and triggers an oncogenic mutator phenotype.
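
At its simplest, testing for preferential allelic retention at one heterozygous site means asking a single question. Across tumours with LOH at that site, is one allele kept more often than the 50% expected without selection? The exact two-sided binomial test below sketches that idea under this assumption. It is not GenomeArtiFinder or the authors' statistical model, the function names are hypothetical, and the counts in the example are toy values, not data from the study.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k, a standard convention for exact two-sided tests.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))

def allelic_retention_bias(risk_retained, other_retained):
    """Hypothetical helper: fraction of LOH events keeping the risk allele, and its p-value."""
    n = risk_retained + other_retained
    return risk_retained / n, binomial_two_sided_p(risk_retained, n)

# Toy counts, not data from the study.
fraction, p_value = allelic_retention_bias(risk_retained=30, other_retained=10)
print(f"risk allele retained in {fraction:.0%} of LOH events, p = {p_value:.3g}")
```

A genome-wide analysis would repeat such a test across many sites and would need multiple-testing correction and the contamination filtering described above. The sketch covers only the single-site logic.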

