A new unsupervised clustering algorithm applied to genome-wide profiles of breast cancers in The Cancer Genome Atlas proper subsets triple-negative samples.

Jason Mezey; Steven Schwager; Sushila Shenoy; Jef Benbanaste; Michael Elashoff; Anuja Kelkar; Pramod Somashekar; David S. Lee

doi:10.1200/jco.2017.35.15_suppl.e23195

Journal of Clinical Oncology ◽

10.1200/jco.2017.35.15_suppl.e23195 ◽

2017 ◽

Vol 35 (15_suppl) ◽

pp. e23195-e23195

Author(s):

Jason Mezey ◽

Steven Schwager ◽

Sushila Shenoy ◽

Jef Benbanaste ◽

Michael Elashoff ◽

...

Keyword(s):

Clustering Algorithm ◽

Genomic Data ◽

Cancer Genome ◽

Proper Subset ◽

The Cancer Genome Atlas ◽

Driver Mutations ◽

Genome Wide ◽

A Genome ◽

Cancer Genome Atlas ◽

Genome Atlas

e23195

Background: Clustering algorithms have identified subtypes of major cancers from analysis of genome-wide gene expression (GE) and somatic mutation (SM) profiles. These algorithms almost never discover a proper subset cluster, that is, a recovered cluster that includes all the samples of a specific subtype. For breast cancer (BC), clustering of genome-wide profiles has been unable to proper subset triple negatives (TNs), TN subtypes, or other major subtypes.

Methods: To search for a proper subset cluster for TNs, we applied a new clustering algorithm to the public-domain GE and SM data of BC samples in The Cancer Genome Atlas (TCGA). The algorithm is a module of Medidata’s Clinical Trial Genomics (CTG) platform for automated clinical and genomic data integration and analysis. It uses a hierarchical component with tree-learned cut points, applied to a principal-component dimension-reduced similarity matrix calculated from a genome-wide data profile.

Results: Our analysis of 540 TCGA BC samples, run without human supervision, produced a proper subset cluster that included all 55 TN samples and only 74 non-TN samples. GE data have previously indicated TN status, but this is the first demonstration that these TCGA BC data contain enough information to proper subset TNs, implying that this broad BC subtype has a strong, quantifiable impact on GE. We show that the genome-wide SMs of TCGA BC samples can be used to proper subset 4 novel subtypes distinguished as classes “TP53 mutated”, “PIK3CA mutated”, “both TP53 and PIK3CA mutated”, and “neither mutated”, signifying an important role for these known driver mutations in producing the subtypes’ genome-wide mutation profiles. We find that most (> 80%) TN BCs are in “TP53 mutated” but only 1 TN sample (< 2%) is in “PIK3CA mutated”, indicating distinct biology for these TNs with potential implications for TN therapy.

Conclusions: CTG clustering achieves proper subset cancer subtype clustering of TCGA BC samples. These results illustrate the therapeutic discovery potential of genomic data of the high quality present in TCGA when combined with detailed clinical data through the Medidata CTG integration and annotation platform.
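
The abstract's definition of a proper subset cluster (a recovered cluster that contains every sample of a given subtype) can be checked mechanically once cluster assignments exist. The sketch below is illustrative only and is not the CTG algorithm: the function name and the cluster labels "A" and "B" are hypothetical, and the toy assignment simply mirrors the reported counts (540 samples, one cluster holding all 55 TN and 74 non-TN samples).

```python
from collections import defaultdict


def proper_subset_clusters(cluster_ids, subtype_labels, subtype):
    """Return {cluster_id: (n_subtype, n_other)} for every cluster that
    contains all samples of `subtype` (hypothetical helper)."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, subtype_labels):
        members[cid].append(label)

    total = sum(1 for label in subtype_labels if label == subtype)
    result = {}
    for cid, labels in members.items():
        n_sub = sum(1 for label in labels if label == subtype)
        # A cluster qualifies only if it captured every sample of the subtype.
        if total > 0 and n_sub == total:
            result[cid] = (n_sub, len(labels) - n_sub)
    return result


# Toy data mirroring the reported counts; the cluster names are made up.
subtypes = ["TN"] * 55 + ["non-TN"] * 485
clusters = ["A"] * 55 + ["A"] * 74 + ["B"] * 411

found = proper_subset_clusters(clusters, subtypes, "TN")
print(found)
```

The second number in each tuple is the count of other samples swept into the cluster, which is what distinguishes a tight proper subset cluster from a trivial one that contains the whole cohort.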

Genomic Common Data Model for Seamless Interoperation of Biomedical Data in Clinical Practice: Retrospective Study (Preprint)

10.2196/preprints.13249 ◽

2018 ◽

Author(s):

Seo Jeong Shin ◽

Seng Chan You ◽

Yu Rang Park ◽

Jin Roh ◽

Jang-Hee Kim ◽

...

Keyword(s):

Clinical Practice ◽

Human Genome ◽

Genomic Data ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Common Data Model ◽

Sequencing Data ◽

School Of Medicine ◽

Cancer Genome Atlas ◽

Genome Atlas

BACKGROUND: Clinical sequencing data should be shared in order to achieve the sufficient scale and diversity required to provide strong evidence for improving patient care. A distributed research network allows researchers to share this evidence rather than the patient-level data across centers, thereby avoiding privacy issues. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) used in distributed research networks has low coverage of sequencing data and does not reflect the latest trends of precision medicine.

OBJECTIVE: The aim of this study was to develop and evaluate the feasibility of a genomic CDM (G-CDM), as an extension of the OMOP-CDM, for application of genomic data in clinical practice.

METHODS: Existing genomic data models and sequencing reports were reviewed to extend the OMOP-CDM to cover genomic data. The Human Genome Organisation Gene Nomenclature Committee and Human Genome Variation Society nomenclature were adopted to standardize the terminology in the model. Sequencing data of 114 and 1060 patients with lung cancer were obtained from the Ajou University School of Medicine database of Ajou University Hospital and The Cancer Genome Atlas, respectively, which were transformed to a format appropriate for the G-CDM. The data were compared with respect to gene name, variant type, and actionable mutations.

RESULTS: The G-CDM was extended into four tables linked to tables of the OMOP-CDM. Upon comparison with The Cancer Genome Atlas data, a clinically actionable mutation, p.Leu858Arg, in the EGFR gene was 6.64 times more frequent in the Ajou University School of Medicine database, while the p.Gly12Xaa mutation in the KRAS gene was 2.02 times more frequent in The Cancer Genome Atlas dataset. The data-exploring tool GeneProfiler was further developed to conduct descriptive analyses automatically using the G-CDM, which provides the proportions of genes, variant types, and actionable mutations. GeneProfiler also allows for querying the specific gene name and Human Genome Variation Society nomenclature to calculate the proportion of patients with a given mutation.

CONCLUSIONS: We developed the G-CDM for effective integration of genomic data with standardized clinical data, allowing for data sharing across institutes. The feasibility of the G-CDM was validated by assessing the differences in data characteristics between two different genomic databases through the proposed data-exploring tool GeneProfiler. The G-CDM may facilitate analyses of interoperating clinical and genomic datasets across multiple institutions, minimizing privacy issues and enabling researchers to better understand the characteristics of patients and promote personalized medicine in clinical practice.
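
The comparison described above (the proportion of patients carrying a given mutation in each cohort, then the ratio between cohorts) reduces to a few lines of arithmetic. The sketch below is not GeneProfiler and does not use the G-CDM table layout; the record fields (`patient_id`, `gene`, `hgvs_p`), the function names, and the toy cohorts are all assumptions made for illustration.

```python
def mutation_proportion(variants, n_patients, gene, hgvs_p):
    """Fraction of a cohort's patients carrying a given protein-level variant.

    `variants` is a list of dicts with hypothetical keys
    'patient_id', 'gene' and 'hgvs_p'; `n_patients` is the cohort size,
    passed separately because patients without variants have no rows.
    """
    carriers = {
        v["patient_id"]
        for v in variants
        if v["gene"] == gene and v["hgvs_p"] == hgvs_p
    }
    return len(carriers) / n_patients if n_patients else 0.0


def fold_difference(prop_a, prop_b):
    """How many times more frequent the variant is in cohort A than in B."""
    return prop_a / prop_b


# Toy cohorts (made-up data, not the study's): 2 of 4 vs. 1 of 10 carriers.
cohort_a = [
    {"patient_id": "a1", "gene": "EGFR", "hgvs_p": "p.Leu858Arg"},
    {"patient_id": "a2", "gene": "EGFR", "hgvs_p": "p.Leu858Arg"},
    {"patient_id": "a2", "gene": "TP53", "hgvs_p": "p.Arg175His"},
]
cohort_b = [
    {"patient_id": "b1", "gene": "EGFR", "hgvs_p": "p.Leu858Arg"},
    {"patient_id": "b2", "gene": "KRAS", "hgvs_p": "p.Gly12Cys"},
]

prop_a = mutation_proportion(cohort_a, 4, "EGFR", "p.Leu858Arg")
prop_b = mutation_proportion(cohort_b, 10, "EGFR", "p.Leu858Arg")
print(prop_a, prop_b, fold_difference(prop_a, prop_b))
```

Counting distinct patient identifiers rather than rows keeps a patient with several reported variants from being counted more than once.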

DNA Methylation–Based Classifier for Accurate Molecular Diagnosis of Bone Sarcomas

JCO Precision Oncology ◽

10.1200/po.17.00031 ◽

2017 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

S. Peter Wu ◽

Benjamin T. Cooper ◽

Fang Bu ◽

Christopher J. Bowman ◽

J. Keith Killian ◽

...

Keyword(s):

Dna Methylation ◽

Synovial Sarcoma ◽

Ewing Sarcoma ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Clinical Samples ◽

Cpg Sites ◽

A Genome ◽

Cancer Genome Atlas ◽

Genome Atlas

Purpose: Pediatric sarcomas provide a unique diagnostic challenge. There is considerable morphologic overlap between entities, increasing the importance of molecular studies in the diagnosis, treatment, and identification of therapeutic targets. We developed and validated a genome-wide DNA methylation–based classifier to differentiate between osteosarcoma, Ewing sarcoma, and synovial sarcoma.

Methods: DNA methylation status of 482,421 CpG sites in 10 Ewing sarcoma, 11 synovial sarcoma, and 15 osteosarcoma samples was determined using the Illumina Infinium HumanMethylation450 array. We developed a random forest classifier trained from the 400 most differentially methylated CpG sites within the training set of 36 sarcoma samples. This classifier was validated on data drawn from The Cancer Genome Atlas synovial sarcoma, TARGET-Osteosarcoma, and a recently published series of Ewing sarcoma.

Results: Methylation profiling revealed three distinct patterns, each enriched with a single sarcoma subtype. Within the validation cohorts, all samples from The Cancer Genome Atlas were accurately classified as synovial sarcoma (10 of 10; sensitivity and specificity, 100%), all but one sample from TARGET-Osteosarcoma were classified as osteosarcoma (85 of 86; sensitivity, 98%; specificity, 100%), and 14 of 15 Ewing sarcoma samples were classified correctly (sensitivity, 93%; specificity, 100%). The single misclassified osteosarcoma sample demonstrated high EWSR1 and ETV1 expression on RNA sequencing, although no fusion was found on manual curation of the transcript sequence. Two additional clinical samples that were difficult to classify by morphology and molecular methods were classified as osteosarcoma; one had been suspected of being a synovial sarcoma and the other of being Ewing sarcoma on initial diagnosis.

Conclusion: Osteosarcoma, synovial sarcoma, and Ewing sarcoma have distinct epigenetic profiles. Our validated methylation-based classifier can be used to provide diagnostic assistance when histologic and standard techniques are inconclusive.
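
The sensitivities quoted in the Results follow directly from the validation counts. A minimal sketch of that arithmetic is given below; it uses only the counts stated in the abstract, and the function name is an illustrative choice. Specificity is not recomputed because the abstract does not say which class the misclassified samples were assigned to. Note that 85 of 86 works out to roughly 98.8%, which the abstract gives as 98%.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of samples of a class that the classifier labelled correctly."""
    return true_positives / (true_positives + false_negatives)


# (correctly classified, total) per validation cohort, as stated in the abstract.
reported = {
    "synovial sarcoma": (10, 10),
    "osteosarcoma": (85, 86),
    "Ewing sarcoma": (14, 15),
}

sens = {
    name: sensitivity(correct, total - correct)
    for name, (correct, total) in reported.items()
}

for name, value in sens.items():
    print(f"{name}: {value:.1%}")
```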

Pan-cancer pharmacogenetics: targeted sequencing panels or exome sequencing?

Pharmacogenomics ◽

10.2217/pgs-2020-0035 ◽

2020 ◽

Vol 21 (15) ◽

pp. 1073-1084

Author(s):

Laurentijn Tilleman ◽

Björn Heindryckx ◽

Dieter Deforce ◽

Filip Van Nieuwerburgh

Keyword(s):

Drug Interactions ◽

Exome Sequencing ◽

Informed Choice ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Targeted Sequencing ◽

Driver Mutations ◽

Cancer Genome Atlas ◽

Pan Cancer ◽

Genome Atlas

Aim: This study provides clinicians and researchers with an informed choice between current commercially available targeted sequencing panels and exome sequencing panels in the context of pan-cancer pharmacogenetics.

Materials & methods: Nine contemporary commercially available targeted pan-cancer panels and the xGen Exome Research Panel v2 were investigated to determine to what extent they cover the pharmacogenetic variant–drug interactions in five available cancer knowledgebases, and the driver mutations and fusion genes in The Cancer Genome Atlas.

Results: xGen Exome Research Panel v2 and TruSight Oncology 500 target 71.0 and 68.9% of the pharmacogenetic interactions in the available knowledgebases, and 93.7 and 86.0% of the driver mutations in The Cancer Genome Atlas, respectively. All other studied panels target lower percentages.

Conclusion: Exome sequencing outperforms pan-cancer targeted sequencing panels in terms of covered cancer pharmacogenetic variant–drug interactions and pharmacogenetic cancer variants.
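
The coverage percentages reported above are, in essence, the share of catalogued variants whose genomic position falls inside a panel's target regions. The sketch below shows one simple way to compute such a share. It is an assumption about the general approach, not the study's pipeline: the function name, the 1-based inclusive interval convention, and all coordinates are made up for illustration.

```python
def covered_fraction(variants, target_regions):
    """Share of variants that fall inside any target region.

    variants: list of (chrom, pos) tuples.
    target_regions: list of (chrom, start, end) tuples, 1-based inclusive
    (a convention assumed here for illustration).
    """
    by_chrom = {}
    for chrom, start, end in target_regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = sum(
        1
        for chrom, pos in variants
        if any(start <= pos <= end for start, end in by_chrom.get(chrom, []))
    )
    return hits / len(variants) if variants else 0.0


# Made-up coordinates: two of the four variants lie inside the toy panel.
toy_variants = [("chr1", 100), ("chr1", 250), ("chr2", 50), ("chr3", 10)]
toy_panel = [("chr1", 90, 120), ("chr2", 40, 60)]

coverage = covered_fraction(toy_variants, toy_panel)
print(f"{coverage:.1%}")
```

A linear scan over regions is adequate for a toy example; real panels with many thousands of targets would normally use sorted intervals or an interval tree.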

Exploring cancer genomic data from the cancer genome atlas project

BMB Reports ◽

10.5483/bmbrep.2016.49.11.145 ◽

2016 ◽

Vol 49 (11) ◽

pp. 607-611 ◽

Cited By ~ 25

Author(s):

Ju-Seog Lee

Keyword(s):

Genomic Data ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Cancer Genome Atlas ◽

Genome Atlas

Detecting oncogenic selection through biased allele retention in The Cancer Genome Atlas

10.1101/2020.07.03.186593 ◽

2020 ◽

Author(s):

Juliet Luft ◽

Robert S. Young ◽

Alison M. Meynert ◽

Martin S. Taylor

Keyword(s):

Association Studies ◽

Strand Break ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Genome Wide Association Studies ◽

Sequencing Data ◽

Germline Variants ◽

A Genome ◽

Cancer Genome Atlas ◽

Genome Atlas

Abstract

Background: The loss of genetic diversity in segments over a genome (loss-of-heterozygosity, LOH) is a common occurrence in many types of cancer. By analysing patterns of preferential allelic retention during LOH in approximately 10,000 cancer samples from The Cancer Genome Atlas (TCGA), we sought to systematically identify genetic polymorphisms currently segregating in the human population that are preferentially selected for, or against during cancer development.

Results: Experimental batch effects and cross-sample contamination were found to be substantial confounders in this widely used and well studied dataset. To mitigate these we developed a generally applicable classifier (GenomeArtiFinder) to quantify contamination and other abnormalities. We provide these results as a resource to aid further analysis of TCGA whole exome sequencing data. In total, 1,678 pairs of samples (14.7%) were found to be contaminated or affected by systematic experimental error. After filtering, our analysis of LOH revealed an overall trend for biased retention of cancer-associated risk alleles previously identified by genome wide association studies. Analysis of predicted damaging germline variants identified highly significant oncogenic selection for recessive tumour suppressor alleles. These are enriched for biological pathways involved in genome maintenance and stability.

Conclusions: Our results identified predicted damaging germline variants in genes responsible for the repair of DNA strand breaks and homologous repair as the most common targets of allele biased LOH. This suggests a ratchet-like process where heterozygous germline mutations in these genes reduce the efficacy of DNA double-strand break repair, increasing the likelihood of a second hit at the locus removing the wild-type allele and triggering an oncogenic mutator phenotype.
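
Biased allele retention can be framed as a departure from an even split: if LOH were indifferent to which allele is kept, a heterozygous variant should be retained in about half of the tumours that lose one copy. The abstract does not state which statistical test the authors used, so the exact two-sided binomial test below is an assumed stand-in, with hypothetical counts.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k under the null retention probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


# Hypothetical counts: the variant allele is retained in 15 of 20 LOH events.
p_value = binomial_two_sided_p(15, 20)
print(f"p = {p_value:.4f}")
```

In a genome-wide scan a test like this would be run per variant and followed by multiple-testing correction, which this sketch leaves out.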

Genome-wide analysis of gynecologic cancer: The Cancer Genome Atlas in ovarian and endometrial cancer

Oncology Letters ◽

10.3892/ol.2017.5582 ◽

2017 ◽

Vol 13 (3) ◽

pp. 1063-1070 ◽

Cited By ~ 5

Author(s):

Moito Iijima ◽

Kouji Banno ◽

Ryuichiro Okawa ◽

Megumi Yanokura ◽

Miho Iida ◽

...

Keyword(s):

Endometrial Cancer ◽

Gynecologic Cancer ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Genome Wide Analysis ◽

Genome Wide ◽

Cancer Genome Atlas ◽

Genome Atlas

Special Report: The Cancer Genome Atlas Begins with 3-Year, $100 Million Pilot

PsycEXTRA Dataset ◽

10.1037/e481292006-009 ◽

2005 ◽

Author(s):

Keyword(s):

Cancer Genome ◽

The Cancer Genome Atlas ◽

Special Report ◽

Cancer Genome Atlas ◽

Genome Atlas

Differences in endometrial cancer molecular portraits based on ethnicity in The Cancer Genome Atlas

10.26226/morressier.59ba7298d462b80296ca20c6 ◽

2017 ◽

Author(s):

David Guttery

Keyword(s):

Endometrial Cancer ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Cancer Genome Atlas ◽

Genome Atlas

Exceptional Chemotherapy Response in Metastatic Colorectal Cancer Associated With Hyper-Indel–Hypermutated Cancer Genome and Comutation of POLD1 and MLH1

JCO Precision Oncology ◽

10.1200/po.16.00015 ◽

2017 ◽

pp. 1-12

Author(s):

Manish R. Sharma ◽

James T. Auman ◽

Nirali M. Patel ◽

Juneko E. Grilley-Olson ◽

Xiaobei Zhao ◽

...

Keyword(s):

Colorectal Cancer ◽

Colon Cancer ◽

Somatic Mutation ◽

Mutation Rate ◽

Germ Line ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Response To Chemotherapy ◽

Cancer Genome Atlas ◽

Genome Atlas

Purpose: A 73-year-old woman with metastatic colon cancer experienced a complete response to chemotherapy with dose-intensified irinotecan that has been durable for 5 years. We sequenced her tumor and germ line DNA and looked for similar patterns in publicly available genomic data from patients with colorectal cancer.

Patients and Methods: Tumor DNA was obtained from a biopsy before therapy, and germ line DNA was obtained from blood. Tumor and germ line DNA were sequenced using a commercial panel with approximately 250 genes. Whole-genome amplification and exome sequencing were performed for POLE and POLD1. A POLD1 mutation was confirmed by Sanger sequencing. The somatic mutation and clinical annotation data files from the colon (n = 461) and rectal (n = 171) adenocarcinoma data sets were downloaded from The Cancer Genome Atlas data portal and analyzed for patterns of mutations and clinical outcomes in patients with POLE- and/or POLD1-mutated tumors.

Results: The pattern of alterations included APC biallelic inactivation and microsatellite instability high (MSI-H) phenotype, with somatic inactivation of MLH1 and hypermutation (estimated mutation rate > 200 per megabase). The extremely high mutation rate led us to investigate additional mechanisms for hypermutation, including loss of function of POLE. POLE was unaltered, but a related gene not typically associated with somatic mutation in colon cancer, POLD1, had a somatic mutation c.2171G>A [p.Gly724Glu]. Additionally, we noted that the high mutation rate was largely composed of dinucleotide deletions. A similar pattern of hypermutation (dinucleotide deletions, POLD1 mutations, MSI-H) was found in tumors from The Cancer Genome Atlas.

Conclusion: POLD1 mutation with associated MSI-H and hyper-indel–hypermutated cancer genome characterizes a previously unrecognized variant of colon cancer that was found in this patient with an exceptional response to chemotherapy.
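
Two quantities carry the Results above: the mutation rate per megabase and the share of mutations that are dinucleotide deletions. The sketch below shows how each could be computed from a variant list. It is illustrative only: the function names, the VCF-style left-anchored allele notation, and every number in the toy data are assumptions, not values from the study.

```python
def mutations_per_megabase(n_mutations, covered_bases):
    """Mutation count normalised to one million sequenced bases."""
    return n_mutations / (covered_bases / 1_000_000)


def dinucleotide_deletion_fraction(variants):
    """Share of variants that delete exactly two bases.

    variants: list of (ref, alt) allele strings in VCF-style left-anchored
    notation, e.g. ("ACA", "A") for a two-base deletion (assumed convention).
    """
    if not variants:
        return 0.0
    n_del2 = sum(
        1
        for ref, alt in variants
        if len(ref) - len(alt) == 2 and ref.startswith(alt)
    )
    return n_del2 / len(variants)


# Hypothetical figures: 7,500 mutations across 30 Mb of covered sequence.
rate = mutations_per_megabase(7_500, 30_000_000)

# Made-up variant list: three two-base deletions, one substitution.
toy_calls = [("ACA", "A"), ("GTG", "G"), ("TCT", "T"), ("C", "T")]
del2_share = dinucleotide_deletion_fraction(toy_calls)

print(rate, del2_share)
```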
