Genomic Common Data Model for Seamless Interoperation of Biomedical Data in Clinical Practice: Retrospective Study (Preprint)

2018
Author(s): Seo Jeong Shin, Seng Chan You, Yu Rang Park, Jin Roh, Jang-Hee Kim, ...

BACKGROUND: Clinical sequencing data should be shared to achieve the scale and diversity required to provide strong evidence for improving patient care. A distributed research network allows researchers to share this evidence, rather than patient-level data, across centers, thereby avoiding privacy issues. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) used in distributed research networks has limited coverage of sequencing data and does not reflect the latest trends in precision medicine.

OBJECTIVE: The aim of this study was to develop a genomic CDM (G-CDM), as an extension of the OMOP-CDM, and to evaluate its feasibility for applying genomic data in clinical practice.

METHODS: Existing genomic data models and sequencing reports were reviewed to extend the OMOP-CDM to cover genomic data. The Human Genome Organisation Gene Nomenclature Committee and Human Genome Variation Society nomenclature were adopted to standardize the terminology in the model. Sequencing data of 114 and 1060 patients with lung cancer were obtained from the Ajou University School of Medicine database of Ajou University Hospital and The Cancer Genome Atlas, respectively, and were transformed into a format appropriate for the G-CDM. The two datasets were compared with respect to gene name, variant type, and actionable mutations.

RESULTS: The G-CDM was extended into four tables linked to tables of the OMOP-CDM. Compared with The Cancer Genome Atlas data, the clinically actionable mutation p.Leu858Arg in the EGFR gene was 6.64 times more frequent in the Ajou University School of Medicine database, while the p.Gly12Xaa mutation in the KRAS gene was 2.02 times more frequent in The Cancer Genome Atlas dataset. The data-exploring tool GeneProfiler was further developed to conduct descriptive analyses automatically using the G-CDM; it provides the proportions of genes, variant types, and actionable mutations. GeneProfiler also allows querying by a specific gene name and Human Genome Variation Society nomenclature to calculate the proportion of patients with a given mutation.

CONCLUSIONS: We developed the G-CDM for effective integration of genomic data with standardized clinical data, allowing for data sharing across institutes. The feasibility of the G-CDM was validated by assessing the differences in data characteristics between two different genomic databases through the proposed data-exploring tool GeneProfiler. The G-CDM may facilitate analyses of interoperable clinical and genomic datasets across multiple institutions, minimizing privacy issues and enabling researchers to better understand the characteristics of patients and promote personalized medicine in clinical practice.
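The abstract does not describe how GeneProfiler is implemented. As a hedged illustration of the kind of query it reports (the proportion of patients carrying a given gene plus HGVS change, and the ratio of that proportion between two datasets), here is a minimal Python sketch. The `VariantRecord` layout and the function names are hypothetical and are not the G-CDM table schema; the records are toy values, not data from either database.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VariantRecord:
    # Hypothetical flattened record; NOT the actual G-CDM table layout.
    person_id: int
    gene_symbol: str  # HGNC-approved gene symbol, e.g. "EGFR"
    hgvs_p: str       # HGVS protein-level change, e.g. "p.Leu858Arg"


def patient_proportion(records: Iterable[VariantRecord], n_patients: int,
                       gene: str, hgvs_p: str) -> float:
    """Fraction of patients with at least one variant matching gene + HGVS change."""
    # A set of person IDs ensures a patient with several matching records counts once.
    carriers = {r.person_id for r in records
                if r.gene_symbol == gene and r.hgvs_p == hgvs_p}
    return len(carriers) / n_patients


def frequency_ratio(p_a: float, p_b: float) -> float:
    """How many times more frequent a mutation is in dataset A than in dataset B."""
    return p_a / p_b


# Toy values for illustration only; these are not data from either database.
site_a = [VariantRecord(1, "EGFR", "p.Leu858Arg"),
          VariantRecord(2, "EGFR", "p.Leu858Arg"),
          VariantRecord(2, "TP53", "p.Arg273His")]
site_b = [VariantRecord(7, "EGFR", "p.Leu858Arg"),
          VariantRecord(8, "KRAS", "p.Gly12Cys")]

p_a = patient_proportion(site_a, n_patients=4, gene="EGFR", hgvs_p="p.Leu858Arg")
p_b = patient_proportion(site_b, n_patients=10, gene="EGFR", hgvs_p="p.Leu858Arg")
print(p_a, p_b, frequency_ratio(p_a, p_b))
```

The denominator is the total patient count of each dataset rather than the number of variant records, so the proportions stay comparable across sites with different cohort sizes. Grouped notations such as p.Gly12Xaa would need a looser matching rule than the exact string comparison used here.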

2019
Author(s): William C. Wright, Taosheng Chen

Abstract: Here we obtained RNA-sequencing data from the publicly available Pan-Cancer analysis project performed by The Cancer Genome Atlas (TCGA). Data within this project were processed with the same experimental procedures and analyzed downstream by the UCSC Toil recompute project. We reprocessed the resulting gene count files in batch to obtain normalized expression values, a step critical for proper and comparable interpretation. We describe the linear modeling and normalization protocol and provide an example of plotting the results for a gene of interest. The entire protocol is performed using freely available packages within the R framework.
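The abstract does not specify the normalization or linear model used, and the authors work in R. The following Python/NumPy sketch is therefore only an assumption about what such a step can look like. It converts a genes x samples count matrix to log2 counts-per-million (CPM), then removes per-gene batch offsets with an ordinary least-squares fit of expression on batch indicators. The function names `log_cpm` and `remove_batch` are hypothetical.

```python
import numpy as np


def log_cpm(counts: np.ndarray) -> np.ndarray:
    """log2(counts-per-million + 1) for a genes x samples matrix of raw counts."""
    lib_sizes = counts.sum(axis=0)          # total counts per sample (column)
    cpm = counts / lib_sizes * 1e6          # scale each sample to one million counts
    return np.log2(cpm + 1.0)


def remove_batch(expr: np.ndarray, batch: np.ndarray) -> np.ndarray:
    """Fit expr ~ intercept + batch per gene by least squares; subtract batch terms."""
    levels = np.unique(batch)
    # Design matrix: intercept column plus one indicator per non-reference batch.
    design = np.column_stack([np.ones(batch.size)] +
                             [(batch == lv).astype(float) for lv in levels[1:]])
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)   # (terms x genes)
    batch_effect = design[:, 1:] @ coef[1:, :]                # (samples x genes)
    return expr - batch_effect.T


counts = np.array([[10, 20], [90, 180]])   # toy counts; sample 2 is sample 1 sequenced twice as deeply
print(log_cpm(counts))                     # both columns come out identical

expr = np.array([[1.0, 1.0, 3.0, 3.0]])    # toy gene with a +2 offset in batch "b"
print(remove_batch(expr, np.array(["a", "a", "b", "b"])))
```

Subtracting only the batch columns keeps the intercept, so values stay on the reference batch's scale. If biology of interest is confounded with batch, this kind of correction removes it too, which is one reason a real protocol may model covariates jointly.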


2020
Author(s): Juliet Luft, Robert S. Young, Alison M. Meynert, Martin S. Taylor

Abstract
Background: The loss of genetic diversity in segments of a genome (loss of heterozygosity, LOH) is a common occurrence in many types of cancer. By analysing patterns of preferential allelic retention during LOH in approximately 10,000 cancer samples from The Cancer Genome Atlas (TCGA), we sought to systematically identify genetic polymorphisms currently segregating in the human population that are preferentially selected for or against during cancer development.

Results: Experimental batch effects and cross-sample contamination were found to be substantial confounders in this widely used and well-studied dataset. To mitigate these, we developed a generally applicable classifier (GenomeArtiFinder) to quantify contamination and other abnormalities. We provide these results as a resource to aid further analysis of TCGA whole exome sequencing data. In total, 1,678 pairs of samples (14.7%) were found to be contaminated or affected by systematic experimental error. After filtering, our analysis of LOH revealed an overall trend toward biased retention of cancer-associated risk alleles previously identified by genome-wide association studies. Analysis of predicted damaging germline variants identified highly significant oncogenic selection for recessive tumour suppressor alleles. These are enriched in biological pathways involved in genome maintenance and stability.

Conclusions: Our results identified predicted damaging germline variants in genes responsible for the repair of DNA strand breaks and homologous repair as the most common targets of allele-biased LOH. This suggests a ratchet-like process in which heterozygous germline mutations in these genes reduce the efficacy of DNA double-strand break repair, increasing the likelihood of a second hit at the locus that removes the wild-type allele and triggers an oncogenic mutator phenotype.
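To make "biased retention" concrete, here is a minimal sketch that assumes each informative LOH event at a heterozygous site can be scored as retaining either the risk allele or the other allele, and that without selection each is retained with probability 0.5. It applies an exact two-sided binomial test. This is not GenomeArtiFinder and not necessarily the authors' statistical model; the function names are hypothetical and the counts are toy values.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test p-value for k of n under p = 0.5."""
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    observed = probs[k]
    # Two-sided: sum every outcome no more likely than the observed count
    # (a small relative tolerance guards against floating-point ties).
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


def retention_bias(risk_retained: int, other_retained: int) -> tuple[float, float]:
    """Share of LOH events keeping the risk allele, and its p-value vs a 50:50 null."""
    n = risk_retained + other_retained
    return risk_retained / n, binom_two_sided_p(risk_retained, n)


# Toy counts only: 30 of 40 informative LOH events retained the risk allele.
share, p = retention_bias(30, 10)
print(f"risk-allele retention {share:.2f}, two-sided p = {p:.4g}")
```

The abstract's point about contamination applies directly here: a contaminated sample can mimic heterozygosity where LOH occurred, or hide retention, so samples flagged as contaminated would be filtered out before counting events.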


2017
Vol 35 (15_suppl), p. e23195
Author(s): Jason Mezey, Steven Schwager, Sushila Shenoy, Jef Benbanaste, Michael Elashoff, ...

e23195 Background: Clustering algorithms have identified subtypes of major cancers from analysis of genome-wide gene expression (GE) and somatic mutation (SM) profiles. These algorithms almost never discover a proper subset cluster, that is, a recovered cluster that includes all the samples of a specific subtype. For breast cancer (BC), clustering of genome-wide profiles has been unable to recover triple negatives (TNs), TN subtypes, or other major subtypes as proper subset clusters.

Methods: To search for a proper subset cluster for TNs, we applied a new clustering algorithm to the public-domain GE and SM data of BC samples in The Cancer Genome Atlas (TCGA). The algorithm, a module of Medidata’s Clinical Trial Genomics (CTG) platform for automated clinical and genomic data integration and analysis, applies a hierarchical component with tree-learned cut points to a similarity matrix that is calculated from a genome-wide data profile and reduced in dimension by principal components.

Results: Our analysis of 540 TCGA BC samples, run without human supervision, produced a proper subset cluster that included all 55 TN samples and only 74 non-TN samples. GE data have previously indicated TN status, but this is the first demonstration that these TCGA BC data contain enough information to recover TNs as a proper subset cluster, implying that this broad BC subtype has a strong, quantifiable impact on GE. We show that the genome-wide SMs of TCGA BC samples can be used to recover 4 novel subtypes as proper subset clusters, distinguished as the classes “TP53 mutated”, “PIK3CA mutated”, “both TP53 and PIK3CA mutated”, and “neither mutated”, signifying an important role for these known driver mutations in producing the subtypes’ genome-wide mutation profiles. We find that most (>80%) TN BCs are in “TP53 mutated” but only 1 TN sample (<2%) is in “PIK3CA mutated”, indicating distinct biology for these TNs with potential implications for TN therapy.

Conclusions: CTG clustering achieves proper subset clustering of cancer subtypes in TCGA BC samples. These results illustrate the therapeutic discovery potential of high-quality genomic data such as those in TCGA when combined with detailed clinical data through the Medidata CTG integration and annotation platform.
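The CTG clustering algorithm itself is not described in enough detail to reproduce. The abstract's definition of a proper subset cluster (one recovered cluster that contains every sample of the subtype) can, however, be checked mechanically on any clustering output. Here is a minimal Python sketch; `proper_subset_cluster` is a hypothetical helper. The example labels are synthetic and only arranged to mirror the reported counts (540 samples, 55 TN, 74 non-TN in the TN cluster), not the actual TCGA assignments.

```python
from typing import Hashable, Optional, Sequence, Tuple


def proper_subset_cluster(clusters: Sequence[Hashable],
                          is_subtype: Sequence[bool]
                          ) -> Optional[Tuple[Hashable, int, int]]:
    """Return (cluster id, subtype count, other count) if one cluster holds
    every subtype sample; return None if the subtype is split or absent."""
    subtype_clusters = {c for c, s in zip(clusters, is_subtype) if s}
    if len(subtype_clusters) != 1:
        return None
    (cid,) = subtype_clusters
    n_subtype = sum(1 for s in is_subtype if s)
    # Non-subtype samples that share the cluster measure its "impurity".
    n_other = sum(1 for c, s in zip(clusters, is_subtype) if c == cid and not s)
    return cid, n_subtype, n_other


# Synthetic labels mirroring the reported counts: cluster 0 = 55 TN + 74 non-TN.
clusters = [0] * 129 + [1] * 411
is_tn = [True] * 55 + [False] * 74 + [False] * 411
print(proper_subset_cluster(clusters, is_tn))
```

The check says nothing about how good the clustering is beyond containment. The number of non-subtype samples sharing the cluster is what distinguishes a tight proper subset cluster from one that trivially contains everything.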


2015
Vol 31 (22), pp. 3666-3672
Author(s): Mumtahena Rahman, Laurie K. Jackson, W. Evan Johnson, Dean Y. Li, Andrea H. Bild, ...

2017
pp. 1-12
Author(s): Manish R. Sharma, James T. Auman, Nirali M. Patel, Juneko E. Grilley-Olson, Xiaobei Zhao, ...

Purpose: A 73-year-old woman with metastatic colon cancer experienced a complete response to chemotherapy with dose-intensified irinotecan that has been durable for 5 years. We sequenced her tumor and germline DNA and looked for similar patterns in publicly available genomic data from patients with colorectal cancer.

Patients and Methods: Tumor DNA was obtained from a biopsy before therapy, and germline DNA was obtained from blood. Tumor and germline DNA were sequenced using a commercial panel of approximately 250 genes. Whole-genome amplification and exome sequencing were performed for POLE and POLD1. A POLD1 mutation was confirmed by Sanger sequencing. The somatic mutation and clinical annotation data files from the colon (n = 461) and rectal (n = 171) adenocarcinoma data sets were downloaded from The Cancer Genome Atlas data portal and analyzed for patterns of mutations and clinical outcomes in patients with POLE- and/or POLD1-mutated tumors.

Results: The pattern of alterations included biallelic inactivation of APC and a microsatellite instability-high (MSI-H) phenotype, with somatic inactivation of MLH1 and hypermutation (estimated mutation rate >200 per megabase). The extremely high mutation rate led us to investigate additional mechanisms of hypermutation, including loss of function of POLE. POLE was unaltered, but POLD1, a related gene not typically associated with somatic mutation in colon cancer, harbored a somatic mutation, c.2171G>A [p.Gly724Glu]. Additionally, we noted that the mutations underlying the high mutation rate were largely dinucleotide deletions. A similar pattern of hypermutation (dinucleotide deletions, POLD1 mutations, MSI-H) was found in tumors from The Cancer Genome Atlas.

Conclusion: POLD1 mutation with an associated MSI-H, hyper-indel–hypermutated cancer genome characterizes a previously unrecognized variant of colon cancer that was found in this patient with an exceptional response to chemotherapy.
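The two quantities this abstract relies on, a mutation rate per megabase and the share of mutations that are dinucleotide deletions, reduce to simple arithmetic. Here is a minimal Python sketch. It assumes variants are given as (reference allele, tumour allele) pairs with "-" marking an absent allele, as in MAF files, and that the denominator is the number of bases actually covered by the assay. The function names and inputs are hypothetical toy values, not the patient's or TCGA's data.

```python
from typing import List, Tuple

# (reference allele, tumour allele); "-" marks an absent allele, following the
# convention of MAF files for insertions and deletions.
Variant = Tuple[str, str]


def mutations_per_mb(n_mutations: int, covered_bases: int) -> float:
    """Somatic mutations per megabase of sequenced (covered) territory."""
    return n_mutations / (covered_bases / 1_000_000)


def dinucleotide_deletion_fraction(variants: List[Variant]) -> float:
    """Fraction of variants that delete exactly two reference bases."""
    if not variants:
        return 0.0
    n_del2 = sum(1 for ref, alt in variants if alt == "-" and len(ref) == 2)
    return n_del2 / len(variants)


# Toy inputs for illustration only; not the patient's or TCGA's data.
calls = [("AC", "-"), ("G", "A"), ("TT", "-"), ("-", "A")]
print(mutations_per_mb(450, 1_500_000))        # 300.0
print(dinucleotide_deletion_fraction(calls))    # 0.5
```

With a targeted panel of roughly 250 genes, the covered territory is small, so a per-megabase rate rests on relatively few observed mutations. That is consistent with the abstract calling the rate an estimate.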

