Pan-cancer subtyping in a 2D-map shows substructures that are driven by specific combinations of molecular characteristics

2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Erdogan Taskesen ◽  
Sjoerd M. H. Huisman ◽  
Ahmed Mahfouz ◽  
Jesse H. Krijthe ◽  
Jeroen de Ridder ◽  
...  

Abstract The use of genome-wide data in cancer research, for the identification of groups of patients with similar molecular characteristics, has become a standard approach for applications in therapy-response, prognosis-prediction, and drug-development. To progress in these applications, the trend is to move from single genome-wide measurements in a single cancer-type towards measuring several different molecular characteristics across multiple cancer-types. Although current approaches shed light on molecular characteristics of various cancer-types, detailed relationships between patients within cancer clusters are unclear. We propose a novel multi-omic integration approach that exploits the joint behavior of the different molecular characteristics, supports visual exploration of the data by a two-dimensional landscape, and inspection of the contribution of the different genome-wide data-types. We integrated 4,434 samples across 19 cancer-types, derived from TCGA, containing gene expression, DNA-methylation, copy-number variation and microRNA expression data. Cluster analysis revealed 18 clusters, where three clusters showed a complex collection of cancer-types, squamous-cell-carcinoma, colorectal cancers, and a novel grouping of kidney-cancers. Sixty-four samples were identified outside their tissue-of-origin cluster. Known and novel patient subgroups were detected for acute myeloid leukemias and breast cancers. Quantification of the contributions of the different molecular types showed that substructures are driven by specific (combinations of) molecular characteristics.

2022 ◽  
Author(s):  
Malvika Sudhakar ◽  
Raghunathan Rengaswamy ◽  
Karthik Raman

The progression of tumorigenesis starts with a few mutational and structural driver events in the cell. Various cohort-based computational tools exist to identify driver genes but require a large number of samples to produce reliable results. Many studies use different methods to identify driver mutations/genes from mutations that have no impact on tumour progression; however, a small fraction of patients show no mutational events in any known driver genes. Current unsupervised methods map somatic and expression data onto a network to identify the perturbation in the network. Our method is the first machine learning model to classify genes as tumour suppressor gene (TSG), oncogene (OG) or neutral, thus assigning the functional impact of the gene in the patient. In this study, we develop a multi-omic approach, PIVOT (Personalised Identification of driVer OGs and TSGs), to train on experimentally or computationally validated mutational and structural driver events. Given the lack of any gold standards for the identification of personalised driver genes, we label the data using four strategies and, based on classification metrics, show gene-based labelling strategies perform best. We build different models using SNV, RNA, and multi-omic features to be used based on the data available. Our models trained on multi-omic data improved predictions compared to mutation and expression data, achieving an accuracy >0.99 for BRCA, LUAD and COAD datasets. We show network and expression-based features contribute the most to PIVOT. Our predictions on BRCA, COAD and LUAD cancer types reveal commonly altered genes such as TP53, and PIK3CA, which are predicted drivers for multiple cancer types. Along with known driver genes, our models also identify new driver genes such as PRKCA, SOX9 and PSMD4. Our multi-omic model labels both CNV and mutations with a more considerable contribution by CNV alterations. 
While predicting labels for genes mutated in multiple samples, we also label rare driver events occurring in as few as one sample. We also identify genes with dual roles within the same cancer type. Overall, PIVOT labels personalised driver genes as TSGs and OGs and also identifies rare driver genes. PIVOT is available at https://github.com/RamanLab/PIVOT.


2017 ◽  
Author(s):  
Zhuyi Xue ◽  
René L Warren ◽  
Ewan A Gibb ◽  
Daniel MacMillan ◽  
Johnathan Wong ◽  
...  

Abstract Alternative polyadenylation (APA) of 3’ untranslated regions (3’ UTRs) has been implicated in cancer development. Earlier reports on APA in cancer primarily focused on 3’ UTR length modifications, and the conventional wisdom is that tumor cells preferentially express transcripts with shorter 3’ UTRs. Here, we analyzed the APA patterns of 114 genes, a select list of oncogenes and tumor suppressors, in 9,939 tumor and 729 normal tissue samples across 33 cancer types using RNA-Seq data from The Cancer Genome Atlas, and we found that the APA regulation machinery is much more complicated than what was previously thought. We report 77 cases (gene-cancer type pairs) of differential 3’ UTR cleavage patterns between normal and tumor tissues, involving 33 genes in 13 cancer types. For 15 genes, the tumor-specific cleavage patterns are recurrent across multiple cancer types. While the cleavage patterns in certain genes indicate apparent trends of 3’ UTR shortening in tumor samples, over half of the 77 cases imply 3’ UTR length change trends in cancer that are more complex than simple shortening or lengthening. This work extends the current understanding of APA regulation in cancer, and demonstrates how large volumes of RNA-seq data generated for characterizing cancer cohorts can be mined to investigate this process.


2020 ◽  
Author(s):  
Nadav Brandes ◽  
Nathan Linial ◽  
Michal Linial

Abstract The characterization of germline genetic variation affecting cancer risk, known as cancer predisposition, is fundamental to preventive and personalized medicine. Current attempts to detect cancer predisposition genomic regions are typically based on small-scale familial studies or genome-wide association studies (GWAS) over dedicated case-control cohorts. In this study, we utilized the UK Biobank as a large-scale prospective cohort to conduct a comprehensive analysis of cancer predisposition using both GWAS and proteome-wide association study (PWAS), a method that highlights genetic associations mediated by functional alterations to protein-coding genes. We discovered 137 unique genomic loci implicated with cancer risk in the white British population across nine cancer types and pan-cancer. While most of these genomic regions are supported by external evidence, our results highlight novel loci as well. We performed a comparative analysis of cancer predisposition between cancer types, finding that most of the implicated regions are cancer-type specific. We further analyzed the role of recessive genetic effects in cancer predisposition. We found that 30 of the 137 cancer regions were recovered only by a recessive model, highlighting the importance of recessive inheritance outside of familial studies. Finally, we show that many of the cancer associations exert substantial cancer risk in the studied cohort, suggesting their clinical relevance.


2019 ◽  
Vol 145 (12) ◽  
pp. 3425-3435 ◽  
Author(s):  
Dimitrios Mathios ◽  
Taeyoung Hwang ◽  
Yuanxuan Xia ◽  
Jillian Phallen ◽  
Yuan Rui ◽  
...  

2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 3049-3049 ◽  
Author(s):  
Minetta C. Liu ◽  
Arash Jamshidi ◽  
Oliver Venn ◽  
Alexander P. Fields ◽  
M. Cyrus Maher ◽  
...  

3049 Background: For multi-cancer detection using cfDNA, tissue of origin (TOO) determination is critical to enable safe and efficient diagnostic follow-up. Previous array-based studies captured < 2% of genomic CpGs. Here, we report genome-wide fragment-level methylation patterns across 811 cancer cell methylomes representing 21 tumor types (97% of SEER cancer incidence), and define effects of this methylation database on TOO prediction within a machine learning framework. Methods: Genomic DNA from 655 formalin-fixed paraffin-embedded (FFPE) tumor tissues and 156 isolated cells from tumors was subjected to a prototype 30x whole-genome bisulfite sequencing (WGBS) assay, as previously reported in the Circulating Cell-free Genome Atlas (CCGA) study (NCT02889978). Two independent TOO models, one with and one without the methylation database, were fitted on training samples; each was used to predict on the test set. A WGBS classifier was used to detect cancer at 98% specificity; reported TOO results reflect percent agreement between predicted and true TOO among those detected cancers (166 cases: 81 stage I-III, 69 stage IV, 16 non-informative). Results: Genome-wide methylation data generated from this database allowed fragment-level analysis and coverage of ~30 million CpGs across the genome (~60-fold greater than array-based approaches). Incorrect TOO assignments decreased by 35% (20% to 13%) after incorporating methylation database information into TOO classification. Improvement was observed across all cancer types and was consistent in early-stage cancers (stage I-III). Respective performances in breast cancer (n = 23) were 87% vs 96%; in lung cancer (n = 32) were 85% vs 88%; in hepatobiliary (n = 10) were 70% vs 90%; and in pancreatic cancer (n = 17) were 94% vs 100%. Results using an optimized approach informed by these results in a large cohort of CCGA participants will be reported.
Conclusions: Incorporating data from a large methylation database improved TOO performance in multiple cancer types. This supports feasibility of this methylation-based approach as an early cancer detection test across cancer types. Clinical trial information: NCT02889978.
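The "35% (20% to 13%)" figure in the Results is a relative reduction in the error rate, not a drop of 35 percentage points. A minimal sketch of the arithmetic follows; the function name `relative_reduction` is hypothetical and used only for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease of a rate relative to its starting value."""
    return (before - after) / before


# Error rates quoted in the abstract: 20% without and 13% with the database.
error_without_db = 0.20
error_with_db = 0.13

absolute_change = error_without_db - error_with_db        # 7 percentage points
relative_change = relative_reduction(error_without_db, error_with_db)

print(f"absolute: {absolute_change:.0%} points, relative: {relative_change:.0%}")
```

Equivalently, overall agreement between predicted and true tissue of origin rises from 80% to 87% under these figures.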


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e14553-e14553
Author(s):  
Gordon Vansant ◽  
Adam Jendrisak ◽  
Ramsay Sutton ◽  
Sarah Orr ◽  
David Lu ◽  
...  

e14553 Background: Different cancer subtypes can often be effectively treated with similar Rx classes (i.e., platinum or taxane Rx). Yet, within a disease, patient benefit from therapy can be variable. The origins of precision medicine derive from pathologic sub-stratification to guide therapy (e.g., SCLC vs. NSCLC). Using the Epic Sciences platform, we performed FPC analysis of ~100,000 single CTCs from multiple indications and sought to utilize high-resolution digital pathology and machine learning to index metastatic cancers for the purpose of improving our understanding of therapy response and precision medicine. Methods: 92,300 CTCs that underwent FCP analysis (single-cell digital pathology features of cellular and sub-cellular morphometrics) were collected from prostate (1,641 pts, 70,747 CTCs), breast (268 pts, 8,718 CTCs), NSCLC (110 pts, 1,884 CTCs), SCLC (141 pts, 8,872 CTCs) and bladder (65 pts, 2,079 CTCs) cancer pts. After pre-processing the raw data, a training set was balanced by sampling the same number of CTCs from each indication. K-means clustering was applied to the training set, and the optimal number of clusters was determined using the elbow approach. After generating the clusters on the training set, the cluster centers were extracted from k-means and used to train a k-Nearest Neighbor (k-NN) classifier to predict the cluster assignment for the remaining CTCs (test set). Results: The optimized number of clusters was 9. The percentage and characteristics of CTCs in each indication are listed below. BCa CTCs were more enriched in cluster c1, which had higher CK expression, while SCLC and some mCRPC CTCs shared the small-cell features (c5). Conclusions: Heterogeneous CTC phenotypic subtypes were observed across multiple indications. Each indication harbored subtype heterogeneity and shared clusters with other disease subtypes. Analysis relating patient cluster subtypes to prognosis and therapy benefit is ongoing.
Analysis linking CTC subtypes to genotypes (by single-cell sequencing) and to patient survival across multiple indications is ongoing. [Table: see text]
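The Methods pipeline (balanced sampling, k-means on the training set, elbow inspection, then assignment of held-out cells via the cluster centers) can be sketched roughly as below. This is an illustrative pure-Python sketch, not the authors' implementation: the function names, the two-feature toy data and the choice k = 1 for the k-NN step are all assumptions. With one labelled center per cluster, a 1-NN classifier reduces to nearest-center assignment, which is what `nearest` does.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the closest center (1-NN over cluster centers; k=1 is an assumption)."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def balanced_sample(cells_by_indication, n_per_indication, seed=0):
    """Draw the same number of cells from every indication."""
    rng = random.Random(seed)
    sample = []
    for cells in cells_by_indication.values():
        sample.extend(rng.sample(cells, n_per_indication))
    return sample


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns the list of cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its members; keep empty clusters in place.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def inertia(points, centers):
    """Within-cluster sum of squares, the quantity inspected in an elbow plot."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy stand-ins for morphometric features of two indications (hypothetical data).
    cells = {
        "indication_A": [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(200)],
        "indication_B": [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(40)],
    }
    train = balanced_sample(cells, n_per_indication=30)
    for k in range(1, 5):
        print(k, round(inertia(train, kmeans(train, k)), 2))  # look for the elbow
    centers = kmeans(train, k=2)
    test_labels = [nearest(p, centers) for p in cells["indication_A"]]
```

Balancing before clustering keeps the largest indication (prostate, in the abstract) from dominating the cluster centers; the held-out cells are then labelled without refitting.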


2012 ◽  
Vol 11 ◽  
pp. CIN.S9037 ◽  
Author(s):  
Bill Andreopoulos ◽  
Dimitris Anastassiou

Gene expression profiling has provided insights into different cancer types and revealed tissue-specific expression signatures. Alterations in microRNA expression contribute to the pathogenesis of many types of human diseases. Few studies have integrated all levels of gene expression, miRNA and methylation to uncover correlations between these data types. We performed an integrated profiling to discover instances of miRNAs associated with a gene expression and DNA methylation signature across multiple cancer types. Using data from The Cancer Genome Atlas (TCGA), we revealed a concordant gene expression and methylation signature associated with the microRNA hsa-miR-142 across the same samples. In all cancer types examined, we found a signature of co-expression of a gene set R and methylated sites M, which correlate positively (M+) or negatively (M–) with the expression of hsa-miR-142. The set R consistently contains many genes, such as TRAF3IP3, NCKAP1L, CD53, LAPTM5, PTPRC, EVI2B, DOCK2, LCP2, CYBB and FYB. The signature is preserved across glioblastoma, ovarian, breast, colon, kidney, lung, uterine and rectum cancer. There is 28% overlap of methylation sites in M between glioblastoma (GBM) and ovarian cancer. There is 60% overlap of genes in R between GBM and ovarian (P = 1.3e−11). Most of the genes in R are known to be expressed in lymphocytes and haematopoietic stem cells, while M reflects membrane proteins involved in cell-cell adhesion functions. We speculate that the hsa-miR-142 associated signature may signal haematopoietic-specific processes and an accumulation of methylation events triggering a progressive loss of cell-cell adhesion. We also observed that GBM samples belonging to the proneural subtype tend to have underexpressed hsa-miR-142 and R genes, hypomethylated M+ and hypermethylated M–, while the mesenchymal samples have the opposite profile.


Genes ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 604 ◽  
Author(s):  
Wang ◽  
Wu ◽  
Ma

Prognosis modeling plays an important role in cancer studies. With the development of omics profiling, extensive research has been conducted to search for prognostic markers for various cancer types. However, many of the existing studies share a common limitation by only focusing on a single cancer type and suffering from a lack of sufficient information. With potential molecular similarity across cancer types, one cancer type may contain information useful for the analysis of other types. The integration of multiple cancer types may facilitate information borrowing so as to more comprehensively and more accurately describe prognosis. In this study, we conduct marginal and joint integrative analysis of multiple cancer types, effectively introducing integration in the discovery process. For accommodating high dimensionality and identifying relevant markers, we adopt the advanced penalization technique which has a solid statistical ground. Gene expression data on nine cancer types from The Cancer Genome Atlas (TCGA) are analyzed, leading to biologically sensible findings that are different from the alternatives. Overall, this study provides a novel venue for cancer prognosis modeling by integrating multiple cancer types.


2016 ◽  
Vol 14 (06) ◽  
pp. 1650031 ◽  
Author(s):  
Ana B. Pavel ◽  
Cristian I. Vasile

Cancer is a complex and heterogeneous genetic disease. Different mutations and dysregulated molecular mechanisms alter the pathways that lead to cell proliferation. In this paper, we explore a method which classifies genes into oncogenes (ONGs) and tumor suppressors. We optimize this method to identify specific ONGs and tumor suppressors for breast cancer, lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and colon adenocarcinoma (COAD), using data from The Cancer Genome Atlas (TCGA). A set of genes was previously classified as ONGs and tumor suppressors across multiple cancer types (Science 2013). Each gene was assigned an ONG score and a tumor suppressor score based on the frequency of its driver mutations across all variants from the Catalogue of Somatic Mutations in Cancer (COSMIC). We evaluate and optimize this approach within different cancer types from TCGA. We are able to determine known driver genes for each of the four cancer types. After establishing the baseline parameters for each cancer type, we identify new driver genes for each cancer type, and the molecular pathways that are highly affected by them. Our methodology is general and can be applied to different cancer subtypes to identify specific driver genes and improve personalized therapy.
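The frequency-based scoring described above can be illustrated with a small sketch. The abstract does not give the exact score definitions, so everything here is an assumption made for illustration: the ONG score is taken as the fraction of a gene's mutations that are missense at a recurrently hit position, the tumor suppressor score as the fraction that are inactivating, and `threshold` stands in for the per-cancer-type parameter the authors tune. The names `driver_scores`, `classify` and `INACTIVATING` are hypothetical.

```python
from collections import Counter

# Consequence labels treated as inactivating (an assumption for this sketch).
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


def driver_scores(mutations):
    """Return (ong_score, tsg_score) for one gene.

    `mutations` is a list of (protein_position, consequence) tuples,
    one per observed variant of the gene.
    """
    total = len(mutations)
    if total == 0:
        return 0.0, 0.0
    missense_hits = Counter(pos for pos, kind in mutations if kind == "missense")
    # Count missense variants that fall on a position mutated at least twice.
    recurrent_missense = sum(n for n in missense_hits.values() if n >= 2)
    inactivating = sum(1 for _, kind in mutations if kind in INACTIVATING)
    return recurrent_missense / total, inactivating / total


def classify(mutations, threshold=0.20):
    """Label a gene from its two scores; the threshold is a tunable placeholder."""
    ong_score, tsg_score = driver_scores(mutations)
    if tsg_score > threshold:
        return "tumor suppressor"
    if ong_score > threshold:
        return "oncogene"
    return "unclassified"


if __name__ == "__main__":
    hotspot_gene = [(12, "missense")] * 6 + [(61, "missense"), (100, "nonsense")]
    scattered_gene = [(5, "nonsense"), (40, "frameshift"), (77, "missense"), (90, "splice_site")]
    print(driver_scores(hotspot_gene), classify(hotspot_gene))
    print(driver_scores(scattered_gene), classify(scattered_gene))
```

Checking the tumor suppressor score first is a design choice of this sketch; optimizing per cancer type, as the abstract describes, would amount to re-estimating `threshold` (and possibly the recurrence cutoff) on each cohort.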


2020 ◽  
Vol 6 ◽  
pp. e251 ◽  
Author(s):  
Zhaodong Hao ◽  
Dekang Lv ◽  
Ying Ge ◽  
Jisen Shi ◽  
Dolf Weijers ◽  
...  

Background Owing to rapid advances in DNA sequencing technologies, whole genomes from more and more species are becoming available at an increasing pace. For whole-genome analysis, idiograms provide a very popular, intuitive and effective way to map and visualize genome-wide information, such as GC content, gene and repeat density, DNA methylation distribution, genomic synteny, etc. However, most available software programs and web servers are available only for a few model species, such as human, mouse and fly, or have limited application scenarios. As more and more non-model species are sequenced and chromosome-level assemblies become available, tools that can generate idiograms for a broad range of species and visualize more data types are needed to help better understand fundamental genome characteristics. Results The R package RIdeogram allows users to build high-quality idiograms of any species of interest. It can map continuous and discrete genome-wide data on the idiograms and visualize them in a heat map and track labels, respectively. Conclusion The visualization of genome-wide data mapping and comparison allows users to quickly establish a clear impression of the chromosomal distribution pattern, thus making RIdeogram a useful tool for any researchers working with omics.

