Genomic Copy Number Signatures Based Classifiers for Subtype Identification in Cancer

2020 ◽  
Author(s):  
Bo Gao ◽  
Michael Baudis

Abstract: Copy number aberrations (CNA) are one of the most important classes of genomic mutations related to oncogenetic effects. In the past three decades, a vast amount of CNA data has been generated by molecular-cytogenetic and genome sequencing based methods. While this data has been instrumental in the identification of cancer-related genes and promoted research into the relation between CNA and histo-pathologically defined cancer types, the heterogeneity of source data and derived CNV profiles poses great challenges for data integration and comparative analysis. Furthermore, a majority of existing studies have focused on the association of CNA to pre-selected “driver” genes with limited application to rare drivers and other genomic elements. In this study, we developed a bioinformatic pipeline to integrate a collection of 44,988 high-quality CNA profiles of high diversity. Using a hybrid model of neural networks and an attention algorithm, we generated the CNA signatures of 31 cancer subtypes, depicting the uniqueness of their respective CNA landscapes. Finally, we constructed a multi-label classifier to identify the cancer type and the organ of origin from copy number profiling data. The investigation of the signatures suggested common patterns, not only of physiologically related cancer types but also of clinico-pathologically distant cancer types such as different cancers originating from the neural crest. Further experiments with classification models confirmed the effectiveness of the signatures in distinguishing different cancer types and demonstrated their potential in tumor classification.

2021 ◽  
Vol 12 ◽  
Author(s):  
Bo Gao ◽  
Michael Baudis

Copy number aberrations (CNA) are one of the most important classes of genomic mutations related to oncogenetic effects. In the past three decades, a vast amount of CNA data has been generated by molecular-cytogenetic and genome sequencing based methods. While this data has been instrumental in the identification of cancer-related genes and promoted research into the relation between CNA and histo-pathologically defined cancer types, the heterogeneity of source data and derived CNV profiles poses great challenges for data integration and comparative analysis. Furthermore, a majority of existing studies have focused on the association of CNA to pre-selected “driver” genes with limited application to rare drivers and other genomic elements. In this study, we developed a bioinformatics pipeline to integrate a collection of 44,988 high-quality CNA profiles of high diversity. Using a hybrid model of neural networks and an attention algorithm, we generated the CNA signatures of 31 cancer subtypes, depicting the uniqueness of their respective CNA landscapes. Finally, we constructed a multi-label classifier to identify the cancer type and the organ of origin from copy number profiling data. The investigation of the signatures suggested common patterns, not only of physiologically related cancer types but also of clinico-pathologically distant cancer types such as different cancers originating from the neural crest. Further experiments with classification models confirmed the effectiveness of the signatures in distinguishing different cancer types and demonstrated their potential in tumor classification.


2016 ◽  
Vol 14 (06) ◽  
pp. 1650031 ◽  
Author(s):  
Ana B. Pavel ◽  
Cristian I. Vasile

Cancer is a complex and heterogeneous genetic disease. Different mutations and dysregulated molecular mechanisms alter the pathways that lead to cell proliferation. In this paper, we explore a method which classifies genes into oncogenes (ONGs) and tumor suppressors. We optimize this method to identify specific ONGs and tumor suppressors for breast cancer, lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC) and colon adenocarcinoma (COAD), using data from The Cancer Genome Atlas (TCGA). A set of genes was previously classified as ONGs and tumor suppressors across multiple cancer types (Science 2013). Each gene was assigned an ONG score and a tumor suppressor score based on the frequency of its driver mutations across all variants from the Catalogue of Somatic Mutations in Cancer (COSMIC). We evaluate and optimize this approach within different cancer types from TCGA. We are able to determine known driver genes for each of the four cancer types. After establishing the baseline parameters for each cancer type, we identify new driver genes for each cancer type, and the molecular pathways that are highly affected by them. Our methodology is general and can be applied to different cancer subtypes to identify specific driver genes and improve personalized therapy.
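
The frequency-based scoring described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Mutation` fields, the definition of a "driver-like" mutation (recurrent missense for the ONG score, truncating for the tumor suppressor score), the `min_recurrence` value and the 0.2 threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    gene: str
    position: int  # amino-acid position (hypothetical field)
    kind: str      # "missense", "truncating" or "silent" (hypothetical labels)


def gene_scores(mutations, gene, min_recurrence=2):
    """Return (ong_score, tsg_score) for one gene.

    ong_score: fraction of the gene's mutations that are missense changes
               at a position hit at least `min_recurrence` times.
    tsg_score: fraction of the gene's mutations that are truncating.
    Both definitions are illustrative assumptions.
    """
    muts = [m for m in mutations if m.gene == gene]
    if not muts:
        return 0.0, 0.0
    missense_hits = Counter(m.position for m in muts if m.kind == "missense")
    recurrent = sum(n for n in missense_hits.values() if n >= min_recurrence)
    truncating = sum(1 for m in muts if m.kind == "truncating")
    return recurrent / len(muts), truncating / len(muts)


def classify(ong_score, tsg_score, threshold=0.2):
    """Label a gene from its two scores; the threshold is an assumed value."""
    if max(ong_score, tsg_score) < threshold:
        return "unclassified"
    return "ONG" if ong_score >= tsg_score else "TSG"
```

Tuning `threshold` and `min_recurrence` separately for each cancer type would correspond, loosely, to the per-cancer-type optimization the abstract mentions.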


2022 ◽  
Author(s):  
Malvika Sudhakar ◽  
Raghunathan Rengaswamy ◽  
Karthik Raman

The progression of tumorigenesis starts with a few mutational and structural driver events in the cell. Various cohort-based computational tools exist to identify driver genes but require a large number of samples to produce reliable results. Many studies use different methods to identify driver mutations/genes from mutations that have no impact on tumour progression; however, a small fraction of patients show no mutational events in any known driver genes. Current unsupervised methods map somatic and expression data onto a network to identify the perturbation in the network. Our method is the first machine learning model to classify genes as tumour suppressor gene (TSG), oncogene (OG) or neutral, thus assigning the functional impact of the gene in the patient. In this study, we develop a multi-omic approach, PIVOT (Personalised Identification of driVer OGs and TSGs), to train on experimentally or computationally validated mutational and structural driver events. Given the lack of any gold standards for the identification of personalised driver genes, we label the data using four strategies and, based on classification metrics, show that gene-based labelling strategies perform best. We build different models using SNV, RNA, and multi-omic features to be used based on the data available. Our models trained on multi-omic data improved predictions compared to mutation and expression data, achieving an accuracy >0.99 for BRCA, LUAD and COAD datasets. We show that network and expression-based features contribute the most to PIVOT. Our predictions on BRCA, COAD and LUAD cancer types reveal commonly altered genes such as TP53 and PIK3CA, which are predicted drivers for multiple cancer types. Along with known driver genes, our models also identify new driver genes such as PRKCA, SOX9 and PSMD4. Our multi-omic model labels both CNV and mutations, with a more considerable contribution by CNV alterations.
While predicting labels for genes mutated in multiple samples, we also label rare driver events occurring in as few as one sample. We also identify genes with dual roles within the same cancer type. Overall, PIVOT labels personalised driver genes as TSGs and OGs and also identifies rare driver genes. PIVOT is available at https://github.com/RamanLab/PIVOT.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 3072-3072
Author(s):  
Habte Aragaw Yimer ◽  
Wai Hong Wilson Tang ◽  
Mohan K. Tummala ◽  
Spencer Shao ◽  
Gina G. Chung ◽  
...  

3072 Background: The Circulating Cell-free Genome Atlas study (CCGA; NCT02889978) previously demonstrated that a blood-based multi-cancer early detection (MCED) test utilizing cell-free DNA (cfDNA) sequencing in combination with machine learning could detect cancer signals across multiple cancer types and predict cancer signal origin. Cancer classes were defined within the CCGA study for sensitivity reporting. Separately, cancer types defined by the American Joint Committee on Cancer (AJCC) criteria, which outline unique staging requirements and reflect a distinct combination of anatomic site, histology and other biologic features, were assigned to each cancer participant using the same source data for primary site of origin and histologic type. Here, we report CCGA ‘cancer class’ designation and AJCC ‘cancer type’ assignment within the third and final CCGA3 validation substudy to better characterize the diversity of tumors across which a cancer signal could be detected with the MCED test that is nearing clinical availability. Methods: CCGA is a prospective, multicenter, case-control, observational study with longitudinal follow-up (overall population N = 15,254). Plasma cfDNA from evaluable samples was analyzed using a targeted methylation bisulfite sequencing assay and a machine learning approach, and test performance, including sensitivity, was assessed. For sensitivity reporting, CCGA cancer classes were assigned to cancer participants using a combination of the type of primary cancer reported by the site and tumor characteristics abstracted from the site pathology reports by GRAIL pathologists. Each cancer participant also was separately assigned an AJCC cancer type based on the same source data using AJCC staging manual (8th edition) classifications. Results: A total of 4077 participants comprised the independent validation set with confirmed status (cancer: n = 2823; non-cancer: n = 1254 with non-cancer status confirmed at year-one follow-up). 
Sensitivity was reported for 24 cancer classes (sample sizes ranged from 10 to 524 participants), as well as an “other” cancer class (59 participants). According to AJCC classification, the MCED test was found to detect cancer signals across 50+ AJCC cancer types, including some types not present in the training set; some cancer types had limited representation. Conclusions: This MCED test that is nearing clinical availability and was evaluated in the third CCGA substudy detected cancer signals across 50+ AJCC cancer types. Reporting CCGA cancer classes and AJCC cancer types demonstrates the ability of the MCED test to detect cancer signals across a set of diverse cancer types representing a wide range of biologic characteristics, including cancer types that the classifier has not been trained on, and supports its use on a population-wide scale. Clinical trial information: NCT02889978.


2017 ◽  
Author(s):  
Yun-Ching Chen ◽  
Valer Gotea ◽  
Gennady Margolin ◽  
Laura Elnitski

Abstract: Recent evidence shows that mutations in several driver genes can cause aberrant methylation patterns, a hallmark of cancer. In light of these findings, we hypothesized that the landscapes of tumor genomes and epigenomes are tightly interconnected. We measured this relationship using principal component analyses and methylation-mutation associations applied at the nucleotide level and with respect to genome-wide trends. We found a few mutated driver genes were associated with genome-wide patterns of aberrant hypomethylation or CpG island hypermethylation in specific cancer types. We identified associations between 737 mutated driver genes and site-specific methylation changes. Moreover, using these mutation-methylation associations, we were able to distinguish between two uterine and two thyroid cancer subtypes. The driver gene mutation-associated methylation differences between the thyroid cancer subtypes were linked to differential gene expression in JAK-STAT signaling, NADPH oxidation, and other cancer-related pathways. These results establish that driver-gene mutations are associated with methylation alterations capable of shaping regulatory network functions. In addition, the methodology presented here can be used to subdivide tumors into more homogeneous subsets corresponding to their underlying molecular characteristics, which could improve treatment efficacy. Author summary: Mutations that alter the function of driver genes by changing DNA nucleotides have been recognized as a key player in cancer progression. Recent evidence showed that DNA methylation, a molecular signature that is used for controlling gene expression and that consists of cytosine residues with attached methyl groups in the context of CG dinucleotides, is also highly dysregulated in cancer and contributes to carcinogenesis. However, whether those methylation alterations correspond to mutated driver genes in cancer remains unclear.
In this study, we analyzed 4,302 tumors from 18 cancer types and demonstrated that driver gene mutations are inherently connected with the aberrant DNA methylation landscape in cancer. We showed that those driver gene-associated methylation patterns can classify heterogeneous tumors within a cancer type into homogeneous subtypes and have the potential to influence the genes that contribute to tumor growth. This finding could help us to better understand the fundamental connection between driver gene mutations and DNA methylation alterations in cancer and to further improve cancer treatment.


Cancers ◽  
2018 ◽  
Vol 10 (12) ◽  
pp. 475 ◽  
Author(s):  
Jihee Soh ◽  
Hyejin Cho ◽  
Chan-Hun Choi ◽  
Hyunju Lee

MicroRNAs (miRNAs) are key molecules that regulate biological processes such as cell proliferation, differentiation, and apoptosis in cancer. Somatic copy number alterations (SCNAs) are common genetic mutations that play essential roles in cancer development. Here, we investigated the association between miRNAs and SCNAs in cancer. We collected 2538 tumor samples for seven cancer types from The Cancer Genome Atlas. We found that 32−84% of miRNAs are in SCNA regions, with the rate depending on the cancer type. In these regions, we identified 80 SCNA-miRNAs whose expression was mainly associated with SCNAs in at least one cancer type and showed that these SCNA-miRNAs are related to cancer by survival analysis and literature searching. We also identified 58 SCNA-miRNAs common in the seven cancer types (CC-SCNA-miRNAs) and showed that these CC-SCNA-miRNAs are more likely to be related with protein and gene expression than other miRNAs. Furthermore, we experimentally validated the oncogenic role of miR-589. In conclusion, our results suggest that SCNA-miRNAs significantly alter biological processes related to cancer development, confirming the importance of SCNAs in non-coding regions in cancer.
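
The reported share of miRNAs lying in SCNA regions amounts to a genomic interval overlap count. A minimal sketch of that calculation follows; it assumes half-open `(chrom, start, end)` coordinates and a hypothetical function name, and it does not reproduce the study's actual region calling or annotation.

```python
def fraction_in_regions(mirnas, regions):
    """Fraction of miRNA loci that overlap at least one SCNA region.

    Both arguments are lists of (chrom, start, end) tuples with
    half-open coordinates [start, end); this convention is an assumption.
    """
    if not mirnas:
        return 0.0

    # Group the regions by chromosome so each lookup scans only one list.
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    def overlaps(chrom, start, end):
        # Two half-open intervals overlap when each starts before the other ends.
        return any(s < end and start < e for s, e in by_chrom.get(chrom, []))

    hits = sum(1 for chrom, start, end in mirnas if overlaps(chrom, start, end))
    return hits / len(mirnas)
```

Running this once per cancer type, with that type's SCNA regions, would give one fraction per type, which is the kind of range the abstract reports.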


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Joel Nulsen ◽  
Hrvoje Misetic ◽  
Christopher Yau ◽  
Francesca D. Ciccarelli

Background: Identifying the complete repertoire of genes that drive cancer in individual patients is crucial for precision oncology. Most established methods identify driver genes that are recurrently altered across patient cohorts. However, mapping these genes back to patients leaves a sizeable fraction with few or no drivers, hindering our understanding of cancer mechanisms and limiting the choice of therapeutic interventions. Results: We present sysSVM2, a machine learning software that integrates cancer genetic alterations with gene systems-level properties to predict drivers in individual patients. Using simulated pan-cancer data, we optimise sysSVM2 for application to any cancer type. We benchmark its performance on real cancer data and validate its applicability to a rare cancer type with few known driver genes. We show that drivers predicted by sysSVM2 have a low false-positive rate, are stable and disrupt well-known cancer-related pathways. Conclusions: sysSVM2 can be used to identify driver alterations in patients lacking sufficient canonical drivers or belonging to rare cancer types for which assembling a large enough cohort is challenging, furthering the goals of precision oncology. As resources for the community, we provide the code to implement sysSVM2 and the pre-trained models in all TCGA cancer types (https://github.com/ciccalab/sysSVM2).


2019 ◽  
Author(s):  
Pramod Chandrashekar ◽  
Navid Ahmadinejad ◽  
Junwen Wang ◽  
Aleksandar Sekulic ◽  
Jan B. Egan ◽  
...  

Abstract: Functions of cancer driver genes depend on cellular contexts that vary substantially across tissues and organs. Distinguishing oncogenes (OGs) and tumor suppressor genes (TSGs) for each cancer type is critical to identifying clinically actionable targets. However, current resources for context-aware classifications of cancer drivers are limited. In this study, we show that the direction and magnitude of somatic selection of missense and truncating mutations of a gene are suggestive of its contextual activities. By integrating these features with ratiometric and conservation measures, we developed a computational method to categorize OGs and TSGs using exome sequencing data. This new method, named genes under selection in tumors (GUST), shows an overall accuracy of 0.94 when tested on manually curated benchmarks. Application of GUST to 10,172 tumor exomes of 33 cancer types identified 98 OGs and 179 TSGs, >70% of which promote tumorigenesis in only one cancer type. In broad-spectrum drivers shared across multiple cancer types, we found heterogeneous mutational hotspots modifying distinct functional domains, implicating the synchrony of convergent and divergent disease mechanisms. We further discovered two novel OGs and 28 novel TSGs with high confidence. The GUST program is available at https://github.com/liliulab/gust. A database with pre-computed classifications is available at https://liliulab.shinyapps.io/gust


2021 ◽  
Vol 23 (1) ◽  
pp. 84
Author(s):  
Rubi Campos Gudiño ◽  
Ally C. Farrell ◽  
Nicole M. Neudorf ◽  
Kirk J. McManus

The SKP1, CUL1, F-box protein (SCF) complex represents a family of 69 E3 ubiquitin ligases that poly-ubiquitinate protein substrates, marking them for proteolytic degradation via the 26S proteasome. Established SCF complex targets include transcription factors, oncoproteins and tumor suppressors that modulate cell cycle activity and mitotic fidelity. Accordingly, genetic and epigenetic alterations involving SCF complex member genes are expected to adversely impact target regulation and contribute to disease etiology. To gain novel insight into cancer pathogenesis, we determined the prevalence of genetic and epigenetic alterations in six prototypic SCF complex member genes (SKP1, CUL1, RBX1, SKP2, FBXW7 and FBXO5) from patient datasets extracted from The Cancer Genome Atlas (TCGA). Collectively, ~45% of observed SCF complex member mutations are predicted to impact complex structure and/or function in 10 solid tumor types. In addition, the distribution of encoded alterations suggests SCF complex members may exhibit either tumor suppressor or oncogenic mutational profiles in a cancer type-dependent manner. Further bioinformatic analyses reveal the potential functional implications of encoded alterations arising from missense mutations by examining predicted deleterious mutations with available crystal structures. The SCF complex also exhibits frequent copy number alterations in a variety of cancer types that generally correspond with mRNA expression levels. Finally, we note that SCF complex member genes are differentially methylated across cancer types, which may effectively phenocopy gene copy number alterations. Collectively, these data show that SCF complex member genes are frequently altered at the genetic and epigenetic levels in many cancer types. These alterations will adversely impact the normal targeting and timely destruction of protein substrates, which may contribute to the development and progression of an extensive array of cancer types.


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Sebastià Franch-Expósito ◽  
Laia Bassaganyas ◽  
Maria Vila-Casadesús ◽  
Eva Hernández-Illán ◽  
Roger Esteban-Fabró ◽  
...  

Somatic copy number alterations (CNAs) are a hallmark of cancer, but their role in tumorigenesis and clinical relevance remain largely unclear. Here, we developed CNApp, a web-based tool that allows a comprehensive exploration of CNAs by using purity-corrected segmented data from multiple genomic platforms. CNApp generates genome-wide profiles, computes CNA scores for broad, focal and global CNA burdens, and uses machine learning-based predictions to classify samples. We applied CNApp to the TCGA pan-cancer dataset of 10,635 genomes showing that CNAs classify cancer types according to their tissue-of-origin, and that each cancer type shows specific ranges of broad and focal CNA scores. Moreover, CNApp reproduces recurrent CNAs in hepatocellular carcinoma and predicts colon cancer molecular subtypes and microsatellite instability based on broad CNA scores and discrete genomic imbalances. In summary, CNApp facilitates CNA-driven research by providing a unique framework to identify relevant clinical implications. CNApp is hosted at https://tools.idibaps.org/CNApp/.
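
The idea of separate broad, focal and global CNA burdens can be sketched as follows. This is not CNApp's scoring scheme: the `Segment` fields, the log2-ratio cutoff of 0.2, and the rule that a segment covering at least half of its chromosome arm counts as broad are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    arm: str           # chromosome arm label, e.g. "1p" (hypothetical field)
    length: int        # segment length in base pairs
    log2_ratio: float  # purity-corrected copy number log2 ratio


def cna_burdens(segments, arm_lengths, alter_cutoff=0.2, broad_fraction=0.5):
    """Count broad and focal altered segments and the altered genome fraction.

    A segment is treated as altered when |log2_ratio| >= alter_cutoff, and as
    broad when it spans at least `broad_fraction` of its arm. Both cutoffs are
    assumed values, not taken from the source.
    """
    broad = focal = altered_bp = 0
    for seg in segments:
        if abs(seg.log2_ratio) < alter_cutoff:
            continue  # copy-neutral segment, ignored by all three burdens
        altered_bp += seg.length
        if seg.length / arm_lengths[seg.arm] >= broad_fraction:
            broad += 1
        else:
            focal += 1
    total_bp = sum(arm_lengths.values())
    return {"broad": broad, "focal": focal, "global": altered_bp / total_bp}
```

Per-sample burdens of this kind could then serve as input features for a sample classifier, in the spirit of the machine learning-based predictions the abstract describes.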

