The Cancer Genomic Atlas – “TO CONQUER CANCER”

The Cancer Genome Atlas (TCGA) is a publicly accessible cancer data repository and tool that helps us understand the molecular basis of cancer through the application of genomics and proteomics. So far, researchers have been able to characterize 33 cancer types, including 10 rare cancer types. A key feature of TCGA is that it makes the collected data publicly accessible, supporting a better understanding of the molecular and genetic basis of cancer, its mechanisms of action and its prevention. Studies on different cancer types, along with comprehensive pan-cancer analyses, have expanded the understanding and purpose of TCGA. Since its conceptualization, its high-throughput approach has provided a platform for the identification of genes and pathways involved in cancers and for the accurate classification of cancers.

Pan-cancer detection of driver genes at the single-patient resolution

Genome Medicine ◽

10.1186/s13073-021-00830-0 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Joel Nulsen ◽

Hrvoje Misetic ◽

Christopher Yau ◽

Francesca D. Ciccarelli

Keyword(s):

False Positive Rate ◽

Genetic Alterations ◽

Therapeutic Interventions ◽

Precision Oncology ◽

Cancer Type ◽

Rare Cancer ◽

Driver Genes ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract Background Identifying the complete repertoire of genes that drive cancer in individual patients is crucial for precision oncology. Most established methods identify driver genes that are recurrently altered across patient cohorts. However, mapping these genes back to patients leaves a sizeable fraction with few or no drivers, hindering our understanding of cancer mechanisms and limiting the choice of therapeutic interventions. Results We present sysSVM2, a machine learning software that integrates cancer genetic alterations with gene systems-level properties to predict drivers in individual patients. Using simulated pan-cancer data, we optimise sysSVM2 for application to any cancer type. We benchmark its performance on real cancer data and validate its applicability to a rare cancer type with few known driver genes. We show that drivers predicted by sysSVM2 have a low false-positive rate, are stable and disrupt well-known cancer-related pathways. Conclusions sysSVM2 can be used to identify driver alterations in patients lacking sufficient canonical drivers or belonging to rare cancer types for which assembling a large enough cohort is challenging, furthering the goals of precision oncology. As resources for the community, we provide the code to implement sysSVM2 and the pre-trained models in all TCGA cancer types (https://github.com/ciccalab/sysSVM2).
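The gap described in the abstract, where mapping a cohort-level driver list back to individual patients leaves some of them with few or no drivers, can be illustrated with a minimal sketch. The gene symbols are real, but the patient identifiers, the alteration sets and the cohort driver list are hypothetical toy data. This is not sysSVM2's method, only the mapping step that it aims to improve on.

```python
from collections import Counter

# Hypothetical cohort-level driver list (e.g. from a recurrence-based method).
COHORT_DRIVERS = {"TP53", "KRAS", "PIK3CA"}

# Hypothetical per-patient sets of altered genes (toy data, not from TCGA).
PATIENT_ALTERATIONS = {
    "patient_A": {"TP53", "KRAS", "TTN"},
    "patient_B": {"PIK3CA", "MUC16"},
    "patient_C": {"TTN", "MUC16"},
    "patient_D": set(),
}


def drivers_per_patient(alterations, cohort_drivers):
    """Map the cohort-level driver list back onto each patient."""
    return {
        patient: sorted(genes & cohort_drivers)
        for patient, genes in alterations.items()
    }


def fraction_without_drivers(mapped):
    """Fraction of patients left with no known driver after the mapping."""
    if not mapped:
        return 0.0
    return sum(1 for drivers in mapped.values() if not drivers) / len(mapped)


mapped = drivers_per_patient(PATIENT_ALTERATIONS, COHORT_DRIVERS)
# Number of patients with 0, 1, 2, ... mapped drivers.
driver_count_histogram = Counter(len(drivers) for drivers in mapped.values())
no_driver_fraction = fraction_without_drivers(mapped)
```

In this toy cohort, half of the patients carry none of the cohort-level drivers, which is the situation patient-level prediction is meant to address.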

Pan-cancer detection of driver genes at the single-patient resolution

10.1101/2020.06.12.147983 ◽

2020 ◽

Author(s):

Joel Nulsen ◽

Hrvoje Misetic ◽

Christopher Yau ◽

Francesca D. Ciccarelli

Keyword(s):

False Positive Rate ◽

Genetic Alterations ◽

Therapeutic Interventions ◽

Precision Oncology ◽

Cancer Type ◽

Rare Cancer ◽

Driver Genes ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract Background Identifying the complete repertoire of genes that drive cancer in individual patients is crucial for precision oncology. Most established methods identify driver genes that are recurrently altered across patient cohorts. However, mapping these genes back to patients leaves a sizeable fraction with few or no drivers, hindering our understanding of cancer mechanisms and limiting the choice of therapeutic interventions. Results We present sysSVM2, a machine learning software that integrates cancer genetic alterations with gene systems-level properties to predict drivers in individual patients. Using simulated pan-cancer data, we optimise sysSVM2 for application to any cancer type. We benchmark its performance on real cancer data and validate its applicability to a rare cancer type with few known driver genes. We show that drivers predicted by sysSVM2 have a low false-positive rate, are stable and disrupt well-known cancer-related pathways. Conclusions sysSVM2 can be used to identify driver alterations in patients lacking sufficient canonical drivers or belonging to rare cancer types for which assembling a large enough cohort is challenging, furthering the goals of precision oncology. As resources for the community, we provide the code to implement sysSVM2 and the pre-trained models in all TCGA cancer types (https://github.com/ciccalab/sysSVM2).

Identification of pan-cancer Ras pathway activation with deep learning

Briefings in Bioinformatics ◽

10.1093/bib/bbaa258 ◽

2020 ◽

Author(s):

Xiangtao Li ◽

Shaochuan Li ◽

Yunhe Wang ◽

Shixiong Zhang ◽

Ka-Chun Wong

Keyword(s):

Deep Learning ◽

Superior Performance ◽

Recent Attempt ◽

Precision Oncology ◽

Pathway Activity ◽

Ras Pathway ◽

Cancer Data ◽

Pathway Activation ◽

Cancer Types ◽

Pan Cancer

Abstract The identification of hidden responders is often an essential challenge in precision oncology. A recent attempt based on machine learning has been proposed for classifying aberrant pathway activity from multiomic cancer data. However, we note several critical limitations there, such as high-dimensionality, data sparsity and model performance. Given the central importance and broad impact of precision oncology, we propose nature-inspired deep Ras activation pan-cancer (NatDRAP), a deep neural network (DNN) model, to address those restrictions for the identification of hidden responders. In this study, we develop the nature-inspired deep learning model that integrates bulk RNA sequencing, copy number and mutation data from PanCanAtlas to detect pan-cancer Ras pathway activation. In NatDRAP, we propose to synergize the nature-inspired artificial bee colony algorithm with different gradient-based optimizers in one framework for optimizing DNNs in a collaborative manner. Multiple experiments were conducted on 33 different cancer types across PanCanAtlas. The experimental results demonstrate that the proposed NatDRAP can provide superior performance over other benchmark methods with strong robustness towards diagnosing RAS aberrant pathway activity across different cancer types. In addition, gene ontology enrichment and pathological analysis are conducted to reveal novel insights into the RAS aberrant pathway activity identification and characterization. NatDRAP is written in Python and available at https://github.com/lixt314/NatDRAP1.
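The general idea of pairing an artificial bee colony (ABC) search with a gradient-based optimizer can be sketched on a toy problem. The following is a generic hybrid under stated assumptions, not NatDRAP's actual scheme. A two-parameter quadratic stands in for a network's training loss, and a plain gradient-descent step stands in for the gradient-based optimizers. The current best food source is never abandoned, and all names and hyperparameters are hypothetical.

```python
import random


def loss(x):
    """Toy objective standing in for a training loss (minimum at [1, -2])."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def grad(x):
    """Analytic gradient of the toy objective."""
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)]


def abc_with_gradient(n_sources=8, n_cycles=60, limit=10, lr=0.1, seed=0):
    rng = random.Random(seed)

    def new_source():
        return [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]

    sources = [new_source() for _ in range(n_sources)]
    trials = [0] * n_sources  # consecutive failed improvements per source

    def try_neighbour(i):
        # ABC move: perturb one coordinate relative to a random other source,
        # and keep the candidate only if it lowers the loss (greedy selection).
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(2)
        candidate = list(sources[i])
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        if loss(candidate) < loss(sources[i]):
            sources[i], trials[i] = candidate, 0
        else:
            trials[i] += 1

    for _ in range(n_cycles):
        # Employed bees: one neighbourhood move per food source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker bees: sources with better fitness are revisited more often.
        fitness = [1.0 / (1.0 + loss(s)) for s in sources]
        for i in rng.choices(range(n_sources), weights=fitness, k=n_sources):
            try_neighbour(i)
        # Gradient refinement: one descent step on every source.
        for i in range(n_sources):
            sources[i] = [x - lr * g for x, g in zip(sources[i], grad(sources[i]))]
        # Scout bees: re-initialise exhausted sources, but never the best one.
        best_i = min(range(n_sources), key=lambda i: loss(sources[i]))
        for i in range(n_sources):
            if i != best_i and trials[i] > limit:
                sources[i], trials[i] = new_source(), 0
    return min(sources, key=loss)


best = abc_with_gradient()
```

On this convex toy loss the gradient step alone would suffice. The population-based moves are meant to matter on non-convex losses, where scouts and onlookers can move the search away from poor local minima.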

Reconstructing and characterizing focal amplifications in cancer using AmpliconArchitect

10.1101/457333 ◽

2018 ◽

Cited By ~ 1

Author(s):

Viraj Deshpande ◽

Jens Luebeck ◽

Mehrdad Bakhtiari ◽

Nam-Phuong D Nguyen ◽

Kristen M Turner ◽

...

Keyword(s):

Cervical Cancer ◽

Fine Structure ◽

Tumor Growth ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome ◽

Multiple Cancer ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract Focal oncogene amplification and rearrangements drive tumor growth and evolution in multiple cancer types. We developed a tool, AmpliconArchitect (AA), which can robustly reconstruct the fine structure of focally amplified regions using whole genome sequencing. AA-reconstructed amplicons in pan-cancer data and in virus-driven cervical cancer samples revealed many novel insights about focal amplifications. Specifically, the findings lend support to extrachromosomally mediated mechanisms for copy number expansion, and oncoviral pathogenesis.

Class Imbalance in Out-of-Distribution Datasets: Improving the Robustness of the TextCNN for the Classification of Rare Cancer Types

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2021.103957 ◽

2021 ◽

pp. 103957

Author(s):

Kevin De Angeli ◽

Shang Gao ◽

Ioana Danciu ◽

Eric B. Durbin ◽

Xiao-Cheng Wu ◽

...

Keyword(s):

Class Imbalance ◽

Rare Cancer ◽

Cancer Types

MEXCOWalk: Mutual Exclusion and Coverage Based Random Walk to Identify Cancer Modules

10.1101/547653 ◽

2019 ◽

Author(s):

Rafsan Ahmed ◽

Ilyes Baali ◽

Cesim Erten ◽

Evis Hoxha ◽

Hilal Kazan

Keyword(s):

Random Walk ◽

Mutual Exclusion ◽

Risk Scores ◽

Cancer Genes ◽

Multiple Cancer ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract Motivation Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein-protein interaction networks together with mutation frequencies of genes and the mutual exclusivity of cancer mutations can be utilized to increase the accuracy of identifying cancer driver modules. Results We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein-protein interactions, mutual exclusion, and coverage to identify cancer driver modules. MEXCOWalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes, providing modules that are capable of classifying normal and tumor samples, and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOwalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code, and useful scripts are available at: https://github.com/abu-compbio/[email protected]

Survey and comparative assessments of computational multi-omics integrative methods with multiple regulatory networks identifying distinct tumor compositions across pan-cancer data sets

Briefings in Bioinformatics ◽

10.1093/bib/bbaa102 ◽

2020 ◽

Cited By ~ 1

Author(s):

Zhuohui Wei ◽

Yue Zhang ◽

Wanlin Weng ◽

Jiazhou Chen ◽

Hongmin Cai

Keyword(s):

Molecular Mechanisms ◽

Genomic Data ◽

Low Rank ◽

Integrated Analysis ◽

Data Sets ◽

Omics Data ◽

Data Types ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract The significance of pan-cancer categories has recently been recognized as widespread in cancer research. Pan-cancer categorizes a cancer based on its molecular pathology rather than an organ. The molecular similarities among multi-omics data found in different cancer types can play several roles in both biological processes and therapeutic developments. Therefore, an integrated analysis for various genomic data is frequently used to reveal novel genetic and molecular mechanisms. However, a variety of algorithms for multi-omics clustering have been proposed in different fields. The comparison of different computational clustering methods in pan-cancer analysis performance remains unclear. To increase the utilization of current integrative methods in pan-cancer analysis, we first provide an overview of five popular computational integrative tools: similarity network fusion, integrative clustering of multiple genomic data types (iCluster), cancer integration via multi-kernel learning (CIMLR), perturbation clustering for data integration and disease subtyping (PINS) and low-rank clustering (LRACluster). Then, a priori interactions in multi-omics data were incorporated to detect prominent molecular patterns in pan-cancer data sets. Finally, we present comparative assessments of these methods, with discussion over key issues in applying these algorithms. We found that all five methods can identify distinct tumor compositions. The pan-cancer samples can be reclassified into several groups by different proportions. Interestingly, each method can classify the tumors into categories that are different from original cancer types or subtypes, especially for ovarian serous cystadenocarcinoma (OV) and breast invasive carcinoma (BRCA) tumors. In addition, all clusters of the five computational methods show notable prognostic values. Furthermore, both the 9 recurrent differential genes and the 15 common pathway characteristics were identified across all the methods. 
The results and discussion can help the community select appropriate integrative tools according to different research tasks or aims in pan-cancer analysis.

Analysis of Gene Expression Cancer Data Set: Classification of TCGA Pan-cancer HiSeq Data

10.1109/bigdata52589.2021.9671793 ◽

2021 ◽

Author(s):

Yusaku Nitta ◽

Mitchell Borders ◽

Simone A. Ludwig

Keyword(s):

Gene Expression ◽

Data Set ◽

Cancer Data ◽

Pan Cancer

A comparative study of different classification algorithms on RNA-Seq cancer data

New Trends and Issues Proceedings on Advances Pure and Applied Sciences ◽

10.18844/gjpaas.v0i12.4983 ◽

2020 ◽

pp. 24-35

Author(s):

Nihat Yilmaz Simsek ◽

Bulent Haznedar ◽

Cihan Kuzudisli

Keyword(s):

Clear Cell ◽

Cell Cancer ◽

Gene Mutations ◽

Rna Seq ◽

Accuracy Rate ◽

Cancer Data ◽

Diagnosis And Classification ◽

Cancer Types ◽

Artificial Neural Network Ann

Gene mutations are the most important cause of cancer, and different genes are implicated in different cancer types. RNA-Seq technology makes it possible to gather information about many genes simultaneously; hence, RNA-Seq data can be used for cancer diagnosis and classification. In this study, an RNA-Seq dataset for renal cell cancer is analysed using three different classification methods: random forest (RF), artificial neural network (ANN) and deep learning (DL). The genes in our dataset are related to the following cancer types: kidney renal papillary cell, kidney renal clear cell and kidney chromophobe carcinomas. The results suggest that the DL method gives the highest accuracy rate (95.15%), compared with RF (91.83%) and ANN (89.22%). We believe that the results acquired in this study will contribute to the classification of cancer types and support doctors in their decision-making. Keywords: Classification, gene-expression, RNA-Seq, DL.

MEXCOwalk: mutual exclusion and coverage based random walk to identify cancer modules

Bioinformatics ◽

10.1093/bioinformatics/btz655 ◽

2019 ◽

Cited By ~ 2

Author(s):

Rafsan Ahmed ◽

Ilyes Baali ◽

Cesim Erten ◽

Evis Hoxha ◽

Hilal Kazan

Keyword(s):

Random Walk ◽

Supplementary Information ◽

Risk Scores ◽

Cancer Genes ◽

Mutual Exclusivity ◽

Cancer Driver ◽

Cancer Data ◽

Ppi Networks ◽

Cancer Types ◽

Pan Cancer

Abstract Motivation Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein–protein interaction (PPI) networks together with mutation frequencies of genes and the mutual exclusivity of cancer mutations can be utilized to increase the accuracy of identifying cancer driver modules. Results We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein–protein interactions (PPIs), mutual exclusivity and coverage to identify cancer driver modules. MEXCOwalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes, providing modules that are capable of classifying normal and tumor samples and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOwalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code and useful scripts are available at: https://github.com/abu-compbio/MEXCOwalk. Supplementary information Supplementary data are available at Bioinformatics online.
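An edge-weighted random walk of the kind the abstract refers to can be sketched as a random walk with restart on a small weighted graph. This is a minimal illustration under stated assumptions, not the MEXCOwalk implementation. The gene names, the edge weights (which in MEXCOwalk would be derived from mutual exclusivity and coverage) and the seed probabilities (standing in for mutation frequencies) are hypothetical placeholders, and the restart probability is an arbitrary choice.

```python
def random_walk_with_restart(weights, seed_probs, restart_prob=0.4,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * (weighted step from p) + r * p0 until it settles.

    weights: node -> {neighbour: edge weight}; undirected edges are listed
             in both directions, and every neighbour must also be a key.
    seed_probs: node -> restart probability (assumed to sum to 1).
    """
    nodes = sorted(weights)
    out_total = {u: sum(weights[u].values()) for u in nodes}
    if any(total <= 0 for total in out_total.values()):
        raise ValueError("every node needs at least one positively weighted edge")
    p0 = {u: seed_probs.get(u, 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes back to the seed distribution.
        nxt = {u: restart_prob * p0[u] for u in nodes}
        # Remaining mass moves along edges in proportion to their weights.
        for u in nodes:
            for v, w in weights[u].items():
                nxt[v] += (1.0 - restart_prob) * p[u] * w / out_total[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: a strong A-B edge, a weak B-C edge, a medium C-D edge.
EDGE_WEIGHTS = {
    "GENE_A": {"GENE_B": 0.9},
    "GENE_B": {"GENE_A": 0.9, "GENE_C": 0.1},
    "GENE_C": {"GENE_B": 0.1, "GENE_D": 0.5},
    "GENE_D": {"GENE_C": 0.5},
}
# Hypothetical seed distribution, e.g. normalised mutation frequencies.
SEED = {"GENE_A": 0.7, "GENE_B": 0.1, "GENE_C": 0.1, "GENE_D": 0.1}

scores = random_walk_with_restart(EDGE_WEIGHTS, SEED)
ranked = sorted(scores, key=scores.get, reverse=True)
```

The stationary scores favour genes that are frequently mutated themselves or strongly connected to such genes. A module-finding method would then extract connected groups of high-scoring genes from the weighted graph.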
