cancerAlign: Stratifying tumors by unsupervised alignment across cancer types

2020
Author(s): Bowen Gao, Yunan Luo, Jianzhu Ma, Sheng Wang

Abstract: Tumor stratification, which aims to cluster tumors into biologically meaningful subtypes, is the key step towards personalized treatment. Large-scale cancer genomics profiling data enable the development of computational methods for tumor stratification. However, most existing approaches consider only tumors from a single cancer type during clustering, which overlooks patterns shared across cancer types and leaves the clustering vulnerable to noise within that cancer type. To address these challenges, we propose cancerAlign, which maps tumors of a target cancer type into the latent spaces of other (source) cancer types. These tumors are then clustered in each latent space rather than in the original space, in order to exploit patterns shared across cancer types. Because aligned tumor samples across cancer types are lacking, cancerAlign uses adversarial learning to learn the mapping at the population level. It then uses consensus clustering to integrate the cluster labels obtained from the different source cancer types. We evaluated cancerAlign on 7,134 tumors spanning 24 cancer types from TCGA and observed substantial improvements in tumor stratification and cancer gene prioritization. We further characterized the transferability across cancer types, which reflects their similarity in terms of somatic mutation profiles. cancerAlign is an unsupervised approach that provides deeper insight into heterogeneous and rapidly accumulating somatic mutation profiles, and it can also be applied to other genome-scale molecular data.
Availability: https://github.com/bowen-gao/cancerAlign
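The abstract does not say which consensus clustering variant cancerAlign uses to merge the per-source cluster labels. A common formulation is the co-association (evidence accumulation) approach, sketched below in Python under stated assumptions: each source cancer type yields one label vector over the same target tumors, and samples that co-cluster in more than a chosen fraction of those labelings are linked and grouped by connected components. The function names and the 0.5 threshold are hypothetical illustrations, not the paper's implementation.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of labelings in which each pair of samples shares a cluster.

    labelings: list of label vectors, one per source cancer type (hypothetical
    input shape), all indexed over the same target tumors.
    """
    n = len(labelings[0])
    k = len(labelings)
    m = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                m[i][j] += 1
                m[j][i] += 1
    for i in range(n):
        m[i][i] = float(k)  # a sample always co-clusters with itself
    return [[v / k for v in row] for row in m]


def consensus_clusters(labelings, threshold=0.5):
    """Link samples that co-cluster in more than `threshold` of the labelings,
    then return connected components (via union-find) as consensus labels."""
    m = co_association(labelings)
    n = len(m)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids, out = {}, []
    for i in range(n):
        root = find(i)
        ids.setdefault(root, len(ids))
        out.append(ids[root])
    return out
```

For example, `consensus_clusters([[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]])` merges the three partial partitions into two consensus groups. Label values need not agree across sources, because only pairwise co-membership is compared; a hierarchical clustering of `1 - m` is a common alternative to the thresholded graph used here.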

2015
Author(s): Sunho Park, Seung-Jun Kim, Donghyeon Yu, Samuel Pena-Llopis, Jianjiong Gao, ...

Identification of altered pathways that are clinically relevant across human cancers is a key challenge in cancer genomics. We developed a network-based algorithm that integrates somatic mutation data with gene networks and pathways in order to identify pathways altered by somatic mutations across cancers. We applied our approach to The Cancer Genome Atlas (TCGA) dataset of somatic mutations in 4,790 cancer patients with 19 different types of malignancies. Our analysis identified cancer-type-specific altered pathways enriched with known cancer-relevant genes and drug targets. Consensus clustering of gene expression datasets that included 4,870 patients from TCGA and multiple independent cohorts confirmed that the altered pathways could be used to stratify patients into subgroups with significantly different clinical outcomes. Of particular significance, certain patient subpopulations with poor prognosis were identified as having specific altered pathways for which targeted therapies are available. These findings could be used to tailor and intensify therapy in these patients, for whom current therapy is suboptimal.


2021
Author(s): H. Robert Frost

Abstract: The genetic alterations that underlie cancer development are highly tissue-specific: most driver alterations occur in only a few cancer types, and alterations common to multiple cancer types often have a tissue-specific functional impact. This tissue specificity means that the biology of normal tissues carries important information about the pathophysiology of the associated cancers, information that can be leveraged to improve the power and accuracy of cancer genomic analyses. Research on using normal tissue data in cancer genomics has primarily focused on the paired analysis of tumor and adjacent normal samples. Efforts to leverage the general characteristics of normal tissue for cancer analysis have received less attention, with most investigations focusing on the tissue-specific factors that lead to individual genomic alterations or dysregulated pathways within a single cancer type. To address this gap and to support scenarios in which adjacent normal tissue samples are not available, we explored the genome-wide association between the transcriptomes of 21 solid human cancers and their associated normal tissues as profiled in healthy individuals. The average gene expression profiles of normal and cancerous tissue may appear distinct, with normal tissues more similar to other normal tissues than to the associated cancer types. However, when expression is transformed into relative values, i.e., the ratio of expression in one tissue or cancer to the mean across the other tissues or cancers, a close association between gene activity in normal tissues and in the related cancers is revealed.
As we demonstrate through an analysis of tumor data from The Cancer Genome Atlas and normal tissue data from the Human Protein Atlas, this association between tissue-specific and cancer-specific expression values can be leveraged to improve the prognostic modeling of cancer, the comparative analysis of different cancer types, and the analysis of cancer and normal tissue pairs.
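The relative-expression transform described above (expression in one tissue or cancer divided by the mean across the others) can be sketched in Python. This is a minimal illustration assuming a dict of per-tissue expression vectors over a shared gene order; the function names and the optional pseudocount (a guard against zero denominators) are hypothetical additions, and the study's actual normalization and association statistics may differ.

```python
from statistics import mean


def relative_expression(profiles, pseudocount=0.0):
    """profiles: dict mapping tissue (or cancer type) -> expression values for
    the same ordered list of genes. Returns, for every tissue, the ratio of each
    gene's expression to its mean expression across all *other* tissues.
    Set pseudocount > 0 to avoid dividing by zero for unexpressed genes."""
    tissues = list(profiles)
    n_genes = len(profiles[tissues[0]])
    rel = {}
    for t in tissues:
        others = [profiles[o] for o in tissues if o != t]
        rel[t] = [
            (profiles[t][g] + pseudocount)
            / (mean(o[g] for o in others) + pseudocount)
            for g in range(n_genes)
        ]
    return rel


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

In use, one would compute `relative_expression` separately for a normal-tissue panel and a matched cancer panel, then compare a tissue with its cancer, e.g. `pearson(normal_rel["liver"], cancer_rel["liver"])` (tissue names hypothetical). Dividing by the mean of the *other* samples removes expression shared by all tissues, which is why related normal and cancer profiles can align after the transform even when their raw profiles do not.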


2017
Author(s): Jack Kuipers, Thomas Thurnherr, Giusi Moffa, Polina Suter, Jonas Behr, ...

Large-scale genomic data can help to uncover the complexity and diversity of the molecular changes that drive cancer progression. Statistical analysis of cancer data from different tissues of origin highlights differences and similarities that can guide drug repositioning as well as the design of targeted and precise treatments. Here, we developed an improved Bayesian network model for tumour mutational profiles and applied it to 8,198 patient samples across 22 cancer types from TCGA. For each cancer type, we identified the interactions between mutated genes, capturing signatures beyond mere mutational frequencies. When comparing mutation networks, we found genes that interact both within and across cancer types. To detach cancer classification from the tissue type, we performed de novo clustering of the pan-cancer mutational profiles based on the Bayesian network models. We found 22 novel clusters that significantly improved survival prediction beyond clinical and histopathological information. The models highlight key gene interactions for each cluster that can be used for genomic stratification in clinical trials and for identifying drug targets within strata.


2020
Author(s): Nadav Brandes, Nathan Linial, Michal Linial

Abstract: The characterization of germline genetic variation affecting cancer risk, known as cancer predisposition, is fundamental to preventive and personalized medicine. Current attempts to detect cancer predisposition genomic regions are typically based on small-scale familial studies or genome-wide association studies (GWAS) over dedicated case-control cohorts. In this study, we used the UK Biobank as a large-scale prospective cohort to conduct a comprehensive analysis of cancer predisposition using both GWAS and proteome-wide association study (PWAS), a method that highlights genetic associations mediated by functional alterations to protein-coding genes. We discovered 137 unique genomic loci implicated in cancer risk in the white British population across nine cancer types and pan-cancer. While most of these genomic regions are supported by external evidence, our results also highlight novel loci. We performed a comparative analysis of cancer predisposition between cancer types and found that most of the implicated regions are cancer-type-specific. We further analyzed the role of recessive genetic effects in cancer predisposition and found that 30 of the 137 cancer regions were recovered only by a recessive model, highlighting the importance of recessive inheritance outside of familial studies. Finally, we show that many of the cancer associations confer substantial cancer risk in the studied cohort, suggesting their clinical relevance.
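Why some regions surface only under a recessive model comes down to how genotypes are encoded before testing. The Python sketch below is a toy illustration, not the study's GWAS or PWAS pipeline (real association studies typically fit regression models with covariates): it encodes alternative-allele counts under additive, dominant, and recessive models and computes a simple 2x2 odds ratio for a binary encoding. The function names and the 0.5 continuity correction are assumptions made for illustration.

```python
def encode(genotypes, model):
    """genotypes: alternative-allele counts per individual (0, 1 or 2).
    additive keeps the count; dominant scores any carrier as 1;
    recessive scores only homozygous-alternative individuals as 1."""
    if model == "additive":
        return list(genotypes)
    if model == "dominant":
        return [1 if g >= 1 else 0 for g in genotypes]
    if model == "recessive":
        return [1 if g == 2 else 0 for g in genotypes]
    raise ValueError(f"unknown model: {model}")


def odds_ratio(exposed, is_case):
    """Odds ratio for a binary exposure (e.g. a recessive encoding) versus
    case status, with a 0.5 Haldane correction so empty cells do not divide
    by zero."""
    pairs = list(zip(exposed, is_case))
    a = sum(1 for e, c in pairs if e and c) + 0.5          # exposed cases
    b = sum(1 for e, c in pairs if e and not c) + 0.5      # exposed controls
    c_ = sum(1 for e, c in pairs if not e and c) + 0.5     # unexposed cases
    d = sum(1 for e, c in pairs if not e and not c) + 0.5  # unexposed controls
    return (a * d) / (b * c_)
```

With toy data such as `genotypes = [0, 1, 2, 2, 0, 1]` and `cases = [False, False, True, True, False, True]`, `odds_ratio(encode(genotypes, "recessive"), cases)` isolates the effect carried by homozygous individuals, which an additive encoding would dilute by treating heterozygotes as half-exposed.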


Cancers, 2022, Vol 14 (2), pp. 352
Author(s): Anyou Wang, Rong Hai, Paul J. Rider, Qianchuan He

Detecting cancers at early stages can dramatically reduce mortality rates, so practical cancer screening at the population level is needed. Our aim was to develop a comprehensive detection system that classifies multiple cancer types. We integrated an artificial intelligence deep learning neural network with noncoding RNA biomarkers selected from massive data. Our system accurately distinguishes cancer from healthy samples with an area under the receiver operating characteristic curve (AUC) of 96.3%, and it still reaches 78.77% AUC when validated on real-world raw data from a completely independent data set. Even when validated with raw exosome data from blood, our system reaches 72% AUC. Moreover, our system significantly outperforms conventional machine learning models such as random forest. With no more than six biomarkers, our approach discriminates each individual cancer type from normal with 99% to 100% AUC. Furthermore, a comprehensive marker panel can simultaneously multi-classify common cancers with a stable 82.15% accuracy across heterogeneous cancerous tissues and conditions. This detection system provides a promising practical framework for automatic cancer screening at the population level.
Key points:
(1) We developed a practical cancer screening system that is simple, accurate, affordable, and easy to operate.
(2) Our system classifies cancer vs. normal with >96% AUC.
(3) In total, 26 individual cancer types can be detected by our system with 99% to 100% AUC.
(4) The system can detect multiple cancer types simultaneously with >82% accuracy.
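The AUC figures quoted above have a concrete interpretation: the probability that a randomly chosen cancer sample receives a higher classifier score than a randomly chosen healthy sample. The Python sketch below computes AUC directly from that definition (the Mann-Whitney formulation, with ties counted as one half). It is a generic metric illustration, not the authors' evaluation code, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied scores count as one half.
    labels: truthy for positives (e.g. cancer), falsy for negatives."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

This pairwise loop is O(P x N) and is meant for clarity; a rank-based computation over sorted scores gives the same value in O(n log n). Because AUC depends only on the ranking of scores, it is unaffected by the decision threshold, which is why it is reported separately from the multi-class accuracy figure.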


2019
Author(s): Abdullah Kahraman, Tülay Karakulak, Damian Szklarczyk, Christian von Mering

Abstract: Under normal conditions, cells of almost all tissue types express the same predominant canonical transcript isoform at each gene locus. In cancer, however, splicing regulation is often disturbed, leading to cancer-specific switches in the most dominant transcripts (MDT). But what is the pathogenic impact of these switches, and how do they drive oncogenesis? To address these questions, we analyzed isoform-specific protein-protein interaction disruptions in 1,209 cancer samples covering 27 different cancer types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project of the International Cancer Genome Consortium (ICGC). Our study revealed large variation in the number of cancer-specific MDT (cMDT) between cancer types. While carcinomas of the head and neck, or brain, had none or only a few cMDT, cancers of the female reproductive organs showed the highest number of cMDT. In contrast to the mutational load, the number of cMDT was tissue-specific, i.e., cancers arising from the same primary tissue had a similar number of cMDT. Some cMDT were found in 100% of the samples of a cancer type, making them candidates for diagnostic biomarkers. cMDT tended to fall in densely populated network regions, where they disrupted protein interactions in the proximity of pathogenic cancer genes. A Gene Ontology enrichment analysis showed that these disruptions occurred mostly in enzyme signaling, protein translation, and RNA splicing pathways. No significant correlation between the number of cMDT and the number of coding or non-coding mutations could be identified. However, the expression of some transcripts correlated with mutations in the non-coding splice-site and promoter regions of their genes. This work demonstrates for the first time the large extent of cancer-specific alterations in alternative splicing across 27 different cancer types.
It highlights distinct and common patterns of cMDT and suggests novel pathogenic transcripts and markers that induce large network disruptions in cancers.
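The notion of a cancer-specific MDT switch can be made concrete with a short Python sketch: find the most highly expressed isoform per gene in a normal reference and in a tumor sample, and report genes where the two differ. This assumes per-gene isoform expression dictionaries; the abstract does not describe the study's expression thresholds or statistical criteria, so the function names and inputs here are hypothetical simplifications.

```python
def dominant_transcript(isoform_expr):
    """Return the most highly expressed isoform for one gene in one sample.
    isoform_expr: dict isoform id -> expression. On ties, max() keeps the
    first maximal isoform in dict order."""
    return max(isoform_expr, key=isoform_expr.get)


def cancer_specific_mdt(normal, tumor):
    """normal, tumor: dict gene -> dict isoform -> expression.
    Returns gene -> (normal MDT, tumor MDT) for genes whose MDT switches."""
    switches = {}
    for gene in normal.keys() & tumor.keys():
        n_mdt = dominant_transcript(normal[gene])
        t_mdt = dominant_transcript(tumor[gene])
        if n_mdt != t_mdt:
            switches[gene] = (n_mdt, t_mdt)
    return switches
```

Counting the switched genes per sample and aggregating by cancer type gives the kind of per-type cMDT counts the study compares; mapping each switched isoform to the protein domains and interactions it gains or loses is the further step needed to assess network disruption.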


2021, Vol 17 (6), pp. e1009085
Author(s): H. Robert Frost



2021
Author(s): Ertugrul Dalgic

Switch-like behavior in tumorigenesis could be governed by antagonistic gene and protein pairs with mutual inhibition. Whereas gene expression has been analyzed extensively, the search for protein-level antagonistic pairs has been limited. Here, potential cancer-type-specific antagonistic protein pairs with mutual inhibition were obtained from large-scale datasets. Cancer samples or cancer types were compared to retrieve candidate protein pairs with contrasting differential expression patterns. Analysis of two different protein expression datasets showed that a few proteins participate in most of the mutually antagonistic relationships. Some proteins with a highly antagonistic profile were identified that could not have been found by a differential expression or correlation-based analysis. The antagonistic protein pairs are sparsely connected by molecular interactions. Glioma, melanoma, and cervical cancer are more frequently associated with antagonistic proteins than most other cancer types. Integrative analysis of mutually antagonistic protein pairs contributes to our understanding of systems-level changes in cancer.
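The idea of "contrasting differential expression patterns" can be illustrated with a hypothetical scoring rule in Python. Assuming per-cancer-type log fold-changes for each protein (each type versus the rest), a pair (A, B) looks mutually antagonistic when some cancer types show A up and B down while others show the reverse. The scoring rule, the cutoff of 1.0, and the function names are assumptions for illustration; the abstract does not specify the study's actual statistics.

```python
from itertools import combinations


def antagonism_score(lfc, a, b, cutoff=1.0):
    """lfc: dict protein -> dict cancer type -> log fold-change (type vs rest).
    Counts (x, y) combinations of cancer types in which a is up and b down
    in x, while a is down and b up in y. cutoff is an illustrative threshold."""
    types = lfc[a].keys() & lfc[b].keys()
    a_up_b_down = [t for t in types if lfc[a][t] >= cutoff and lfc[b][t] <= -cutoff]
    a_down_b_up = [t for t in types if lfc[a][t] <= -cutoff and lfc[b][t] >= cutoff]
    return len(a_up_b_down) * len(a_down_b_up)


def rank_pairs(lfc, cutoff=1.0):
    """Score every unordered protein pair; return (score, pair) tuples with a
    positive score, highest first."""
    scores = {
        (a, b): antagonism_score(lfc, a, b, cutoff)
        for a, b in combinations(sorted(lfc), 2)
    }
    return sorted(((s, p) for p, s in scores.items() if s > 0), reverse=True)
```

Requiring the pattern to flip between cancer types is what separates this from a plain differential expression test (one protein changing in one type) or a correlation test (two proteins co-varying across all samples), which matches the abstract's point that some antagonistic proteins are invisible to either analysis.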


Author(s): Xun Gu, Zhan Zou, Jingwen Yang

Abstract: An evolutionary understanding of cancer genes may provide insights into the nature and evolution of complex life and the origin of multicellularity. In this study, we focus on the evolutionary ages of cancer-driving sites and explore to what extent the amino acids at cancer-driving sites can be traced back to the most recent common ancestor (MRCA) of the gene. Following gene phylostratigraphy, we define gene age (tg) as the most ancient phylogenetic position to which the gene can be traced back, in most cases based on large-scale homology searches of protein sequences. Our results show that the site-age profile of cancer-driving sites in TP53 is correlated with the number of cancer types that the somatic mutations may affect. In general, amino acid sites mutated in most cancer types are much more ancient. These sites, frequently mutated in cancerous cells, are possibly responsible for carcinogenesis; some may be very important for the basic growth of single-cell organisms, while others may contribute to the complex cell regulation of multicellular organisms. Further cancer genomics analysis also indicates that cancer-driving sites are ancient but may span a broad range of ages in the early stages of metazoan evolution.
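The phylostratigraphic definition of gene age above can be expressed as a short Python sketch: walk an ordered list of phylostrata from most ancient to most recent and return the first one in which a homolog was detected. The site-level function is a hypothetical analogue added for illustration (the oldest stratum from which the human residue is retained in every younger stratum with data); the abstract does not give the authors' exact site-age procedure, and the stratum names and inputs are assumptions.

```python
def gene_age(strata, has_homolog):
    """strata: phylostratum names ordered from most ancient to most recent.
    has_homolog: set of strata in which a homolog was detected.
    Returns the most ancient stratum with a detected homolog, or None."""
    for stratum in strata:
        if stratum in has_homolog:
            return stratum
    return None


def site_age(strata, residue_by_stratum, human_residue):
    """Hypothetical site-level analogue: the most ancient stratum from which
    the human residue is retained in every younger stratum with data.
    residue_by_stratum: dict stratum -> residue at the aligned site."""
    age = None
    for stratum in reversed(strata):  # walk from most recent back in time
        residue = residue_by_stratum.get(stratum)
        if residue is None:
            continue  # no representative sequence for this stratum
        if residue != human_residue:
            break  # conservation chain broken; older strata are not counted
        age = stratum
    return age
```

For instance, with `strata = ["cellular_organisms", "Eukaryota", "Metazoa", "Vertebrata", "Mammalia"]`, a site whose residue matches the human one back to Metazoa but differs in Eukaryota would be dated to Metazoa. Note that homology detection limits such as distant, fast-evolving homologs can bias phylostratigraphic ages towards younger strata.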


2018
Author(s): Collin Tokheim, Rachel Karchin

Summary: Large-scale cancer sequencing studies of patient cohorts have statistically implicated many genes in driving cancer growth and progression, and their identification has yielded substantial translational impact. However, a remaining challenge is to increase the resolution of driver prediction from the gene level to the mutation level, because mutation-level predictions are more closely aligned with the goal of precision cancer medicine. Here we present CHASMplus, a computational method that is uniquely capable of identifying driver missense mutations, including those specific to a cancer type, as evidenced by significantly superior performance on diverse benchmarks. Applied to 8,657 tumor samples across 32 cancer types in The Cancer Genome Atlas, CHASMplus identifies over 4,000 unique driver missense mutations in 240 genes, supporting a prominent role for rare driver mutations. We show which TCGA cancer types are likely to yield new driver missense mutations with additional sequencing, which has important implications for public policy.
Significance: Missense mutations are the most frequent mutation type in cancers and the most difficult to interpret. While many computational methods have been developed to predict whether genes are cancer drivers or whether missense mutations are generally deleterious or pathogenic, there has not previously been a method to score the oncogenic impact of a missense mutation specifically by cancer type, which has limited the adoption of computational missense mutation predictors in the clinic. Cancer patients are routinely sequenced with targeted panels of cancer driver genes, but these genes contain a mixture of driver and passenger missense mutations that differ by cancer type. A patient's therapeutic response to drugs and optimal assignment to a clinical trial depend on both the specific mutation in the gene of interest and the cancer type.
We present a new machine learning method honed for each TCGA cancer type, and a resource for fast lookup of the cancer-specific driver propensity of every possible missense mutation in the human exome.

