Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes

Cancers ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 2366
Author(s):  
Shayantan Banerjee ◽  
Karthik Raman ◽  
Balaraman Ravindran

Identifying cancer-causing mutations from sequenced cancer genomes holds much promise for targeted therapy and precision medicine. “Driver” mutations are primarily responsible for cancer progression, while “passengers” are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on using the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5′ and 3′ from each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments provided comparable performance across all window sizes. Integrating sequence features derived from raw nucleotide sequences with other genomic, structural, and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with three other commonly used driver prediction tools (FATHMM (cancer), CONDEL, and MutationTaster) significantly outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified a list of 138 known driver genes with functional evidence from various sources.
Overall, our study underscores the efficacy of using raw nucleotide sequences as features to distinguish between driver and passenger mutations from sequenced cancer genomes.
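
The idea of turning the bases immediately 5′ and 3′ of a mutated position into string-based features can be illustrated with a minimal sketch. The abstract does not specify which representations NBDriver uses, so the overlapping k-mer counts below are an assumed stand-in, and the function names, the 0-based coordinate convention, and the toy sequence are all hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Tuple


def flanks(sequence: str, position: int, window: int) -> Tuple[str, str]:
    """Return the `window` bases 5' and 3' of a 0-based mutated position."""
    if position - window < 0 or position + window >= len(sequence):
        raise ValueError("window extends past the sequence boundary")
    upstream = sequence[position - window:position]
    downstream = sequence[position + 1:position + 1 + window]
    return upstream, downstream


def kmer_features(sequence: str, position: int, window: int, k: int) -> Dict[str, int]:
    """Count overlapping k-mers in each flank over a fixed ACGT vocabulary.

    Assumption: k-mer counts are one possible string-based representation,
    not necessarily the one used by the authors.
    """
    upstream, downstream = flanks(sequence, position, window)
    features: Dict[str, int] = {}
    for label, flank in (("up", upstream), ("down", downstream)):
        counts = Counter(flank[i:i + k] for i in range(len(flank) - k + 1))
        for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
            features[f"{label}_{kmer}"] = counts[kmer]  # 0 when the k-mer is absent
    return features


# Hypothetical reference sequence; the mutated base sits at index 5.
example = kmer_features("ACGTACGTAC", position=5, window=3, k=2)
```

Counting each flank separately avoids creating artificial k-mers that would span the mutated base, and the fixed vocabulary gives every mutation a feature vector of the same length regardless of window size.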

2021 ◽  
Author(s):  
Shayantan Banerjee ◽  
Karthik Raman ◽  
Balaraman Ravindran

Identifying cancer-causing mutations from sequenced cancer genomes holds much promise for targeted therapy and precision medicine. “Driver” mutations are primarily responsible for cancer progression, while “passengers” are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on utilizing the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5′ and 3′ from each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments gave comparable performance across all window sizes. Integrating sequence features derived from raw nucleotide sequences with other genomic, structural and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with two other commonly used driver prediction tools (CONDEL and MutationTaster) outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified a list of 138 known driver genes with functional evidence from various sources. Overall, our study underscores the efficacy of utilizing raw nucleotide sequences as features to distinguish between driver and passenger mutations from sequenced cancer genomes.


2018 ◽  
Author(s):  
Giorgio Mattiuz ◽  
Salvatore Di Giorgio ◽  
Lorenzo Tofani ◽  
Antonio Frandi ◽  
Francesco Donati ◽  
...  

Alterations in cancer genomes originate from mutational processes taking place throughout oncogenesis and cancer progression. We show that likeliness and entropy are two properties of somatic mutations crucial in cancer evolution, as cancer-driver mutations stand out, with respect to both of these properties, as being distinct from the bulk of passenger mutations. Our analysis can identify novel cancer driver genes and differentiate between gain and loss of function mutations.


2018 ◽  
Author(s):  
Sushant Kumar ◽  
Jonathan Warrell ◽  
Shantao Li ◽  
Patrick D. McGillivray ◽  
William Meyerson ◽  
...  

The Pan-cancer Analysis of Whole Genomes (PCAWG) project provides an unprecedented opportunity to comprehensively characterize a vast set of uniformly annotated coding and non-coding mutations present in thousands of cancer genomes. Classical models of cancer progression posit that only a small number of these mutations strongly drive tumor progression and that the remaining ones (termed “putative passengers”) are inconsequential for tumorigenesis. In this study, we leveraged the comprehensive variant data from PCAWG to ascertain the molecular functional impact of each variant. The impact distribution of PCAWG mutations shows that, in addition to high- and low-impact mutations, there is a group of medium-impact putative passengers predicted to influence gene activity. Moreover, the predicted impact relates to the underlying mutational signature: different signatures confer divergent impact, differentially affecting distinct regulatory subsystems and gene categories. We also find that impact varies based on subclonal architecture (i.e., early vs. late mutations) and can be related to patient survival. Finally, we note that insufficient power due to limited cohort sizes precludes identification of weak drivers using standard recurrence-based approaches. To address this, we adapted an additive effects model derived from complex trait studies to show that aggregating the impact of putative passenger variants (i.e., including yet undetected weak drivers) provides significant predictability for cancer phenotypes beyond the PCAWG-identified driver mutations (12.5% additive variance). Furthermore, this framework allowed us to estimate the frequency of potential weak driver mutations in the subset of PCAWG samples lacking well-characterized driver alterations.


2017 ◽  
Author(s):  
Shimin Shuai ◽  
Steven Gallinger ◽  
Lincoln Stein ◽  

We describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify cancer driver mutations in coding and non-coding sites within cancer whole genomes. Using a total of 1,373 genomic features derived from public sources, DriverPower’s background mutation model explains up to 93% of the regional variance in the mutation rate across a variety of tumour types. By incorporating functional impact scores, we are able to further increase the accuracy of driver discovery. Testing across a collection of 2,583 cancer genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project, DriverPower identifies 217 coding and 95 non-coding driver candidates. Compared with six published methods used by the PCAWG Drivers and Functional Interpretation Group, DriverPower has the highest F1-score for both coding and non-coding driver discovery. This demonstrates that DriverPower is an effective framework for computational driver discovery.


2021 ◽  
Vol 23 (1) ◽  
Author(s):  
Ziyi Zhao ◽  
Chenxi Li ◽  
Fei Tong ◽  
Jingkuang Deng ◽  
Guofu Huang ◽  
...  

Characterized by multiple complex mutations, including activation by oncogenes and inhibition by tumor suppressors, cancer is one of the leading causes of death. Application of CRISPR-Cas9 gene-editing technology in cancer research has aroused great interest, promoting the exploration of the molecular mechanism of cancer progression and development of precise therapy. CRISPR-Cas9 gene-editing technology provides a solid basis for identifying driver and passenger mutations in cancer genomes, which is of great value in genetic screening and for developing cancer models and treatments. This article reviews the current applications of CRISPR-Cas9 gene-editing technology in various cancer studies, the challenges faced, and the existing solutions, highlighting the potential of this technology for cancer treatment.


2014 ◽  
Author(s):  
Christopher Dennis McFarland ◽  
Leonid A Mirny ◽  
Kirill S Korolev

Cancer progression is an example of a rapid adaptive process where evolving new traits is essential for survival and requires a high mutation rate. Precancerous cells acquire a few key mutations that drive rapid population growth and carcinogenesis. Cancer genomics demonstrates that these few ‘driver’ mutations occur alongside thousands of random ‘passenger’ mutations, a natural consequence of cancer's elevated mutation rate. Some passengers can be deleterious to cancer cells, yet have been largely ignored in cancer research. In population genetics, however, the accumulation of mildly deleterious mutations has been shown to cause population meltdown. Here we develop a stochastic population model where beneficial drivers engage in a tug-of-war with frequent mildly deleterious passengers. These passengers present a barrier to cancer progression that is described by a critical population size, below which most lesions fail to progress, and a critical mutation rate, above which cancers melt down. We find support for the model in cancer age-incidence and cancer genomics data that also allow us to estimate the fitness advantage of drivers and fitness costs of passengers. We identify two regimes of adaptive evolutionary dynamics and use these regimes to rationalize successes and failures of different treatment strategies. We find that a tumor’s load of deleterious passengers can explain previously paradoxical treatment outcomes and suggest that it could potentially serve as a biomarker of response to mutagenic therapies. The collective deleterious effect of passengers is currently an unexploited therapeutic target. We discuss how their effects might be exacerbated by both current and future therapies.
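
The tug-of-war between rare beneficial drivers and frequent mildly deleterious passengers can be made concrete with a toy stochastic birth-death simulation. This is only a sketch under stated assumptions: the abstract gives no equations, so the multiplicative fitness form, the density-dependent death rule, and every parameter value below are illustrative choices, not the authors' model.

```python
import random
from typing import List, Tuple


def fitness(n_drivers: int, n_passengers: int,
            s_driver: float = 0.1, s_passenger: float = 0.001) -> float:
    """Assumed multiplicative fitness: drivers raise it, passengers lower it."""
    return (1 + s_driver) ** n_drivers / (1 + s_passenger) ** n_passengers


def simulate(n_cells: int = 100, generations: int = 50,
             mu_driver: float = 1e-3, mu_passenger: float = 5e-2,
             capacity: int = 100, seed: int = 0) -> List[int]:
    """Return the population size after each generation (index 0 is the start).

    All rates are hypothetical values chosen for illustration only.
    """
    rng = random.Random(seed)
    cells: List[Tuple[int, int]] = [(0, 0)] * n_cells  # (drivers, passengers)
    sizes = [len(cells)]
    for _ in range(generations):
        # Death pressure grows with crowding relative to the assumed capacity.
        death_prob = min(1.0, 0.5 * len(cells) / capacity)
        next_cells: List[Tuple[int, int]] = []
        for drivers, passengers in cells:
            if rng.random() < death_prob:
                continue  # this cell dies
            next_cells.append((drivers, passengers))
            # Fitter cells divide more often; the daughter may gain mutations.
            if rng.random() < min(1.0, 0.5 * fitness(drivers, passengers)):
                child = (drivers + int(rng.random() < mu_driver),
                         passengers + int(rng.random() < mu_passenger))
                next_cells.append(child)
        cells = next_cells
        sizes.append(len(cells))
    return sizes


trajectory = simulate(generations=20)
```

Raising `mu_passenger` or lowering `n_cells` in this toy setting is one way to explore how passenger load and small population size could stall growth, in the spirit of the critical mutation rate and critical population size described above.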


2016 ◽  
Vol 113 (42) ◽  
pp. E6409-E6417 ◽  
Author(s):  
David G. McFadden ◽  
Katerina Politi ◽  
Arjun Bhutkar ◽  
Frances K. Chen ◽  
Xiaoling Song ◽  
...  

Genetically engineered mouse models (GEMMs) of cancer are increasingly being used to assess putative driver mutations identified by large-scale sequencing of human cancer genomes. To accurately interpret experiments that introduce additional mutations, an understanding of the somatic genetic profile and evolution of GEMM tumors is necessary. Here, we performed whole-exome sequencing of tumors from three GEMMs of lung adenocarcinoma driven by mutant epidermal growth factor receptor (EGFR), mutant Kirsten rat sarcoma viral oncogene homolog (Kras), or overexpression of MYC proto-oncogene. Tumors from EGFR- and Kras-driven models exhibited, respectively, 0.02 and 0.07 nonsynonymous mutations per megabase, a dramatically lower average mutational frequency than observed in human lung adenocarcinomas. Tumors from models driven by strong cancer drivers (mutant EGFR and Kras) harbored few mutations in known cancer genes, whereas tumors driven by MYC, a weaker initiating oncogene in the murine lung, acquired recurrent clonal oncogenic Kras mutations. In addition, although EGFR- and Kras-driven models both exhibited recurrent whole-chromosome DNA copy number alterations, the specific chromosomes altered by gain or loss were different in each model. These data demonstrate that GEMM tumors exhibit relatively simple somatic genotypes compared with human cancers of a similar type, making these autochthonous model systems useful for additive engineering approaches to assess the potential of novel mutations on tumorigenesis, cancer progression, and drug sensitivity.


2015 ◽  
Author(s):  
Giulio Caravagna ◽  
Alex Graudenzi ◽  
Daniele Ramazzotti ◽  
Rebeca Sanz-Pamplona ◽  
Luca De Sano ◽  
...  

The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing (NGS) data, and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the "selective advantage" relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications as it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations and progression model inference. We demonstrate PiCnIc's ability to reproduce much of the current knowledge on colorectal cancer progression, as well as to suggest novel experimentally verifiable hypotheses.


2020 ◽  
Author(s):  
Zhilan Zhang ◽  
Lin Li ◽  
Mengyuan Li ◽  
Xiaosheng Wang

Background: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected more than 13 million people and has caused more than 570,000 deaths worldwide as of July 13, 2020. The SARS-CoV-2 human cell receptor ACE2 has recently received extensive attention for its role in SARS-CoV-2 infection. Many studies have also explored the association between ACE2 and cancer. However, a systemic investigation into associations between ACE2 and oncogenic pathways, tumor progression, and clinical outcomes in pan-cancer remains lacking. Methods: Using cancer genomics datasets from the Cancer Genome Atlas (TCGA) program, we performed computational analyses of associations between ACE2 expression and antitumor immunity, immunotherapy response, oncogenic pathways, tumor progression phenotypes, and clinical outcomes in 12 cancer cohorts. We also identified co-expression networks of ACE2 in cancer. Results: ACE2 upregulation was associated with increased antitumor immune signatures and PD-L1 expression, and favorable anti-PD-1/PD-L1/CTLA-4 immunotherapy response. ACE2 expression levels inversely correlated with the activity of cell cycle, mismatch repair, TGF-β, Wnt, VEGF, and Notch signaling pathways. Moreover, ACE2 expression levels had significant inverse correlations with tumor proliferation, stemness, and epithelial-mesenchymal transition (EMT). ACE2 upregulation was associated with favorable survival in pan-cancer and in multiple individual cancer types. Conclusions: ACE2 upregulation was associated with increased antitumor immunity and immunotherapy response, reduced tumor malignancy, and favorable survival in cancer, suggesting that ACE2 is a protective factor for cancer progression. Our data may provide potential clinical implications for treating cancer patients infected with SARS-CoV-2.

