Estimating the predictability of cancer evolution

Abstract Motivation How predictable is the evolution of cancer? This fundamental question is of immense relevance for the diagnosis, prognosis and treatment of cancer. Evolutionary biologists have approached the question of predictability based on the underlying fitness landscape. However, empirical fitness landscapes of tumor cells are impossible to determine in vivo. Thus, in order to quantify the predictability of cancer evolution, alternative approaches are required that circumvent the need for fitness landscapes. Results We developed a computational method based on conjunctive Bayesian networks (CBNs) to quantify the predictability of cancer evolution directly from mutational data, without the need for measuring or estimating fitness. Using simulated data derived from >200 different fitness landscapes, we show that our CBN-based notion of evolutionary predictability strongly correlates with the classical notion of predictability based on fitness landscapes under the strong selection weak mutation assumption. The statistical framework enables robust and scalable quantification of evolutionary predictability. We applied our approach to driver mutation data from the TCGA and the MSK-IMPACT clinical cohorts to systematically compare the predictability of 15 different cancer types. We found that cancer evolution is remarkably predictable as only a small fraction of evolutionary trajectories are feasible during cancer progression. Availability and implementation https://github.com/cbg-ethz/predictability_of_cancer_evolution Supplementary information Supplementary data are available at Bioinformatics online.
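The abstract above contrasts a CBN-based notion of predictability with the classical landscape-based notion under strong selection weak mutation (SSWM). As a rough illustration of the classical side only, the sketch below enumerates mutational orderings on a small landscape, weights each step by its fitness gain (a common SSWM approximation, assumed here rather than taken from the paper), and scores predictability as one minus the normalized entropy of the path distribution. The function names, the toy landscapes, the renormalization over paths that reach the full mutant, and the normalization by log(n!) are all illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations, product
from math import factorial, log


def path_probabilities(fitness, n):
    """Probability of each mutation order from wild type to the full mutant.

    fitness maps genotype tuples (0/1 per locus) to a fitness value.
    Assumption: a step to a fitter neighbour is taken with probability
    proportional to its fitness gain; non-beneficial steps are forbidden.
    """
    probs = {}
    for order in permutations(range(n)):
        genotype = (0,) * n
        p = 1.0
        for locus in order:
            # Fitness gains of all beneficial one-mutation neighbours.
            gains = {}
            for j in range(n):
                if genotype[j] == 0:
                    neighbour = genotype[:j] + (1,) + genotype[j + 1:]
                    gain = fitness[neighbour] - fitness[genotype]
                    if gain > 0:
                        gains[j] = gain
            if locus not in gains:
                p = 0.0  # this ordering is not accessible
                break
            p *= gains[locus] / sum(gains.values())
            genotype = genotype[:locus] + (1,) + genotype[locus + 1:]
        if p > 0:
            probs[order] = p
    # Assumption: renormalize over the paths that reach the full mutant.
    total = sum(probs.values())
    return {k: v / total for k, v in probs.items()} if total > 0 else {}


def predictability(probs, n):
    """One minus the entropy of the path distribution, scaled by log(n!)."""
    if n < 2 or not probs:
        raise ValueError("need n >= 2 loci and at least one accessible path")
    entropy = -sum(p * log(p) for p in probs.values())
    return 1.0 - entropy / log(factorial(n))


# Toy landscape 1: additive effects, every ordering equally likely.
additive = {g: 1.0 + 0.1 * sum(g) for g in product((0, 1), repeat=3)}

# Toy landscape 2: only the ordering 0 -> 1 -> 2 is uphill at every step.
single_path = {g: 0.5 for g in product((0, 1), repeat=3)}
single_path.update({(0, 0, 0): 1.0, (1, 0, 0): 2.0, (1, 1, 0): 3.0, (1, 1, 1): 4.0})

print(predictability(path_probabilities(additive, 3), 3))     # close to 0
print(predictability(path_probabilities(single_path, 3), 3))  # 1.0
```

Exhaustive enumeration of all n! orderings is only practical for a handful of loci; it is used here because it keeps the definition transparent.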

Cancer progression models and fitness landscapes: a many-to-many relationship

10.1101/141465 ◽

2017 ◽

Author(s):

Ramon Diaz-Uriarte

Keyword(s):

Cancer Progression ◽

Fitness Landscape ◽

Simulated Data ◽

Directed Acyclic Graphs ◽

Fitness Landscapes ◽

Gene Interactions ◽

Cross Sectional ◽

Data Set ◽

Large Variability ◽

Acyclic Graphs

Abstract The identification of constraints, due to gene interactions, in the order of accumulation of mutations during cancer progression can allow us to single out therapeutic targets. Cancer progression models (CPMs) use genotype frequency data from cross-sectional samples to try to identify these constraints, and return Directed Acyclic Graphs (DAGs) of genes. On the other hand, fitness landscapes, which map genotypes to fitness, contain all possible paths of tumor progression. Thus, we expect a correspondence between DAGs from CPMs and the fitness landscapes where evolution happened. But many fitness landscapes (e.g., those with reciprocal sign epistasis) cannot be represented by CPMs. Using simulated data under 500 fitness landscapes, I show that CPMs’ performance (prediction of genotypes that can exist) degrades with reciprocal sign epistasis. There is large variability in the DAGs inferred from each landscape, which is also affected by mutation rate, detection regime, and fitness landscape features, in ways that depend on CPM method. And the same DAG is often observed in very different landscapes, which differ in more than 50% of their accessible genotypes. Using a pancreatic data set, I show that this many-to-many relationship affects the analysis of empirical data. Fitness landscapes that are widely different from each other can, when evolutionary processes run repeatedly on them, both produce data similar to the empirically observed one, and lead to DAGs that are very different among themselves. Because reciprocal sign epistasis can be common in cancer, these results question the use and interpretation of CPMs.
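Reciprocal sign epistasis, which the abstract above singles out, can be checked directly on a two-locus landscape. The sketch below uses the standard textbook definition (each mutation's fitness effect changes sign depending on whether the other mutation is present). The function names and the example fitness values are hypothetical and are not taken from the paper.

```python
def sign_epistasis(effect_without, effect_with):
    """True if a mutation's effect has opposite signs in the two backgrounds."""
    return effect_without * effect_with < 0


def reciprocal_sign_epistasis(f00, f10, f01, f11):
    """Check a two-locus landscape for reciprocal sign epistasis.

    f00: wild type, f10: mutant at locus A only,
    f01: mutant at locus B only, f11: double mutant.
    """
    a_flips = sign_epistasis(f10 - f00, f11 - f01)  # effect of A without / with B
    b_flips = sign_epistasis(f01 - f00, f11 - f10)  # effect of B without / with A
    return a_flips and b_flips


# Fitness valley: each single mutant is worse than wild type,
# yet the double mutant is the fittest genotype.
valley = dict(f00=1.0, f10=0.8, f01=0.8, f11=1.5)

# Additive landscape: no epistasis of any kind.
additive = dict(f00=1.0, f10=1.1, f01=1.1, f11=1.2)

print(reciprocal_sign_epistasis(**valley))    # True
print(reciprocal_sign_epistasis(**additive))  # False
```

In the valley example the double mutant is fit while both single mutants are deleterious, which is the kind of pattern a DAG of "A before B" style restrictions has difficulty expressing.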

Convergence of oncogenic cooperation at single-cell and single-gene levels drives leukemic transformation

Nature Communications ◽

10.1038/s41467-021-26582-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Yuxuan Liu ◽

Zhimin Gu ◽

Hui Cao ◽

Pranita Kaphle ◽

Junhua Lyu ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cancer Progression ◽

Single Gene ◽

Myeloid Cell ◽

Cellular Level ◽

Integrative Approach ◽

Cancer Evolution ◽

Cell Assays

Abstract Cancers develop from the accumulation of somatic mutations, yet it remains unclear how oncogenic lesions cooperate to drive cancer progression. Using a mouse model harboring NRasG12D and EZH2 mutations that recapitulates leukemic progression, we employ single-cell transcriptomic profiling to map cellular composition and gene expression alterations in healthy or diseased bone marrows during leukemogenesis. At the cellular level, NRasG12D induces myeloid lineage-biased differentiation and EZH2-deficiency impairs myeloid cell maturation, whereas they cooperate to promote myeloid neoplasms with dysregulated transcriptional programs. At the gene level, NRasG12D and EZH2-deficiency independently and synergistically deregulate gene expression. We integrate results from histopathology, leukemia repopulation, and leukemia-initiating cell assays to validate transcriptome-based cellular profiles. We use this resource to relate developmental hierarchies to leukemia phenotypes, evaluate oncogenic cooperation at single-cell and single-gene levels, and identify GEM as a regulator of leukemia-initiating cells. Our studies establish an integrative approach to deconvolute cancer evolution at single-cell resolution in vivo.

Molecular Fitness Landscapes from High-Coverage Sequence Profiling

Annual Review of Biophysics ◽

10.1146/annurev-biophys-052118-115333 ◽

2019 ◽

Vol 48 (1) ◽

pp. 1-18 ◽

Cited By ~ 15

Author(s):

Celia Blanco ◽

Evan Janzen ◽

Abe Pressman ◽

Ranajay Saha ◽

Irene A. Chen

Keyword(s):

High Throughput ◽

Large Scale ◽

High Throughput Sequencing ◽

Fitness Landscape ◽

Complete Sequence ◽

Fitness Landscapes ◽

Future Research ◽

High Coverage

The function of fitness (or molecular activity) in the space of all possible sequences is known as the fitness landscape. Evolution is a random walk on the fitness landscape, with a bias toward climbing hills. Mapping the topography of real fitness landscapes is fundamental to understanding evolution, but previous efforts were hampered by the difficulty of obtaining large, quantitative data sets. The accessibility of high-throughput sequencing (HTS) has transformed this study, enabling large-scale enumeration of fitness for many mutants and even complete sequence spaces in some cases. We review the progress of high-throughput studies in mapping molecular fitness landscapes, both in vitro and in vivo, as well as opportunities for future research. Such studies are rapidly growing in number. HTS is expected to have a profound effect on the understanding of real molecular fitness landscapes.
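The picture of evolution as a random walk with a bias toward climbing hills can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the RNA alphabet, the toy fitness function (number of positions matching a made-up target sequence), and the `bias` acceptance rule are all hypothetical choices and do not come from the review.

```python
import random


def biased_walk(start, fitness, alphabet="ACGU", steps=200, bias=0.9, seed=0):
    """Random walk in sequence space that prefers uphill moves.

    Each step proposes one random point mutation. Uphill and neutral
    proposals are always accepted; downhill proposals are accepted with
    probability 1 - bias. Returns the final sequence and the fitness
    recorded at the start and after every step.
    """
    rng = random.Random(seed)
    seq = list(start)
    trajectory = [fitness("".join(seq))]
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        new_base = rng.choice([c for c in alphabet if c != seq[pos]])
        candidate = seq[:pos] + [new_base] + seq[pos + 1:]
        delta = fitness("".join(candidate)) - fitness("".join(seq))
        if delta >= 0 or rng.random() > bias:
            seq = candidate
        trajectory.append(fitness("".join(seq)))
    return "".join(seq), trajectory


# Hypothetical single-peak landscape: fitness counts matches to a target.
TARGET = "GGAUCC"


def toy_fitness(seq):
    return sum(a == b for a, b in zip(seq, TARGET))


final_seq, trajectory = biased_walk("AAAAAA", toy_fitness, bias=1.0)
print(final_seq, trajectory[0], trajectory[-1])
```

With `bias=1.0` the walk never moves downhill, so the recorded fitness is non-decreasing; lower values of `bias` let the walk cross valleys at the cost of occasionally losing fitness.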

DriverGroup: A novel method for identifying driver gene groups

10.1101/2020.04.23.058719 ◽

2020 ◽

Author(s):

Vu VH Pham ◽

Lin Liu ◽

Cameron P Bracken ◽

Gregory J Goodall ◽

Jiuyong Li ◽

...

Keyword(s):

Cancer Progression ◽

Single Gene ◽

Gene Interaction ◽

Computational Method ◽

Supplementary Information ◽

Driver Gene ◽

Driver Genes ◽

Cancer Driver ◽

Critical Nodes ◽

Novel Method

Abstract Motivation Identifying cancer driver genes is a key task in cancer informatics. Most existing methods are focused on individual cancer drivers which regulate biological processes leading to cancer. However, the effect of a single gene may not be sufficient to drive cancer progression. Here, we hypothesise that there are driver gene groups that work in concert to regulate cancer and we develop a novel computational method to detect those driver gene groups. Results We develop a novel method named DriverGroup to detect driver gene groups by using gene expression and gene interaction data. The proposed method has three stages: (1) Constructing the gene network, (2) Discovering critical nodes of the constructed network, and (3) Identifying driver gene groups based on the discovered critical nodes. Before evaluating the performance of DriverGroup in detecting cancer driver groups, we firstly assess its performance in detecting the influence of gene groups, a key step of DriverGroup. The application of DriverGroup to DREAM4 data demonstrates that it is more effective than other methods in detecting the regulation of gene groups. We then apply DriverGroup to the BRCA dataset to identify coding and non-coding driver groups for breast cancer. The identified driver groups are promising as several group members are confirmed to be related to cancer in literature. We further use the predicted driver groups in survival analysis and the results show that the survival curves of patient subpopulations classified using the predicted driver groups are significantly differentiated, indicating the usefulness of DriverGroup. Availability and implementation DriverGroup is available at https://github.com/pvvhoang/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Application of topic models to a compendium of ChIP-Seq datasets uncovers recurrent transcriptional regulatory modules

Bioinformatics ◽

10.1093/bioinformatics/btz975 ◽

2020 ◽

Vol 36 (8) ◽

pp. 2352-2358

Author(s):

Guodong Yang ◽

Aiqun Ma ◽

Zhaohui S Qin ◽

Li Chen

Keyword(s):

Cell Lines ◽

Large Scale ◽

Topic Models ◽

R Package ◽

Potential Interaction ◽

Computational Method ◽

Supplementary Information ◽

Regulatory Module ◽

Transcriptional Regulatory

Abstract Motivation The availability of thousands of genome-wide coupling chromatin immunoprecipitation (ChIP)-Seq datasets across hundreds of transcription factors (TFs) and cell lines provides an unprecedented opportunity to jointly analyze large-scale TF-binding in vivo, making possible the discovery of the potential interaction and cooperation among different TFs. The interacted and cooperated TFs can potentially form a transcriptional regulatory module (TRM) (e.g. co-binding TFs), which helps decipher the combinatorial regulatory mechanisms. Results We develop a computational method tfLDA to apply state-of-the-art topic models to multiple ChIP-Seq datasets to decipher the combinatorial binding events of multiple TFs. tfLDA is able to learn high-order combinatorial binding patterns of TFs from multiple ChIP-Seq profiles, interpret and visualize the combinatorial patterns. We apply the tfLDA to two cell lines with a rich collection of TFs and identify combinatorial binding patterns that show well-known TRMs and related TF co-binding events. Availability and implementation A software R package tfLDA is freely available at https://github.com/lichen-lab/tfLDA. Supplementary information Supplementary data are available at Bioinformatics online.

On the deformability of an empirical fitness landscape by microbial evolution

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1808485115 ◽

2018 ◽

Vol 115 (44) ◽

pp. 11286-11291 ◽

Cited By ~ 12

Author(s):

Djordje Bajić ◽

Jean C. C. Vila ◽

Zachary D. Blount ◽

Alvaro Sánchez

Keyword(s):

Environmental Effects ◽

Evolutionary Dynamics ◽

Fitness Landscape ◽

Adaptive Dynamics ◽

Microbial Evolution ◽

Fitness Landscapes ◽

Complex Genetics ◽

Evolutionary Innovation ◽

Genotype Space ◽

Evolutionary Trajectories

A fitness landscape is a map between the genotype and its reproductive success in a given environment. The topography of fitness landscapes largely governs adaptive dynamics, constraining evolutionary trajectories and the predictability of evolution. Theory suggests that this topography can be deformed by mutations that produce substantial changes to the environment. Despite its importance, the deformability of fitness landscapes has not been systematically studied beyond abstract models, and little is known about its reach and consequences in empirical systems. Here we have systematically characterized the deformability of the genome-wide metabolic fitness landscape of the bacterium Escherichia coli. Deformability is quantified by the noncommutativity of epistatic interactions, which we experimentally demonstrate in mutant strains on the path to an evolutionary innovation. Our analysis shows that the deformation of fitness landscapes by metabolic mutations rarely affects evolutionary trajectories in the short range. However, mutations with large environmental effects produce long-range landscape deformations in distant regions of the genotype space that affect the fitness of later descendants. Our results therefore suggest that, even in situations in which mutations have strong environmental effects, fitness landscapes may retain their power to forecast evolution over small mutational distances despite the potential attenuation of that power over longer evolutionary trajectories. Our methods and results provide an avenue for integrating adaptive and eco-evolutionary dynamics with complex genetics and genomics.

Modelling metabolic evolution on phenotypic fitness landscapes: a case study on C4 photosynthesis

Biochemical Society Transactions ◽

10.1042/bst20150148 ◽

2015 ◽

Vol 43 (6) ◽

pp. 1172-1176 ◽

Cited By ~ 4

Author(s):

David Heckmann

Keyword(s):

Structural Changes ◽

Crop Yields ◽

C4 Photosynthesis ◽

Fitness Landscape ◽

Fitness Landscapes ◽

C3 Plants ◽

Metabolic Systems ◽

C4 Pathway ◽

Evolutionary Trajectories ◽

Phenotypic Fitness

How did the complex metabolic systems we observe today evolve through adaptive evolution? The fitness landscape is the theoretical framework to answer this question. Since experimental data on natural fitness landscapes is scarce, computational models are a valuable tool to predict landscape topologies and evolutionary trajectories. Careful assumptions about the genetic and phenotypic features of the system under study can simplify the design of such models significantly. The analysis of C4 photosynthesis evolution provides an example for accurate predictions based on the phenotypic fitness landscape of a complex metabolic trait. The C4 pathway evolved multiple times from the ancestral C3 pathway and models predict a smooth ‘Mount Fuji’ landscape accordingly. The modelled phenotypic landscape implies evolutionary trajectories that agree with data on modern intermediate species, indicating that evolution can be predicted based on the phenotypic fitness landscape. Future directions will have to include structural changes of metabolic fitness landscape structure with changing environments. This will not only answer important evolutionary questions about reversibility of metabolic traits, but also suggest strategies to increase crop yields by engineering the C4 pathway into C3 plants.

DriverGroup: a novel method for identifying driver gene groups

Bioinformatics ◽

10.1093/bioinformatics/btaa797 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i583-i591

Author(s):

Vu V H Pham ◽

Lin Liu ◽

Cameron P Bracken ◽

Gregory J Goodall ◽

Jiuyong Li ◽

...

Keyword(s):

Cancer Progression ◽

Single Gene ◽

Gene Interaction ◽

Computational Method ◽

Supplementary Information ◽

Driver Gene ◽

Driver Genes ◽

Cancer Driver ◽

Critical Nodes ◽

Novel Method

Abstract Motivation Identifying cancer driver genes is a key task in cancer informatics. Most existing methods are focused on individual cancer drivers which regulate biological processes leading to cancer. However, the effect of a single gene may not be sufficient to drive cancer progression. Here, we hypothesize that there are driver gene groups that work in concert to regulate cancer, and we develop a novel computational method to detect those driver gene groups. Results We develop a novel method named DriverGroup to detect driver gene groups by using gene expression and gene interaction data. The proposed method has three stages: (i) constructing the gene network, (ii) discovering critical nodes of the constructed network and (iii) identifying driver gene groups based on the discovered critical nodes. Before evaluating the performance of DriverGroup in detecting cancer driver groups, we firstly assess its performance in detecting the influence of gene groups, a key step of DriverGroup. The application of DriverGroup to DREAM4 data demonstrates that it is more effective than other methods in detecting the regulation of gene groups. We then apply DriverGroup to the BRCA dataset to identify driver groups for breast cancer. The identified driver groups are promising as several group members are confirmed to be related to cancer in literature. We further use the predicted driver groups in survival analysis and the results show that the survival curves of patient subpopulations classified using the predicted driver groups are significantly differentiated, indicating the usefulness of DriverGroup. Availability and implementation DriverGroup is available at https://github.com/pvvhoang/DriverGroup Supplementary information Supplementary data are available at Bioinformatics online.

Detecting evolutionary patterns of cancers using consensus trees

Bioinformatics ◽

10.1093/bioinformatics/btaa801 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i684-i691

Author(s):

Sarah Christensen ◽

Juho Kim ◽

Nicholas Chia ◽

Oluwasanmi Koyejo ◽

Mohammed El-Kebir

Keyword(s):

Evolutionary Process ◽

Simulated Data ◽

Therapy Response ◽

Supplementary Information ◽

Driver Mutations ◽

Sequencing Data ◽

Large Solution ◽

Evolutionary Patterns ◽

Cancer Subtypes ◽

Evolutionary Trajectories

Abstract Motivation While each cancer is the result of an isolated evolutionary process, there are repeated patterns in tumorigenesis defined by recurrent driver mutations and their temporal ordering. Such repeated evolutionary trajectories hold the potential to improve stratification of cancer patients into subtypes with distinct survival and therapy response profiles. However, current cancer phylogeny methods infer large solution spaces of plausible evolutionary histories from the same sequencing data, obfuscating repeated evolutionary patterns. Results To simultaneously resolve ambiguities in sequencing data and identify cancer subtypes, we propose to leverage common patterns of evolution found in patient cohorts. We first formulate the Multiple Choice Consensus Tree problem, which seeks to select a tumor tree for each patient and assign patients into clusters in such a way that maximizes consistency within each cluster of patient trees. We prove that this problem is NP-hard and develop a heuristic algorithm, Revealing Evolutionary Consensus Across Patients (RECAP), to solve this problem in practice. Finally, on simulated data, we show RECAP outperforms existing methods that do not account for patient subtypes. We then use RECAP to resolve ambiguities in patient trees and find repeated evolutionary trajectories in lung and breast cancer cohorts. Availability and implementation https://github.com/elkebir-group/RECAP. Supplementary information Supplementary data are available at Bioinformatics online.

Detecting repeated cancer evolution in human tumours from multi-region sequencing data

10.1101/156729 ◽

2017 ◽

Author(s):

Giulio Caravagna ◽

Ylenia Giarratano ◽

Daniele Ramazzotti ◽

Trevor A Graham ◽

Guido Sanguinetti ◽

...

Keyword(s):

Cancer Progression ◽

Phylogenetic Trees ◽

Evolutionary Process ◽

Cancer Evolution ◽

Sequencing Data ◽

Robust Identification ◽

Genomic Changes ◽

Evolutionary Trajectories ◽

Genomic Aberrations ◽

Repeated Evolution

Abstract Carcinogenesis is an evolutionary process driven by the accumulation of genomic aberrations. Recurrent sequences of genomic changes, both between and within patients, reflect repeated evolution that is valuable for anticipating cancer progression. Multi-region sequencing and phylogenetic analysis allow inference of the partial temporal order of genomic changes within a patient’s tumour. However, the inherent stochasticity of the evolutionary process makes phylogenetic trees from different patients appear very distinct, preventing the robust identification of recurrent evolutionary trajectories. Here we present a novel quantitative method based on a machine learning approach called Transfer Learning (TL) that allows overcoming the stochastic effects of cancer evolution and highlighting hidden recurrences in cancer patient cohorts. When applied to multi-region sequencing datasets from lung, breast and renal cancer (708 samples from 160 patients), our method detected repeated evolutionary trajectories that determine novel patient subgroups, which reproduce in large single-sample cohorts (n=2,641) and have prognostic value. Our method provides a novel patient classification measure that is grounded in the cancer evolution paradigm, and which reveals repeated evolution during tumorigenesis, with implications for our ability to anticipate malignant evolution.
