Cancer progression models and fitness landscapes: a many-to-many relationship

2017 ◽  
Author(s):  
Ramon Diaz-Uriarte

Abstract. The identification of constraints, due to gene interactions, in the order of accumulation of mutations during cancer progression can allow us to single out therapeutic targets. Cancer progression models (CPMs) use genotype frequency data from cross-sectional samples to try to identify these constraints, and return Directed Acyclic Graphs (DAGs) of genes. On the other hand, fitness landscapes, which map genotypes to fitness, contain all possible paths of tumor progression. Thus, we expect a correspondence between DAGs from CPMs and the fitness landscapes where evolution happened. But many fitness landscapes (e.g., those with reciprocal sign epistasis) cannot be represented by CPMs. Using simulated data under 500 fitness landscapes, I show that CPMs' performance (prediction of genotypes that can exist) degrades with reciprocal sign epistasis. There is large variability in the DAGs inferred from each landscape, which is also affected by mutation rate, detection regime, and fitness landscape features, in ways that depend on CPM method. And the same DAG is often observed in very different landscapes, which differ in more than 50% of their accessible genotypes. Using a pancreatic data set, I show that this many-to-many relationship affects the analysis of empirical data. Fitness landscapes that are widely different from each other can, when evolutionary processes run repeatedly on them, both produce data similar to the empirically observed data, and lead to DAGs that are very different among themselves. Because reciprocal sign epistasis can be common in cancer, these results question the use and interpretation of CPMs.
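As an illustration of the term used in the abstract above, the following minimal sketch classifies the epistasis between two loci from the four genotype fitnesses. The function name, the tolerance, and the fitness values are hypothetical and chosen only for the example; the code is not taken from the paper. Reciprocal sign epistasis means that each mutation changes the sign of its fitness effect depending on whether the other mutation is present.

```python
def sign_epistasis_type(w00, w10, w01, w11):
    """Classify two-locus epistasis from genotype fitnesses.

    w00: wild type, w10: only A mutated, w01: only B mutated, w11: both.
    """
    # Fitness effect of mutation A without and with B, and likewise for B.
    a_on_wt, a_on_b = w10 - w00, w11 - w01
    b_on_wt, b_on_a = w01 - w00, w11 - w10

    # A mutation "flips" if its effect changes sign across backgrounds.
    a_flips = a_on_wt * a_on_b < 0
    b_flips = b_on_wt * b_on_a < 0

    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    if abs(a_on_wt - a_on_b) > 1e-12:  # same sign, different size
        return "magnitude"
    return "none"


# Hypothetical landscape: each single mutant beats the wild type,
# but the double mutant is worse than either single mutant.
print(sign_epistasis_type(1.0, 1.2, 1.2, 0.9))  # prints "reciprocal sign"
```

In this hypothetical landscape both single mutants are reachable by fitness-increasing steps but the double mutant is not, which is a pattern that a graph of "B requires A" dependencies has no direct way to express.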

2014 ◽  
Author(s):  
Daniele Ramazzotti ◽  
Giulio Caravagna ◽  
Loes Olde Loohuis ◽  
Alex Graudenzi ◽  
Ilya Korsunsky ◽  
...  

We devise a novel inference algorithm to effectively solve the cancer progression model reconstruction problem. Our empirical analysis of the accuracy and convergence rate of our algorithm, CAncer PRogression Inference (CAPRI), shows that it outperforms the state-of-the-art algorithms addressing similar problems. Motivation: Several cancer-related genomic data sets have become available (e.g., The Cancer Genome Atlas, TCGA), typically involving hundreds of patients. At present, most of these data are aggregated in a cross-sectional fashion, providing all measurements at the time of diagnosis. Our goal is to infer cancer "progression" models from such data. These models are represented as directed acyclic graphs (DAGs) of collections of "selectivity" relations, where a mutation in a gene A "selects" for a later mutation in a gene B. Gaining insight into the structure of such progressions has the potential to improve both the stratification of patients and personalized therapy choices. Results: The CAPRI algorithm relies on a scoring method based on a probabilistic theory developed by Suppes, coupled with bootstrap and maximum likelihood inference. The resulting algorithm is efficient, achieves high accuracy, and has good complexity and convergence properties. CAPRI performs especially well in the presence of noise in the data, and with limited sample sizes. Moreover, CAPRI, in contrast to other approaches, robustly reconstructs different types of confluent trajectories despite irregularities in the data. We also report on an ongoing investigation using CAPRI to study atypical Chronic Myeloid Leukemia, in which we uncovered nontrivial selectivity relations and exclusivity patterns among key genomic events.
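The abstract above only says that the scoring builds on Suppes' probabilistic theory. As a rough illustration, and not the CAPRI implementation, Suppes' notion of a prima facie cause is commonly summarized by two conditions on cross-sectional data: the candidate cause A is more frequent than the effect B (a stand-in for temporal priority), and A raises the probability of B. The sketch below checks just those two conditions; the function name and the toy samples are hypothetical, and the bootstrap and likelihood steps mentioned in the abstract are not modeled.

```python
def suppes_prima_facie(samples, a, b):
    """Check the two Suppes-style conditions for 'a selects for b'.

    samples: list of 0/1 tuples, one tuple per patient, one entry per event.
    a, b: column indices of the candidate cause and effect.
    """
    n = len(samples)
    n_a = sum(s[a] for s in samples)
    n_b = sum(s[b] for s in samples)
    n_ab = sum(s[a] and s[b] for s in samples)

    # Conditional probabilities are undefined if a is never or always seen.
    if n_a == 0 or n_a == n:
        return False

    p_a, p_b = n_a / n, n_b / n
    p_b_given_a = n_ab / n_a
    p_b_given_not_a = (n_b - n_ab) / (n - n_a)

    temporal_priority = p_a > p_b
    probability_raising = p_b_given_a > p_b_given_not_a
    return temporal_priority and probability_raising


# Toy data (hypothetical): event 1 only ever appears together with event 0.
toy = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0)]
print(suppes_prima_facie(toy, 0, 1))  # prints True
print(suppes_prima_facie(toy, 1, 0))  # prints False
```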


2019 ◽  
Vol 36 (1) ◽  
pp. 241-249 ◽  
Author(s):  
Rudolf Schill ◽  
Stefan Solbrig ◽  
Tilo Wettig ◽  
Rainer Spang

Abstract. Motivation: Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. Results: Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. Availability and implementation: Implementation and data are available at https://github.com/RudiSchill/MHN. Supplementary information: Supplementary data are available at Bioinformatics online.
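One way to read "spontaneous rate of fixation" combined with "multiplicative effects" is sketched below: the rate of an event that has not yet occurred is its baseline rate multiplied by one factor for every event already present. This is a reading of the abstract and not code from the linked repository; the matrix layout (baseline rates on the diagonal, multipliers off the diagonal), the function name, and all numbers are assumptions made for illustration.

```python
import math


def event_rate(theta, state, i):
    """Rate at which event i occurs given the events already present.

    theta[i][i] is the assumed baseline rate of event i;
    theta[i][j] (j != i) is the assumed factor by which a present
    event j multiplies the rate of event i (>1 promotes, <1 inhibits).
    state is a tuple of 0/1 flags marking which events are present.
    """
    rate = theta[i][i]
    for j, present in enumerate(state):
        if present and j != i:
            rate *= theta[i][j]
    return rate


# Hypothetical two-event example: event 0 triples the rate of event 1,
# while event 1 leaves the rate of event 0 unchanged.
theta = [
    [0.5, 1.0],
    [3.0, 0.2],
]
baseline = event_rate(theta, (0, 0), 1)   # only the baseline rate applies
promoted = event_rate(theta, (1, 0), 1)   # baseline times the factor 3.0
print(math.isclose(promoted, 3.0 * baseline))  # prints True
```

Because the off-diagonal factors need not be symmetric or acyclic, two events can influence each other in both directions, which matches the abstract's contrast with models restricted to directed acyclic graphs.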


2019 ◽  
Vol 35 (14) ◽  
pp. i389-i397 ◽  
Author(s):  
Sayed-Rzgar Hosseini ◽  
Ramon Diaz-Uriarte ◽  
Florian Markowetz ◽  
Niko Beerenwinkel

Abstract. Motivation: How predictable is the evolution of cancer? This fundamental question is of immense relevance for the diagnosis, prognosis and treatment of cancer. Evolutionary biologists have approached the question of predictability based on the underlying fitness landscape. However, empirical fitness landscapes of tumor cells are impossible to determine in vivo. Thus, in order to quantify the predictability of cancer evolution, alternative approaches are required that circumvent the need for fitness landscapes. Results: We developed a computational method based on conjunctive Bayesian networks (CBNs) to quantify the predictability of cancer evolution directly from mutational data, without the need for measuring or estimating fitness. Using simulated data derived from >200 different fitness landscapes, we show that our CBN-based notion of evolutionary predictability strongly correlates with the classical notion of predictability based on fitness landscapes under the strong selection weak mutation assumption. The statistical framework enables robust and scalable quantification of evolutionary predictability. We applied our approach to driver mutation data from the TCGA and the MSK-IMPACT clinical cohorts to systematically compare the predictability of 15 different cancer types. We found that cancer evolution is remarkably predictable as only a small fraction of evolutionary trajectories are feasible during cancer progression. Availability and implementation: https://github.com/cbg-ethz/predictability_of_cancer_evolution. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Rudolf Schill ◽  
Stefan Solbrig ◽  
Tilo Wettig ◽  
Rainer Spang

Abstract. Motivation: Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. Results: Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. Availability: Implementation and data are available at https://github.com/RudiSchill/MHN.


Author(s):  
David Bartram

Abstract. Happiness/well-being researchers who use quantitative analysis often do not give persuasive reasons why particular variables should be included as controls in their cross-sectional models. One commonly sees notions of a "standard set" of controls, or the "usual suspects", etc. These notions are not coherent and can lead to results that are significantly biased with respect to a genuine causal relationship. This article presents some core principles for making more effective decisions of that sort. The contribution is to introduce a framework (the "causal revolution", e.g. Pearl and Mackenzie 2018) unfamiliar to many social scientists (though well established in epidemiology) and to show how it can be put into practice for empirical analysis of causal questions. In simplified form, the core principles are: control for confounding variables, and do not control for intervening variables or colliders. A more comprehensive approach uses directed acyclic graphs (DAGs) to discern models that meet a minimum/efficient criterion for identification of causal effects. The article demonstrates this mode of analysis via a stylized investigation of the effect of unemployment on happiness. Most researchers would include other determinants of happiness as controls for this purpose. One such determinant is income, but income is an intervening variable in the path from unemployment to happiness, and including it leads to substantial bias. Other commonly used variables are simply unnecessary, e.g. religiosity and sex. From this perspective, identifying the effect of unemployment on happiness requires controlling only for age and education; a small (parsimonious) model is evidently preferable to a more complex one in this instance.
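The point about intervening variables can be made concrete with a small worked example. The sketch below assumes a purely linear model with no confounding, in which unemployment affects happiness both directly and through income; all coefficients are invented for illustration and are not estimates from the article. In such a model the total effect is the direct path plus the product of the coefficients along the mediated path, and adjusting for the mediator leaves only the direct path.

```python
def effects_in_linear_model(direct, exposure_to_mediator, mediator_to_outcome):
    """Return (total_effect, effect_after_controlling_for_mediator).

    Assumes a linear model without confounding:
        mediator = exposure_to_mediator * exposure + noise
        outcome  = direct * exposure + mediator_to_outcome * mediator + noise
    """
    indirect = exposure_to_mediator * mediator_to_outcome
    total = direct + indirect
    # Holding the mediator fixed blocks the indirect path.
    return total, direct


# Hypothetical coefficients: unemployment lowers happiness directly (-1.0)
# and lowers income (-2.0), while income raises happiness (+0.5).
total, controlled = effects_in_linear_model(-1.0, -2.0, 0.5)
print(total, controlled)  # prints -2.0 -1.0
```

With these made-up numbers, including income as a control would report half of the total effect, which is the kind of bias the abstract warns about.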


2018 ◽  
Author(s):  
Christelle Fraïsse ◽  
John J. Welch

Abstract. Fitness interactions between mutations can influence a population's evolution in many different ways. While epistatic effects are difficult to measure precisely, important information about the overall distribution is captured by the mean and variance of log fitnesses for individuals carrying different numbers of mutations. We derive predictions for these quantities from simple fitness landscapes, based on models of optimizing selection on quantitative traits. We also explore extensions to the models, including modular pleiotropy, variable effect sizes, mutational bias, and maladaptation of the wild-type. We illustrate our approach by reanalysing a large data set of mutant effects in a yeast snoRNA. Though characterized by some strong epistatic interactions, these data give a good overall fit to the non-epistatic null model, suggesting that epistasis might have little effect on the evolutionary dynamics in this system. We also show how the amount of epistasis depends on both the underlying fitness landscape and the distribution of mutations, and so it is expected to vary in consistent ways between new mutations, standing variation, and fixed mutations.
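The summary statistics named in the abstract, the mean and variance of log fitness for genotypes carrying a given number of mutations, can be computed as in the following minimal sketch. The data layout (a dictionary from 0/1 genotype tuples to fitness), the function name, and the toy values are assumptions for illustration only; the population variance is used here, which is a choice of the example and not something the abstract specifies.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def log_fitness_moments(fitness_by_genotype):
    """Group genotypes by mutation count and summarize log fitness.

    fitness_by_genotype: dict mapping 0/1 tuples to positive fitness values.
    Returns {mutation_count: (mean_log_fitness, variance_log_fitness)}.
    """
    groups = defaultdict(list)
    for genotype, fitness in fitness_by_genotype.items():
        groups[sum(genotype)].append(math.log(fitness))
    return {k: (mean(v), pvariance(v)) for k, v in sorted(groups.items())}


# Hypothetical two-locus data set.
toy_landscape = {
    (0, 0): 1.0,
    (1, 0): math.exp(1.0),
    (0, 1): math.exp(3.0),
    (1, 1): math.exp(2.0),
}
for count, (m, v) in log_fitness_moments(toy_landscape).items():
    print(count, round(m, 6), round(v, 6))
```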


2009 ◽  
Vol 07 (01) ◽  
pp. 135-156 ◽  
Author(s):  
VINHTHUY PHAN ◽  
E. OLUSEGUN GEORGE ◽  
QUYNH T. TRAN ◽  
SHIRLEAN GOODWIN ◽  
SRIDEVI BODREDDIGARI ◽  
...  

Post hoc assignment of patterns determined by all pairwise comparisons in microarray experiments with multiple treatments has proven useful in assessing treatment effects. We propose using transitive directed acyclic graphs (tDAG) to represent these patterns and show that such a representation can be useful in clustering treatment effects, annotating existing clustering methods, and analyzing sample sizes. Advantages of this approach include: (1) unique and descriptive meaning of each cluster in terms of how genes respond to all pairs of treatments; (2) insensitivity of the observed patterns to the number of genes analyzed; and (3) a combinatorial perspective to address the sample size problem by observing the rate of contractible tDAG as the number of replicates increases. The advantages and overall utility of the method in elaborating drug structure-activity relationships are exemplified in a controlled study with real and simulated data.


2014 ◽  
Author(s):  
Ramon Diaz-Uriarte

Cancer progression is caused by the sequential accumulation of mutations, but not all orders of accumulation of mutations are equally likely. When the fixation of some mutations depends on the presence of previous ones, identifying restrictions in the order of accumulation of mutations can lead to the discovery of therapeutic targets and diagnostic markers. Using simulated data sets, I conducted a comprehensive comparison of the performance of all available methods to identify these restrictions from cross-sectional data. In contrast to previous work, I embedded restrictions within evolutionary models of tumor progression that included passengers (mutations not responsible for the development of cancer, known to be very common). This allowed me to assess the effects of having to filter out passengers, of sampling schemes, and of deviations from order restrictions. Poor choices of method, filtering, and sampling lead to large errors in all performance metrics. Having to filter passengers led to decreased performance, especially because true restrictions were missed. Overall, the best method for identifying order restrictions was Oncogenetic Trees, a fast and easy to use method that, although unable to recover dependencies of mutations on more than one mutation, showed good performance in most scenarios, superior to Conjunctive Bayesian Networks and Progression Networks. Single cell sampling provided no advantage, but sampling in the final stages of the disease vs. sampling at different stages had severe effects. Evolutionary model and deviations from order restrictions had major, and sometimes counterintuitive, interactions with other factors that affected performance. This paper provides practical recommendations for using these methods with experimental data. Moreover, it shows that it is both possible and necessary to embed assumptions about order restrictions and the nature of driver status within evolutionary models of cancer progression to evaluate the performance of inferential approaches.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Andrew M. Ritchie ◽  
Tristan L. Stark ◽  
David A. Liberles

Abstract. Background: Recovering the historical patterns of selection acting on a protein coding sequence is a major goal of evolutionary biology. Mutation-selection models address this problem by explicitly modelling fixation rates as a function of site-specific amino acid fitness values. However, they are restricted in their utility for investigating directional evolution because they require prior knowledge of the locations of fitness changes in the lineages of a phylogeny. Results: We apply a modified mutation-selection methodology that relaxes assumptions of equilibrium and time-reversibility. Our implementation allows us to identify branches where adaptive or compensatory shifts in the fitness landscape have taken place, signalled by a change in amino acid fitness profiles. Through simulation and analysis of an empirical data set of β-lactamase genes, we test our ability to recover the position of adaptive events within the tree and successfully reconstruct initial codon frequencies and fitness profile parameters generated under the non-stationary model. Conclusion: We demonstrate successful detection of selective shifts and identification of the affected branch on partitions of 300 codons or more. We successfully reconstruct fitness parameters and initial codon frequencies in simulated data and demonstrate that failing to account for non-equilibrium evolution can increase the error in fitness profile estimation. We also demonstrate reconstruction of plausible shifts in amino acid fitnesses in the bacterial β-lactamase family and discuss some caveats for interpretation.


2020 ◽  
Vol 17 (167) ◽  
pp. 20190675
Author(s):  
Joshua Havumaki ◽  
Marisa C. Eisenberg

Accurately estimating the effect of an exposure on an outcome requires understanding how variables relevant to a study question are causally related to each other. Directed acyclic graphs (DAGs) are used in epidemiology to understand causal processes and determine appropriate statistical approaches to obtain unbiased measures of effect. Compartmental models (CMs) are also used to represent different causal mechanisms, by depicting flows between disease states on the population level. In this paper, we extend a mapping between DAGs and CMs to show how DAG-derived CMs can be used to compare competing causal mechanisms by simulating epidemiological studies and conducting statistical analyses on the simulated data. Through this framework, we can evaluate how robust simulated epidemiological study results are to different biases in study design and underlying causal mechanisms. As a case study, we simulated a longitudinal cohort study to examine the obesity paradox: the apparent protective effect of obesity on mortality among diabetic ever-smokers, but not among diabetic never-smokers. Our simulations illustrate how study design bias (e.g. reverse causation) can lead to the obesity paradox. Ultimately, we show the utility of transforming DAGs into in silico laboratories within which researchers can systematically evaluate bias and inform analyses and study design.

