Every which way? On predicting tumor evolution using cancer progression models

Abstract

Successful prediction of the likely paths of tumor progression is valuable for diagnostic, prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations; thus, CPMs encode the paths of tumor progression. Here we analyze the performance of four CPMs to examine whether they can be used to predict the true distribution of paths of tumor progression and to estimate evolutionary unpredictability. Using simulations, we show that if fitness landscapes are single-peaked (have a single fitness maximum), there is good agreement between true and predicted distributions of paths of tumor progression when sample sizes are large, but performance is poor with the much smaller sample sizes that are currently common. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all cases, the detection regime (when tumors are sampled) is a key determinant of performance. Estimates of evolutionary unpredictability from the best-performing CPM among the four examined tend to overestimate the true unpredictability, and the bias is affected by the detection regime; CPMs could be useful for estimating upper bounds on the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability for several of the data sets. However, most of the predictions of paths of tumor progression are very unreliable, and unreliability increases with the number of features analyzed.
Our results indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is doubtful; they also emphasize the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer.

Author Summary

Knowing the likely paths of tumor progression is instrumental for cancer precision medicine, as it would allow us to identify genetic targets that block disease progression and to improve therapeutic decisions. Direct information about paths of tumor progression is scarce, but cancer progression models (CPMs), which use cross-sectional data on genetic alterations as input, can be used to predict these paths. CPMs, however, make assumptions about fitness landscapes (genotype-fitness maps) that might not be met in cancer. We examine whether four CPMs can successfully predict the distribution of tumor progression paths; we find that some CPMs work well when sample sizes are large and fitness landscapes have a single fitness maximum, but prediction is poor in fitness landscapes with multiple fitness maxima. However, the best-performing CPM in our study could be used to estimate evolutionary unpredictability. When we apply it to twenty-two cancer data sets, we find that predictions are generally unreliable but that some cancer data sets show low unpredictability. Our results highlight that CPMs could be valuable tools for predicting disease progression, but emphasize the need for methodological work to account for multi-peaked fitness landscapes.
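
The notions of a distribution of paths of tumor progression and of evolutionary unpredictability can be made concrete with a small sketch. It assumes genotypes are 0/1 tuples, that evolution follows a strong-selection, weak-mutation walk that only gains mutations and moves to a fitter neighbor with probability proportional to its fitness gain, and that unpredictability is summarized by the Shannon entropy of the path distribution. The function names and toy landscapes are hypothetical, and this is not the CPM-based procedure evaluated in the paper.

```python
from math import log2

def gain_one_mutation(g):
    """Genotypes (tuples of 0/1) carrying exactly one extra mutation."""
    return [g[:i] + (1,) + g[i + 1:] for i, x in enumerate(g) if x == 0]

def path_distribution(fitness, start):
    """Probability of each mutational path from `start` until a fitness peak is reached.

    Assumption (illustrative, not taken from the paper): a strong-selection,
    weak-mutation walk that only gains mutations and moves to a fitter neighbor
    with probability proportional to that neighbor's fitness gain.
    """
    dist = {}

    def walk(g, path, p):
        gains = {n: fitness[n] - fitness[g]
                 for n in gain_one_mutation(g) if fitness[n] > fitness[g]}
        if not gains:  # no fitter neighbor: the walk stops at a peak
            key = tuple(path)
            dist[key] = dist.get(key, 0.0) + p
            return
        total = sum(gains.values())
        for n, s in gains.items():
            walk(n, path + [n], p * s / total)

    walk(start, [start], 1.0)
    return dist

def unpredictability(dist):
    """Shannon entropy (bits) of the distribution of paths."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Hypothetical two-locus landscapes; keys are (locus 1, locus 2) genotypes.
single_peak = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (1, 1): 4.0}
two_peaks = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 2.0, (1, 1): 0.5}

for name, landscape in [("single peak", single_peak), ("two peaks", two_peaks)]:
    dist = path_distribution(landscape, (0, 0))
    print(name, dist, round(unpredictability(dist), 3))
```

In the single-peaked toy landscape both paths end at the same genotype but with unequal probabilities (1/3 and 2/3); in the two-peaked one the walk stops at either single mutant with equal probability.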

Conditional prediction of consecutive tumor evolution using cancer progression models: What genotype comes next?

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009055 ◽

2021 ◽

Vol 17 (12) ◽

pp. e1009055

Author(s):

Juan Diaz-Colunga ◽

Ramon Diaz-Uriarte

Keyword(s):

Tumor Progression ◽

Cancer Progression ◽

Fitness Landscape ◽

Tumor Evolution ◽

Data Sets ◽

Short Term ◽

Global Features ◽

Cross Sectional ◽

Cancer Data ◽

Landscape Characteristics

Accurate prediction of tumor progression is key for adaptive therapy and precision medicine. Cancer progression models (CPMs) can be used to infer dependencies in mutation accumulation from cross-sectional data and provide predictions of tumor progression paths. However, their performance when predicting complete evolutionary trajectories is limited by violations of assumptions and the size of available data sets. Instead of predicting full tumor progression paths, here we focus on short-term predictions, which are more relevant for diagnostic and therapeutic purposes. We examine whether five distinct CPMs can be used to answer the question “Given that a genotype with n mutations has been observed, what genotype with n + 1 mutations is next in the path of tumor progression?” or, shortly, “What genotype comes next?”. Using simulated data, we find that under specific combinations of genotype and fitness landscape characteristics CPMs can provide predictions of short-term evolution that closely match the true probabilities, and that some genotype characteristics can be much more relevant than global features. Application of these methods to 25 cancer data sets shows that their use is hampered by a lack of the information needed to make principled decisions about method choice. Fruitful use of these methods for short-term predictions requires adapting the methods’ use to local genotype characteristics and obtaining reliable indicators of performance; it will also be necessary to clarify the interpretation of the methods’ results when key assumptions do not hold.
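
The question “What genotype comes next?” asks for a conditional distribution over the genotypes with n + 1 mutations reachable from the observed one. The sketch below computes that distribution under an assumed strong-selection, weak-mutation model and scores a hypothetical prediction with the Jensen-Shannon divergence. The landscape, the uniform “prediction”, and the choice of divergence are illustrative assumptions, not the five CPMs or the performance measure used in the paper.

```python
from math import log2

def next_genotype_probs(fitness, g):
    """P(next genotype | observed genotype g) over genotypes with one more mutation.

    Assumption (illustrative): strong-selection, weak-mutation dynamics in which a
    fitter neighbor is chosen with probability proportional to its fitness gain.
    Returns an empty dict when g is a local fitness maximum.
    """
    ups = [g[:i] + (1,) + g[i + 1:] for i, x in enumerate(g) if x == 0]
    gains = {n: fitness[n] - fitness[g] for n in ups if fitness[n] > fitness[g]}
    total = sum(gains.values())
    return {n: s / total for n, s in gains.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (bits) between two distributions stored as dicts."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

# Hypothetical three-locus landscape.
fitness = {
    (0, 0, 0): 1.0, (1, 0, 0): 2.0, (0, 1, 0): 1.5, (0, 0, 1): 1.2,
    (1, 1, 0): 5.0, (1, 0, 1): 3.0, (0, 1, 1): 2.5, (1, 1, 1): 6.0,
}
observed = (1, 0, 0)  # n = 1 mutation observed
true_next = next_genotype_probs(fitness, observed)
# A hypothetical CPM prediction that spreads probability evenly.
predicted = {(1, 1, 0): 0.5, (1, 0, 1): 0.5}
print(true_next, js_divergence(true_next, predicted))
```

Here the true next-genotype probabilities are 0.75 for (1, 1, 0) and 0.25 for (1, 0, 1); a divergence of 0 would mean the prediction matches them exactly.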

Conditional prediction of consecutive tumor evolution using cancer progression models: What genotype comes next?

10.1101/2020.12.16.423099 ◽

2020 ◽

Author(s):

Juan Diaz-Colunga ◽

Ramon Diaz-Uriarte

Keyword(s):

Tumor Progression ◽

Cancer Progression ◽

Fitness Landscape ◽

Tumor Evolution ◽

Data Sets ◽

Short Term ◽

Global Features ◽

Cross Sectional ◽

Cancer Data ◽

Landscape Characteristics

Abstract

Accurate prediction of tumor progression is key for adaptive therapy and precision medicine. Cancer progression models (CPMs) can be used to infer dependencies in mutation accumulation from cross-sectional data and provide predictions of tumor progression paths. But their performance when predicting the complete evolutionary paths is limited by violations of assumptions and the size of available data sets. Instead of predicting full tumor progression paths, we can focus on short-term predictions, which are more relevant for diagnostic and therapeutic purposes. Here we examine whether five distinct CPMs can be used to answer the question “Given that a genotype with n mutations has been observed, what genotype with n + 1 mutations is next in the path of tumor progression?” or, shortly, “What genotype comes next?”. Using simulated data, we find that under specific combinations of genotype and fitness landscape characteristics CPMs can provide predictions of short-term evolution that closely match the true probabilities, and that some genotype characteristics (fitness and probability of being a local fitness maximum) can be much more relevant than global features. Thus, CPMs can provide short-term predictions even when global, long-term predictions are not possible because fitness landscape- and evolutionary model-specific assumptions are violated. When good performance is possible, we observe significant variation in the quality of predictions of different methods. Genotype-specific and global fitness landscape characteristics are required to determine which method provides the best results in each case. Application of these methods to 25 cancer data sets shows that their use is hampered by a lack of the information needed to make principled decisions about method choice and about which predictions to trust.
Fruitful use of these methods for short-term predictions requires adapting the methods’ use to local genotype characteristics and obtaining reliable indicators of performance; it will also be necessary to clarify the interpretation of the methods’ results when key assumptions do not hold.
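
Whether a genotype is a local fitness maximum, and whether a landscape has one or several such peaks, are the kinds of characteristics highlighted above. A minimal sketch, assuming genotypes are 0/1 tuples and that a local maximum is a genotype strictly fitter than every genotype one mutation away; the landscapes and function names are hypothetical. The abstract refers to a *probability* of being a local maximum; this sketch only performs the deterministic check, and how the paper turns it into a probability is not reproduced here.

```python
from itertools import product

def one_mutation_neighbors(g):
    """All genotypes differing from g at exactly one locus (gain or loss)."""
    return [g[:i] + (1 - x,) + g[i + 1:] for i, x in enumerate(g)]

def local_maxima(fitness):
    """Genotypes strictly fitter than every one-mutation neighbor."""
    return [g for g in fitness
            if all(fitness[g] > fitness[n] for n in one_mutation_neighbors(g))]

def make_landscape(n_loci, f):
    """Fitness for every 0/1 genotype of length n_loci."""
    return {g: f(g) for g in product((0, 1), repeat=n_loci)}

# Hypothetical landscapes: additive (single peak) and one with two peaks.
additive = make_landscape(3, lambda g: 1.0 + sum(g))
rugged = {(0, 0): 1.0, (1, 0): 0.8, (0, 1): 0.8, (1, 1): 1.5}

print(local_maxima(additive))  # one peak: the genotype with all mutations
print(local_maxima(rugged))    # two peaks
```

Because ties are not counted as peaks (strict inequality), flat regions of a landscape yield no local maxima under this definition; other conventions are possible.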

Inferring restrictions in the temporal order of mutations during tumor progression: effects of passengers, evolutionary models, and sampling

10.1101/005587 ◽

2014 ◽

Author(s):

Ramon Diaz-Uriarte

Keyword(s):

Tumor Progression ◽

Cancer Progression ◽

Performance Metrics ◽

Simulated Data ◽

Evolutionary Model ◽

Data Sets ◽

Evolutionary Models ◽

Cross Sectional ◽

Order Restrictions ◽

Sampling Schemes

Cancer progression is caused by the sequential accumulation of mutations, but not all orders of accumulation of mutations are equally likely. When the fixation of some mutations depends on the presence of previous ones, identifying restrictions in the order of accumulation of mutations can lead to the discovery of therapeutic targets and diagnostic markers. Using simulated data sets, I conducted a comprehensive comparison of the performance of all available methods to identify these restrictions from cross-sectional data. In contrast to previous work, I embedded restrictions within evolutionary models of tumor progression that included passengers (mutations not responsible for the development of cancer, known to be very common). This allowed me to assess the effects of having to filter out passengers, of sampling schemes, and of deviations from order restrictions. Poor choices of method, filtering, and sampling lead to large errors in all performance metrics. Having to filter passengers led to decreased performance, especially because true restrictions were missed. Overall, the best method for identifying order restrictions was Oncogenetic Trees, a fast and easy-to-use method that, although unable to recover dependencies of mutations on more than one mutation, showed good performance in most scenarios, superior to Conjunctive Bayesian Networks and Progression Networks. Single-cell sampling provided no advantage, but sampling in the final stages of the disease vs. sampling at different stages had severe effects. Evolutionary model and deviations from order restrictions had major, and sometimes counterintuitive, interactions with other factors that affected performance. This paper provides practical recommendations for using these methods with experimental data.
Moreover, it shows that it is both possible and necessary to embed assumptions about order restrictions and the nature of driver status within evolutionary models of cancer progression to evaluate the performance of inferential approaches.
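
An order restriction says that a gene can only be mutated once certain other genes are. The sketch below, with hypothetical gene names and helper functions, enumerates the genotypes allowed by a set of restrictions and measures how many observed genotypes violate them. A tree with one parent per gene mirrors what Oncogenetic Trees can encode, while requiring two parents is the kind of dependency on more than one mutation that, per the abstract, they cannot recover. The code only checks compatibility; it does not infer restrictions from data.

```python
from itertools import product

def allowed(mutated, parents):
    """True if every mutated gene has all of its parents mutated (conjunctive, AND).

    mutated: set of gene names; parents: dict gene -> set of required genes.
    """
    return all(parents.get(g, set()) <= mutated for g in mutated)

def accessible_genotypes(genes, parents):
    """All genotypes (as frozensets) that respect the order restrictions."""
    out = []
    for bits in product((0, 1), repeat=len(genes)):
        mutated = {g for g, b in zip(genes, bits) if b}
        if allowed(mutated, parents):
            out.append(frozenset(mutated))
    return out

def violation_rate(samples, parents):
    """Fraction of observed genotypes that break at least one restriction."""
    return sum(not allowed(s, parents) for s in samples) / len(samples)

genes = ["A", "B", "C"]
tree = {"B": {"A"}, "C": {"A"}}       # one parent per gene, as in an oncogenetic tree
conj = {"B": {"A"}, "C": {"A", "B"}}  # C needs both A and B: not expressible as a tree
print(len(accessible_genotypes(genes, tree)), len(accessible_genotypes(genes, conj)))
print(violation_rate([{"A"}, {"B"}, {"A", "B"}, {"C"}], tree))
```

With three genes the tree allows 5 of the 8 possible genotypes and the conjunctive restriction allows 4; half of the four hypothetical samples violate the tree.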

Oncogenetic Network Estimation with Disjunctive Bayesian Networks: Learning from Unstratified Samples while Preserving Mutual Exclusivity Relations

10.1101/2020.04.13.040022 ◽

2020 ◽

Author(s):

Phillip B. Nicol ◽

Kevin R. Coombes ◽

Courtney Deaver ◽

Oksana A. Chkrebtii ◽

Subhadeep Paul ◽

...

Keyword(s):

Cancer Progression ◽

Synthetic Data ◽

Ground Truth ◽

Genetic Alterations ◽

Mutual Exclusivity ◽

Cross Sectional ◽

Cancer Subtypes ◽

Cancer Data ◽

Progression Model ◽

Network Estimation

ABSTRACT

Cancer is the process of accumulating genetic alterations that confer selective advantages to tumor cells. The order in which aberrations occur is not arbitrary, and inferring the order of events is challenging due to the lack of longitudinal samples from tumors. Moreover, a network model of oncogenesis should capture biological facts such as the distinct progression trajectories of cancer subtypes and patterns of mutual exclusivity of alterations in the same pathways. In this paper, we present the Disjunctive Bayesian Network (DBN), a novel cancer progression model. Unlike previous models of oncogenesis, DBN naturally captures mutually exclusive alterations. In addition, DBN is flexible enough to represent the progression trajectories of cancer subtypes, therefore allowing one to learn the progression network from unstratified data, i.e., mixed samples from multiple subtypes. We provide a scalable genetic algorithm to learn the structure of a DBN from cross-sectional cancer data. To test our model, we simulate synthetic data from known progression networks and show that our algorithm infers the ground-truth network with high accuracy. Finally, we apply our model to copy number data for colon cancer and mutation data for bladder cancer and observe that the recovered progression network matches known biological facts.
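
The key modelling choice here is disjunction: under OR semantics, an alteration can occur once any one of its parents has occurred, so alternative upstream events (for example, mutually exclusive alterations in the same pathway) can each open the same route. The sketch contrasts this with conjunctive (AND) semantics and counts co-occurrences as a crude exclusivity check. Gene names and functions are hypothetical, and it does not implement the DBN likelihood or the genetic algorithm described above.

```python
def allowed_and(mutated, parents):
    """Conjunctive (AND) semantics: a gene needs all of its parents mutated."""
    return all(parents.get(g, set()) <= mutated for g in mutated)

def allowed_or(mutated, parents):
    """Disjunctive (OR) semantics: a gene with parents needs at least one of them."""
    return all(not parents.get(g) or bool(parents[g] & mutated) for g in mutated)

def co_occurrence(samples, a, b):
    """Number of samples carrying both a and b (low counts suggest mutual exclusivity)."""
    return sum(1 for s in samples if a in s and b in s)

# Hypothetical network: C can follow either of two alternative alterations, A or B.
parents = {"C": {"A", "B"}}
for genotype in [{"A", "C"}, {"B", "C"}, {"C"}]:
    print(sorted(genotype), allowed_and(genotype, parents), allowed_or(genotype, parents))

samples = [{"A", "C"}, {"B", "C"}, {"A"}, {"B"}, {"A", "C"}]
print(co_occurrence(samples, "A", "B"))
```

Under AND semantics, a tumor reaching C through only one of the alternatives would be flagged as inconsistent; under OR semantics it is allowed, which is what lets a single network cover subtypes that take different routes.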

Impacts of Sample Size and Quality-Adjusted Imputed Prices on Own-Price Elasticities Estimated Using Cross-Sectional Data

Journal of Agricultural and Applied Economics ◽

10.1017/s1074070800021374 ◽

2003 ◽

Vol 35 (2) ◽

pp. 415-421

Author(s):

Matthew C. Stockton

Keyword(s):

Sample Size ◽

Data Sets ◽

Sample Sizes ◽

Food Category ◽

Cross Sectional ◽

Data Set ◽

Food Away From Home ◽

Price Elasticities ◽

Quality Adjustment

Cross-sectional data sets containing expenditure and quantity information are typically used to calculate quality-adjusted imputed prices. Do sample size and quality adjustment of price statistically alter estimates of own-price elasticities? This paper employs a data set pertaining to three food categories (pork, cheese, and food away from home), with four sample sizes for each food category. These twelve samples were used with both adjusted and unadjusted prices to derive elasticities. No statistical differences were found between own-price elasticities across sample sizes. However, elasticities based on adjusted price imputations were significantly different from those based on unadjusted prices.
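
Unit values (expenditure divided by quantity) are a common starting point for imputed prices in such cross-sectional data, and the slope of a log-log regression of quantity on price gives an own-price elasticity. A minimal sketch with made-up numbers; the quality-adjustment step the paper studies (correcting unit values for differences in product quality) is deliberately omitted, and the function names are hypothetical.

```python
from math import log

def unit_values(expenditure, quantity):
    """Imputed prices as expenditure divided by quantity for each observation."""
    return [e / q for e, q in zip(expenditure, quantity)]

def own_price_elasticity(prices, quantities):
    """OLS slope of log(quantity) on log(price), i.e. a constant-elasticity demand fit."""
    x = [log(p) for p in prices]
    y = [log(q) for q in quantities]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical cross-section generated from demand q = 10 * p ** -0.8.
prices = [1.0, 1.5, 2.0, 3.0, 4.0]
quantity = [10 * p ** -0.8 for p in prices]
expenditure = [p * q for p, q in zip(prices, quantity)]

imputed = unit_values(expenditure, quantity)
print(own_price_elasticity(imputed, quantity))  # close to -0.8
```

Because the toy data contain no noise and no quality variation, the imputed prices equal the true prices and the fit recovers the elasticity used to generate them; with real data, differences in quality across households are what make adjusted and unadjusted estimates diverge.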

Fibroblast Subsets in Intestinal Homeostasis, Carcinogenesis, Tumor Progression, and Metastasis

Cancers ◽

10.3390/cancers13020183 ◽

2021 ◽

Vol 13 (2) ◽

pp. 183

Author(s):

Hao Dang ◽

Tom J. Harryvan ◽

Lukas J. A. C. Hawinkels

Keyword(s):

Epithelial Cells ◽

Tumor Progression ◽

Cancer Progression ◽

Structural Integrity ◽

Clinical Care ◽

Genetic Alterations ◽

Malignant Neoplasms ◽

Intestinal Homeostasis ◽

Structural Framework ◽

Invasive Carcinomas

In intestinal homeostasis, continuous renewal of the epithelium is crucial to withstand the plethora of stimuli which can damage the structural integrity of the intestines. Fibroblasts contribute to this renewal by facilitating epithelial cell differentiation as well as providing the structural framework in which epithelial cells can regenerate. Upon dysregulation of intestinal homeostasis, (pre-)malignant neoplasms develop, a process which is accompanied by (epi)genetic alterations in epithelial cells as well as phenotypic changes in fibroblast populations. In the context of invasive carcinomas, these fibroblast populations are termed cancer-associated fibroblasts (CAFs). CAFs are the most abundant cell type in the tumor microenvironment of colorectal cancer (CRC) and consist of various functionally heterogeneous subsets which can promote or restrain cancer progression. Although most previous research has focused on the biology of epithelial cells, accumulating evidence shows that certain fibroblast subsets can also importantly contribute to tumor initiation and progression, thereby possibly providing avenues for improvement of clinical care for CRC patients. In this review, we summarized the current literature on the emerging role of fibroblasts in various stages of CRC development, ranging from adenoma initiation to the metastatic spread of cancer cells. In addition, we highlighted translational and therapeutic perspectives of fibroblasts in the different stages of intestinal tumor progression.

Identifying cancer pathway dysregulations using differential causal effects

10.1101/2021.05.20.444965 ◽

2021 ◽

Author(s):

Kim Philipp Jablonski ◽

Martin Franz-Xaver Pirkl ◽

Domagoj Cevid ◽

Peter Buehlmann ◽

Niko Beerenwinkel

Keyword(s):

Breast Cancer ◽

Cancer Progression ◽

Synthetic Data ◽

Causal Effects ◽

Data Sets ◽

Cellular Behavior ◽

Cancer Data ◽

Statistical Framework ◽

New Genes ◽

Cancerous Cells

Signaling pathways control cellular behavior. Dysregulated pathways, for example due to mutations that cause genes and proteins to be expressed abnormally, can lead to diseases such as cancer. We introduce a novel computational approach, called Differential Causal Effects (dce), which compares normal to cancerous cells using the statistical framework of causality. The method can detect individual edges in a signaling pathway that are dysregulated in cancer cells, while accounting for confounding. Hence, artificial signals from, for example, batch effects have less influence on the result, and dce has a higher chance of detecting the biological signals. We show that dce outperforms competing methods on synthetic data sets and on CRISPR knockout screens. In an exploratory analysis of breast cancer data from TCGA, we recover known genes and discover new genes involved in breast cancer progression.
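
The idea of a differential causal effect can be illustrated with a linear structural equation model: estimate the effect of X on Y in each condition while adjusting for a confounder Z, then take the difference between conditions. A minimal sketch under those assumptions (linear relationships, Gaussian noise, a single known confounder, simulated data); it is not the dce package's model or API, and all names are hypothetical.

```python
import random

def adjusted_effect(y, x, z):
    """OLS coefficient of x in y ~ x + z: effect of X on Y adjusting for confounder Z."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    # Solve the 2x2 normal equations for the coefficient on x.
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

def simulate(effect, n, rng):
    """Hypothetical linear SEM: Z -> X, Z -> Y, X -> Y (Z confounds X -> Y)."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.7 * zi + rng.gauss(0, 1) for zi in z]
    y = [effect * xi + 0.5 * zi + rng.gauss(0, 0.1) for xi, zi in zip(x, z)]
    return y, x, z

rng = random.Random(0)
normal = simulate(1.0, 2000, rng)  # edge X -> Y with effect 1.0 in normal cells
cancer = simulate(2.0, 2000, rng)  # same edge with effect 2.0 in cancer cells
dce = adjusted_effect(*cancer) - adjusted_effect(*normal)
print(round(dce, 3))
```

Regressing Y on X alone would absorb part of Z's influence into the X coefficient; adjusting for Z is what keeps the per-condition effects, and hence their difference, unbiased in this toy setting.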

Design of the TRONCO BioConductor Package for TRanslational ONCOlogy

10.1101/027524 ◽

2015 ◽

Author(s):

Marco Antoniotti ◽

Giulio Caravagna ◽

Luca De Sano ◽

Alex Graudenzi ◽

Giancarlo Mauri ◽

...

Keyword(s):

Cancer Progression ◽

Graphical Model ◽

Selective Advantage ◽

R Package ◽

Genetic Alterations ◽

Cross Sectional ◽

Data Set ◽

Statistical Confidence ◽

Translational Oncology ◽

Set Up

Models of cancer progression provide insights into the order of accumulation of genetic alterations during cancer development. Algorithms to infer such models from the currently available mutational profiles collected from different cancer patients (cross-sectional data) have been defined in the literature since the late 1990s. These algorithms differ in the way they extract a graphical model of the events modelling the progression, e.g., somatic mutations or copy-number alterations. TRONCO is an R package for TRanslational ONCOlogy which provides a series of functions to assist the user in the analysis of cross-sectional genomic data; in particular, it implements algorithms that aim to model cancer progression by means of the notion of selective advantage. These algorithms have been shown to outperform the current state of the art in the inference of cancer progression models. TRONCO also provides functionalities to load input cross-sectional data, set up the execution of the algorithms, assess the statistical confidence in the results, and visualize the models. Availability. Freely available at http://www.bioconductor.org/ under GPL license; project hosted at http://bimib.disco.unimib.it/ and https://github.com/BIMIB-DISCo/TRONCO. Contact. [email protected]
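
The notion of selective advantage used by such algorithms is, as we understand it, grounded in Suppes' probabilistic causation: an earlier event should be more frequent (temporal priority, proxied by marginal frequency) and should raise the probability of a later one. A minimal sketch of that pairwise check on binary data; it is not TRONCO's R API, omits any further statistical testing or filtering that full algorithms apply, and the data and function name are hypothetical.

```python
def prima_facie(data, i, j):
    """Suppes-style check that event i is a prima facie cause of event j.

    data: list of samples, each a dict event -> 0/1.
    Temporal priority is proxied by marginal frequency, P(i) > P(j), and
    probability raising requires P(j | i) > P(j | not i).
    """
    n = len(data)
    p_i = sum(s[i] for s in data) / n
    p_j = sum(s[j] for s in data) / n
    with_i = [s for s in data if s[i]]
    without_i = [s for s in data if not s[i]]
    if not with_i or not without_i:  # conditional probabilities undefined
        return False
    p_j_given_i = sum(s[j] for s in with_i) / len(with_i)
    p_j_given_not_i = sum(s[j] for s in without_i) / len(without_i)
    return p_i > p_j and p_j_given_i > p_j_given_not_i

# Hypothetical cross-sectional data: B is only ever observed together with A.
data = ([{"A": 1, "B": 1}] * 3
        + [{"A": 1, "B": 0}] * 3
        + [{"A": 0, "B": 0}] * 4)
print(prima_facie(data, "A", "B"), prima_facie(data, "B", "A"))
```

In this toy data A is more frequent than B and B never appears without A, so A passes both conditions as a candidate predecessor of B, while the reverse direction fails temporal priority.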

DARPP-32 promotes ERBB3-mediated resistance to molecular targeted therapy in EGFR-mutated lung adenocarcinoma

10.1101/2021.02.12.430856 ◽

2021 ◽

Author(s):

Sk. Kayum Alam ◽

Yongchang Zhang ◽

Li Wang ◽

Zhu Zhu ◽

Christina E. Hernandez ◽

...

Keyword(s):

Lung Adenocarcinoma ◽

Disease Progression ◽

Cancer Progression ◽

Lung Tumor ◽

Genetic Alterations ◽

In Vivo Studies ◽

Driver Mutations ◽

Egfr Tki ◽

Egfr Tkis

Abstract

While molecular targeted therapies have improved prognoses of advanced-stage lung adenocarcinoma expressing oncogenic driver mutations, acquired therapeutic resistance continues to be a major problem. Epidermal growth factor receptor (EGFR) activating mutations are among the most common targetable genetic alterations in lung adenocarcinoma, and EGFR tyrosine kinase inhibitors (TKIs) are the recommended first-line therapy for patients with EGFR mutation-positive cancer. Unfortunately, most patients develop resistance to EGFR TKIs, and rapid disease progression occurs. A better mechanistic understanding of therapy-refractory cancer progression is necessary to develop new therapeutic approaches to predict and prevent acquired resistance to EGFR TKIs. Here, we identify a new mechanism of ERBB3-mediated resistance to EGFR TKIs in human lung adenocarcinoma. Specifically, we show that dopamine and cyclic AMP-regulated phosphoprotein, Mr 32000 (DARPP-32) physically recruits ERBB3 to EGFR to mediate a switch from EGFR homodimers to EGFR:ERBB3 heterodimers, thereby bypassing EGFR TKI-mediated inhibition and potentiating ERBB3-dependent activation of the oncogenic AKT and ERK signaling that drives therapy-refractory tumor cell survival. In a cohort of paired tumor specimens derived from 30 lung adenocarcinoma patients before and after the development of EGFR TKI-refractory disease progression, we reveal that DARPP-32 and kinase-activated EGFR and ERBB3 proteins are overexpressed upon acquired EGFR TKI resistance. In vivo studies suggest that ablation of DARPP-32 protein activity sensitizes gefitinib-resistant lung tumor xenografts to EGFR TKI treatment, while DARPP-32 overexpression increases gefitinib-refractory lung cancer progression in gefitinib-sensitive lung tumors orthotopically xenografted into mice.
Taken together, our findings introduce a DARPP-32-mediated, ERBB3-dependent mechanism used by lung tumor cells to evade EGFR TKI-induced cell death, potentially paving the way for the development of new therapies to prevent or overcome therapy-refractory lung adenocarcinoma progression.

Cancer progression models and fitness landscapes: a many-to-many relationship

10.1101/141465 ◽

2017 ◽

Author(s):

Ramon Diaz-Uriarte

Keyword(s):

Cancer Progression ◽

Fitness Landscape ◽

Simulated Data ◽

Directed Acyclic Graphs ◽

Fitness Landscapes ◽

Gene Interactions ◽

Cross Sectional ◽

Data Set ◽

Large Variability ◽

Acyclic Graphs

Abstract

The identification of constraints, due to gene interactions, in the order of accumulation of mutations during cancer progression can allow us to single out therapeutic targets. Cancer progression models (CPMs) use genotype frequency data from cross-sectional samples to try to identify these constraints, and return Directed Acyclic Graphs (DAGs) of genes. On the other hand, fitness landscapes, which map genotypes to fitness, contain all possible paths of tumor progression. Thus, we expect a correspondence between DAGs from CPMs and the fitness landscapes where evolution happened. But many fitness landscapes (e.g., those with reciprocal sign epistasis) cannot be represented by CPMs. Using simulated data under 500 fitness landscapes, I show that CPMs’ performance (prediction of genotypes that can exist) degrades with reciprocal sign epistasis. There is large variability in the DAGs inferred from each landscape, which is also affected by mutation rate, detection regime, and fitness landscape features, in ways that depend on the CPM method. And the same DAG is often observed in very different landscapes, which differ in more than 50% of their accessible genotypes. Using a pancreatic data set, I show that this many-to-many relationship affects the analysis of empirical data. Fitness landscapes that are widely different from each other can, when evolutionary processes run repeatedly on them, both produce data similar to the empirically observed data and lead to DAGs that are very different among themselves. Because reciprocal sign epistasis can be common in cancer, these results question the use and interpretation of CPMs.
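
Reciprocal sign epistasis, the feature singled out above as unrepresentable by CPMs, has a compact two-locus definition: the sign of each mutation's fitness effect depends on whether the other mutation is present. A minimal sketch that classifies a two-locus landscape; the fitness values and function name are hypothetical, and measuring epistasis on an additive fitness scale is an assumption.

```python
def epistasis_type(w):
    """Classify two-locus epistasis from fitnesses w[(a, b)], with a, b in {0, 1}.

    Effects are differences in fitness (additive scale, an illustrative choice).
    """
    a_on_b0 = w[(1, 0)] - w[(0, 0)]  # effect of mutation A without B
    a_on_b1 = w[(1, 1)] - w[(0, 1)]  # effect of mutation A with B present
    b_on_a0 = w[(0, 1)] - w[(0, 0)]  # effect of mutation B without A
    b_on_a1 = w[(1, 1)] - w[(1, 0)]  # effect of mutation B with A present
    sign_a = a_on_b0 * a_on_b1 < 0   # sign of A's effect flips with background
    sign_b = b_on_a0 * b_on_a1 < 0   # sign of B's effect flips with background
    if sign_a and sign_b:
        return "reciprocal sign"
    if sign_a or sign_b:
        return "sign"
    if a_on_b0 != a_on_b1:
        return "magnitude"
    return "none"

examples = {
    "additive": {(0, 0): 1, (1, 0): 2, (0, 1): 2, (1, 1): 3},
    "magnitude": {(0, 0): 1, (1, 0): 2, (0, 1): 2, (1, 1): 4},
    "sign": {(0, 0): 1, (1, 0): 2, (0, 1): 0.5, (1, 1): 3},
    "reciprocal": {(0, 0): 1, (1, 0): 0.8, (0, 1): 0.8, (1, 1): 1.5},
}
for name, w in examples.items():
    print(name, epistasis_type(w))
```

In the reciprocal case both single mutants are less fit than the wild type while the double mutant is the fittest, so the landscape has two peaks, (0, 0) and (1, 1), and the double mutant cannot be reached by single fitness-increasing steps from the wild type.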
