CAPRI: Efficient Inference of Cancer Progression Models from Cross-sectional Data

2014 ◽  
Author(s):  
Daniele Ramazzotti ◽  
Giulio Caravagna ◽  
Loes Olde Loohuis ◽  
Alex Graudenzi ◽  
Ilya Korsunsky ◽  
...  

We devise a novel inference algorithm to effectively solve the cancer progression model reconstruction problem. Our empirical analysis of the accuracy and convergence rate of our algorithm, CAncer PRogression Inference (CAPRI), shows that it outperforms state-of-the-art algorithms addressing similar problems. Motivation: Several cancer-related genomic datasets have become available (e.g., The Cancer Genome Atlas, TCGA), typically involving hundreds of patients. At present, most of these data are cross-sectional, providing all measurements at the time of diagnosis. Our goal is to infer cancer “progression” models from such data. These models are represented as directed acyclic graphs (DAGs) of collections of “selectivity” relations, where a mutation in a gene A “selects” for a later mutation in a gene B. Gaining insight into the structure of such progressions has the potential to improve both the stratification of patients and personalized therapy choices. Results: The CAPRI algorithm relies on a scoring method based on a probabilistic theory developed by Suppes, coupled with bootstrap and maximum likelihood inference. The resulting algorithm is efficient, achieves high accuracy, and has good complexity, also in terms of convergence properties. CAPRI performs especially well in the presence of noise in the data and with limited sample sizes. Moreover, CAPRI, in contrast to other approaches, robustly reconstructs different types of confluent trajectories despite irregularities in the data. We also report on an ongoing investigation using CAPRI to study atypical Chronic Myeloid Leukemia, in which we uncovered non-trivial selectivity relations and exclusivity patterns among key genomic events.
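
Suppes' theory, on which CAPRI's scoring is based, asks two things of a candidate cause a of an event b. First, a should come first (temporal priority). Second, a should make b more likely (probability raising). Cross-sectional data carry no timestamps, so this sketch assumes the common proxy that earlier events are more frequent. It checks both conditions on a hypothetical binary patient-by-event matrix. It is only a prima facie filter and omits CAPRI's bootstrap and maximum-likelihood steps entirely.

```python
from typing import Sequence


def is_prima_facie_cause(samples: Sequence[Sequence[int]], a: int, b: int) -> bool:
    """Check Suppes' two conditions for event a being a prima facie cause of event b.

    samples: binary matrix, one row per patient, one column per genomic event
    (1 = event observed). Columns a and b index the two events.
    Assumption: temporal priority is proxied by marginal frequency, P(a) > P(b).
    """
    n = len(samples)
    col_a = [row[a] for row in samples]
    col_b = [row[b] for row in samples]
    n_a = sum(col_a)
    n_not_a = n - n_a
    # Conditional probabilities are undefined if a is always or never present.
    if n_a == 0 or n_not_a == 0:
        return False
    # Temporal priority (frequency proxy): P(a) > P(b).
    if not n_a > sum(col_b):
        return False
    # Probability raising: P(b | a) > P(b | not a).
    p_b_given_a = sum(1 for x, y in zip(col_a, col_b) if x and y) / n_a
    p_b_given_not_a = sum(1 for x, y in zip(col_a, col_b) if not x and y) / n_not_a
    return p_b_given_a > p_b_given_not_a


# Hypothetical toy cohort: columns are events A, B, C.
cohort = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(is_prima_facie_cause(cohort, 0, 1))  # True: A is more frequent and raises P(B)
print(is_prima_facie_cause(cohort, 1, 0))  # False: B is rarer than A
print(is_prima_facie_cause(cohort, 0, 2))  # False: P(C | A) = 0.25 < P(C | not A) = 0.5
```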

2019 ◽  
Vol 36 (1) ◽  
pp. 241-249 ◽  
Author(s):  
Rudolf Schill ◽  
Stefan Solbrig ◽  
Tilo Wettig ◽  
Rainer Spang

Motivation: Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. Results: Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. Availability and implementation: Implementation and data are available at https://github.com/RudiSchill/MHN. Supplementary information: Supplementary data are available at Bioinformatics online.
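
The abstract describes MHN as modelling each event by a spontaneous rate of fixation plus multiplicative effects from events already present. This is a minimal sketch of that rate. It assumes a square parameter matrix `theta` (our naming): the diagonal holds spontaneous rates and the off-diagonal entries hold multiplicative effects. Because `theta[i][j]` and `theta[j][i]` are independent and may be above or below 1, two events can promote or inhibit each other in either direction. That is what lets such a model go beyond acyclic graphs. Fitting `theta` to cross-sectional data, the actual learning problem in the paper, is not shown.

```python
from typing import List, Sequence


def event_rate(theta: Sequence[Sequence[float]], state: Sequence[int], i: int) -> float:
    """Rate at which event i fixes next, given which events are already present.

    theta[i][i]: spontaneous (baseline) rate of event i.
    theta[i][j], j != i: multiplicative effect of present event j on event i
    (> 1 promotes, < 1 inhibits, == 1 no effect).
    """
    if state[i]:
        return 0.0  # events accumulate; an event already present cannot recur
    rate = theta[i][i]
    for j, present in enumerate(state):
        if present and j != i:
            rate *= theta[i][j]
    return rate


def next_event_probs(theta: Sequence[Sequence[float]], state: Sequence[int]) -> List[float]:
    """Probability that each event is the next to occur (competing exponential clocks)."""
    rates = [event_rate(theta, state, i) for i in range(len(state))]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)  # nothing left that can happen
    return [r / total for r in rates]


# Hypothetical two-event network: event 1 promotes event 0 (factor 2.0),
# while event 0 inhibits event 1 (factor 0.25), a cyclic pair of interactions.
theta = [[0.5, 2.0],
         [0.25, 1.0]]
print(event_rate(theta, [0, 1], 0))     # 1.0 = 0.5 * 2.0
print(event_rate(theta, [1, 0], 1))     # 0.25 = 1.0 * 0.25
print(next_event_probs(theta, [0, 0]))  # roughly [0.333, 0.667]
```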


2018 ◽  
Author(s):  
Rudolf Schill ◽  
Stefan Solbrig ◽  
Tilo Wettig ◽  
Rainer Spang

Motivation: Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability and only allow events to interact in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. Results: Here we propose Mutual Hazard Networks (MHN), a new Machine Learning algorithm to infer cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by multiplicative effects they exert on the rates of successive events. MHN compared favourably to acyclic models in cross-validated model fit on four datasets tested. In application to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction in line with consecutive biopsies: IDH1 mutations are early events that promote subsequent fixation of TP53 mutations. Availability: Implementation and data are available at https://github.com/RudiSchill/MHN.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6301 ◽  
Author(s):  
Ping Wang ◽  
Zengli Zhang ◽  
Yujie Ma ◽  
Jun Lu ◽  
Hu Zhao ◽  
...  

Early detection and prediction of prognosis and treatment responses are all key to improving the survival of ovarian cancer patients. This study profiled an ovarian cancer progression model to identify prognostic biomarkers for ovarian cancer patients. Mouse ovarian surface epithelial cells (MOSECs) can undergo spontaneous malignant transformation in in vitro cell culture. These cells were used as a model of ovarian cancer progression. Alterations in gene expression and signaling were detected using the Illumina HiSeq2000 next-generation sequencing platform and bioinformatic analyses. The differential expression of four selected genes was assessed using Gene Expression Profiling Interactive Analysis (GEPIA; http://gepia.cancer-pku.cn/). These genes were then associated with survival in ovarian cancer patients using The Cancer Genome Atlas dataset and the online Kaplan–Meier Plotter (http://www.kmplot.com) data. The data showed 263 aberrantly expressed genes between the early and late stages of tumor progression in MOSECs, comprising 182 up-regulated and 81 down-regulated genes. The bioinformatic data revealed that four genes (i.e., guanosine 5′-monophosphate synthase (GMPS), progesterone receptor (PR), CD40, and p21 (cyclin-dependent kinase inhibitor 1A)) play an important role in ovarian cancer progression. Furthermore, analysis of The Cancer Genome Atlas dataset confirmed the differential expression of these four genes, which were associated with prognosis in ovarian cancer patients. In conclusion, this study profiled differentially expressed genes using the ovarian cancer progression model and identified four of them (i.e., GMPS, PR, CD40, and p21) as prognostic markers for ovarian cancer patients. Future studies of prospective patients could further verify the clinical usefulness of this four-gene signature.
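
Survival associations like these are usually summarized with Kaplan–Meier curves. The study used the online Kaplan–Meier Plotter. The sketch below is not that tool's code. It is a minimal implementation of the standard product-limit estimator, S(t) = ∏_{t_i ≤ t} (1 − d_i / n_i), where d_i deaths occur among the n_i patients still at risk at time t_i. The input data are hypothetical. Comparing, say, high- and low-expression groups would also need a test such as the log-rank test, which is omitted.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    times: follow-up time per patient; events: 1 if death observed, 0 if censored.
    Returns (time, S(time)) at each distinct time with at least one observed death.
    Patients censored at a death time are counted as still at risk at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group every patient whose follow-up ends at time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical cohort of five patients (times in months).
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In this toy example the estimate steps down to 0.8, 0.6 and 0.3 at months 1, 2 and 3. The patient censored at month 4 contributes no step.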


2017 ◽  
Author(s):  
Ramon Diaz-Uriarte

The identification of constraints, due to gene interactions, on the order of accumulation of mutations during cancer progression can allow us to single out therapeutic targets. Cancer progression models (CPMs) use genotype frequency data from cross-sectional samples to try to identify these constraints, and return Directed Acyclic Graphs (DAGs) of genes. On the other hand, fitness landscapes, which map genotypes to fitness, contain all possible paths of tumor progression. Thus, we expect a correspondence between DAGs from CPMs and the fitness landscapes where evolution happened. But many fitness landscapes (e.g., those with reciprocal sign epistasis) cannot be represented by CPMs. Using simulated data under 500 fitness landscapes, I show that CPMs’ performance (prediction of genotypes that can exist) degrades with reciprocal sign epistasis. There is large variability in the DAGs inferred from each landscape, which is also affected by mutation rate, detection regime, and fitness landscape features, in ways that depend on the CPM method. And the same DAG is often observed in very different landscapes, which differ in more than 50% of their accessible genotypes. Using a pancreatic data set, I show that this many-to-many relationship affects the analysis of empirical data. Fitness landscapes that differ widely from each other can, when evolutionary processes run repeatedly on them, both produce data similar to the empirically observed data and lead to DAGs that are very different among themselves. Because reciprocal sign epistasis can be common in cancer, these results question the use and interpretation of CPMs.
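
Reciprocal sign epistasis is easier to see as code. Take a hypothetical two-locus landscape. A mutation shows sign epistasis when its fitness effect changes sign depending on the allele at the other locus. The epistasis is reciprocal when this holds for both mutations. The sketch below classifies a landscape from the fitness of its four genotypes. It illustrates the definition only and is not part of the study's simulation code.

```python
import math


def epistasis_type(w_ab: float, w_Ab: float, w_aB: float, w_AB: float) -> str:
    """Classify two-locus epistasis from the fitness of the four genotypes.

    Lower-case letters denote the wild-type allele, upper-case the mutant.
    """
    # Fitness effect of mutating A on each background at locus B.
    effect_A = (w_Ab - w_ab, w_AB - w_aB)
    # Fitness effect of mutating B on each background at locus A.
    effect_B = (w_aB - w_ab, w_AB - w_Ab)
    sign_A = effect_A[0] * effect_A[1] < 0  # effect of A flips sign with background
    sign_B = effect_B[0] * effect_B[1] < 0  # effect of B flips sign with background
    if sign_A and sign_B:
        return "reciprocal sign"
    if sign_A or sign_B:
        return "sign"
    if math.isclose(effect_A[0], effect_A[1]):
        return "none"  # effects add up: no epistasis
    return "magnitude"  # same sign, different size


# Both single mutants are less fit than wild type, yet the double mutant is fittest.
print(epistasis_type(1.0, 0.9, 0.9, 1.2))  # reciprocal sign
print(epistasis_type(1.0, 1.1, 1.2, 1.3))  # none
```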


Oncogene ◽  
2021 ◽  
Author(s):  
Yong Wu ◽  
Qinhao Guo ◽  
Xingzhu Ju ◽  
Zhixiang Hu ◽  
Lingfang Xia ◽  
...  

Numerous studies suggest an important role for copy number alterations (CNAs) in cancer progression. However, CNAs of long intergenic noncoding RNAs (lincRNAs) in ovarian cancer (OC) and their potential functions have not been fully investigated. Here, based on analysis of The Cancer Genome Atlas (TCGA) database, we identified an oncogenic lincRNA, termed LINC00662, whose CNA correlated significantly with its increased expression. LINC00662 overexpression is highly associated with malignant features in OC patients and is a prognostic indicator. LINC00662 significantly promotes OC cell proliferation and metastasis in vitro and in vivo. Mechanistically, LINC00662 is stabilized by heterogeneous nuclear ribonucleoprotein H1 (HNRNPH1). Moreover, LINC00662 exerts oncogenic effects by interacting with glucose-regulated protein 78 (GRP78) and preventing its ubiquitination in OC cells, leading to activation of the oncogenic p38 MAPK signaling pathway. Taken together, our results define an oncogenic role for LINC00662 in OC progression mediated via GRP78/p38 signaling, with potential implications for therapeutic targets in OC.


Author(s):  
David Bartram

Happiness/well-being researchers who use quantitative analysis often do not give persuasive reasons why particular variables should be included as controls in their cross-sectional models. One commonly sees notions of a “standard set” of controls, or the “usual suspects”, etc. These notions are not coherent and can lead to results that are significantly biased with respect to a genuine causal relationship. This article presents some core principles for making more effective decisions of that sort. The contribution is to introduce a framework (the “causal revolution”, e.g. Pearl and Mackenzie 2018) that is unfamiliar to many social scientists (though well established in epidemiology), and to show how it can be put into practice for empirical analysis of causal questions. In simplified form, the core principles are: control for confounding variables, and do not control for intervening variables or colliders. A more comprehensive approach uses directed acyclic graphs (DAGs) to discern models that meet a minimum/efficient criterion for identification of causal effects. The article demonstrates this mode of analysis via a stylized investigation of the effect of unemployment on happiness. Most researchers would include other determinants of happiness as controls for this purpose. One such determinant is income, but income is an intervening variable in the path from unemployment to happiness, and including it leads to substantial bias. Other commonly used variables, e.g. religiosity and sex, are simply unnecessary. From this perspective, identifying the effect of unemployment on happiness requires controlling only for age and education; a small (parsimonious) model is evidently preferable to a more complex one in this instance.
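
A simulation shows why controlling for an intervening variable biases the estimate. The sketch below uses a hypothetical linear DAG loosely mirroring the stylized example. Age and education confound unemployment and happiness, and income lies on the path from unemployment to happiness. Unemployment is treated as continuous for simplicity, and every coefficient is made up. Adjusting for age and education alone recovers the total effect. Adding income leaves only the direct path.

```python
import numpy as np


def ols(y: np.ndarray, *columns: np.ndarray) -> np.ndarray:
    """Ordinary least squares coefficients; element 0 is the intercept."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 100_000
# Hypothetical linear DAG; all coefficients are made up for illustration.
age = rng.normal(size=n)
educ = rng.normal(size=n)
# Confounders: age and education affect both unemployment and happiness.
unemp = 0.5 * age - 0.5 * educ + rng.normal(size=n)
# Intervening variable: unemployment lowers income, which in turn affects happiness.
income = -1.0 * unemp + 0.5 * educ + rng.normal(size=n)
happy = -0.5 * unemp + 0.8 * income + 0.3 * age + 0.2 * educ + rng.normal(size=n)

# True total effect of unemployment: direct (-0.5) plus via income (-1.0 * 0.8) = -1.3.
adjusted = ols(happy, unemp, age, educ)[1]               # close to -1.3
over_adjusted = ols(happy, unemp, age, educ, income)[1]  # close to -0.5
print(round(adjusted, 2), round(over_adjusted, 2))
```

Whether the direct effect is ever the quantity of interest is a separate question. The point here is only that the two regressions estimate different things.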


2015 ◽  
Author(s):  
Giulio Caravagna ◽  
Alex Graudenzi ◽  
Daniele Ramazzotti ◽  
Rebeca Sanz-Pamplona ◽  
Luca De Sano ◽  
...  

The genomic evolution inherent to cancer has renewed the focus on voluminous next-generation sequencing (NGS) data and on machine learning for inferring explanatory models of how (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the "selective advantage" relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications, as it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc's ability to reproduce much of the current knowledge on colorectal cancer progression, as well as to suggest novel, experimentally verifiable hypotheses.
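
The abstract lists identification of fitness-equivalent exclusive alterations as a pipeline step but does not say how it is done. The sketch below is a generic, hypothetical illustration, not PiCnIc's actual method. It uses a one-sided Fisher's exact test for two alterations: it asks how likely it is to see at most the observed overlap if the alterations occurred independently across samples. The counts are made up.

```python
from math import comb


def exclusivity_pvalue(n: int, n_a: int, n_b: int, overlap: int) -> float:
    """One-sided Fisher's exact test for mutual exclusivity of two alterations.

    n: number of samples; n_a, n_b: samples carrying alteration A and B;
    overlap: samples carrying both. Returns P(overlap <= observed) under the
    hypergeometric null in which A and B occur independently of each other.
    """
    if not (0 <= overlap <= min(n_a, n_b) and max(n_a, n_b) <= n):
        raise ValueError("inconsistent counts")
    total = comb(n, n_b)
    # Smallest overlap possible when both alterations must fit into n samples.
    lowest = max(0, n_a + n_b - n)
    tail = sum(comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(lowest, overlap + 1))
    return tail / total


# Hypothetical cohort of 10 samples, 5 with A and 5 with B, never together.
print(exclusivity_pvalue(10, 5, 5, 0))  # 1/252, about 0.004
```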


Cancers ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 2046 ◽  
Author(s):  
Valerio Izzi ◽  
Martin N. Davis ◽  
Alexandra Naba

The extracellular matrix (ECM) is a master regulator of all cellular functions and a major component of the tumor microenvironment. We previously defined the “matrisome” as the ensemble of genes encoding ECM proteins and proteins modulating ECM structure or function. While compositional and biomechanical changes in the ECM regulate cancer progression, no study has investigated the genomic alterations of matrisome genes in cancers and their consequences. Here, mining The Cancer Genome Atlas (TCGA) data, we found that copy number alterations and mutations are frequent in matrisome genes, even more so than in the rest of the genome. We also found that these alterations are predicted to significantly impact gene expression and protein function. Moreover, we identified matrisome genes whose mutational burden is an independent predictor of survival. We propose that studying genomic alterations of matrisome genes will further our understanding of the roles of this compartment in cancer progression and will lead to the development of innovative therapeutic strategies targeting the ECM.


2018 ◽  
Vol 28 (5) ◽  
pp. 1347-1364 ◽  
Author(s):  
KF Arnold ◽  
GTH Ellison ◽  
SC Gadd ◽  
J Textor ◽  
PWG Tennant ◽  
...  

‘Unexplained residuals’ models have been used within lifecourse epidemiology to model an exposure measured longitudinally at several time points in relation to a distal outcome. It has been claimed that, compared to traditional regression methods, these models have several advantages. These include the ability to estimate multiple total causal effects in a single model and additional insight into the effect on the outcome of greater-than-expected increases in the exposure. We evaluate these properties and prove mathematically how adjustment for confounding variables must be made within this modelling framework. Importantly, we explicitly place unexplained residual models in a causal framework using directed acyclic graphs. This allows for theoretical justification of appropriate confounder adjustment and provides a framework for extending our results to more complex scenarios than those examined in this paper. We also discuss several interpretational issues relating to unexplained residual models within a causal framework. We argue that unexplained residual models offer no additional insights compared to traditional regression methods and are, in fact, more challenging to implement; moreover, they artificially reduce estimated standard errors. Consequently, we conclude that unexplained residual models, if used, must be implemented with great care.
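
A small numerical illustration shows why such models may add little over ordinary regression. It assumes the simplest setting: two exposure measurements, linear relations, no confounders, and made-up values. First, the later exposure is residualized on the earlier one. The outcome is then regressed on the earlier exposure and that residual. The residual's coefficient matches the later exposure's coefficient in an ordinary joint regression. The earlier exposure's coefficient matches the regression on it alone, which here is its total effect. This sketch is not the paper's proof. It does not address the confounder-adjustment or standard-error issues the paper raises.

```python
import numpy as np


def ols(y: np.ndarray, *columns: np.ndarray) -> np.ndarray:
    """Ordinary least squares coefficients; element 0 is the intercept."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
n = 1_000
# Hypothetical data: exposure measured at two time points and a distal outcome.
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
y = 0.4 * x1 + 0.6 * x2 + rng.normal(size=n)

# Unexplained residuals model: residualize x2 on x1, then regress y on x1 and the residual.
a, b = ols(x2, x1)
resid = x2 - (a + b * x1)
resid_model = ols(y, x1, resid)

# Traditional regressions for comparison.
joint = ols(y, x1, x2)  # y ~ x1 + x2
marginal = ols(y, x1)   # y ~ x1

# Residual coefficient == x2 coefficient of the joint model;
# x1 coefficient == x1 coefficient of the marginal model.
print(np.isclose(resid_model[2], joint[2]), np.isclose(resid_model[1], marginal[1]))
```

Both comparisons hold up to floating-point error. The residual is orthogonal to the intercept and `x1`, and the two designs span the same column space.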

