In silico learning of tumor evolution through mutational time series

2019 ◽  
Vol 116 (19) ◽  
pp. 9501-9510 ◽  
Author(s):  
Noam Auslander ◽  
Yuri I. Wolf ◽  
Eugene V. Koonin

Cancer arises through the accumulation of somatic mutations over time. Understanding the sequence of mutation occurrence during cancer progression can assist early and accurate diagnosis and improve clinical decision-making. Here we employ long short-term memory (LSTM) networks, a class of recurrent neural network, to learn the evolution of a tumor through an ordered sequence of mutations. We demonstrate the capacity of LSTMs to learn complex dynamics of the mutational time series governing tumor progression, allowing accurate prediction of the mutational burden and the occurrence of mutations in the sequence. Using the probabilities learned by the LSTM, we simulate mutational data and show that the simulation results are statistically indistinguishable from the empirical data. We identify passenger mutations that are significantly associated with established cancer drivers in the sequence and demonstrate that the genes carrying these mutations are substantially enriched in interactions with the corresponding driver genes. Breaking the network into modules consisting of driver genes and their interactors, we show that these interactions are associated with poor patient prognosis, thus likely conferring growth advantage for tumor progression. Thus, application of LSTM provides for prediction of numerous additional conditional drivers and reveals hitherto unknown aspects of cancer evolution.
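The gate mechanics of an LSTM reading an ordered mutation sequence can be sketched in a few lines. This is a minimal illustration, not the authors' architecture or training procedure: the gene vocabulary, hidden size, untrained random weights, and the choice of an independent sigmoid output per gene (the probability that each gene is mutated next) are all hypothetical assumptions.

```python
import math
import random

GENES = ["APC", "KRAS", "TP53", "SMAD4"]  # hypothetical gene vocabulary
HIDDEN = 3                                # hypothetical hidden-state size


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate, applied to the concatenated vector [x; h].
        self.W = {gate: rand_matrix(n_hidden, n_in + n_hidden, rng) for gate in "ifog"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation gives [x; h]
        pre = {gate: [s + b for s, b in zip(matvec(self.W[gate], z), self.b[gate])]
               for gate in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        # Cell state: keep part of the old memory, add part of the new candidate.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def one_hot(gene):
    return [1.0 if g == gene else 0.0 for g in GENES]


def next_mutation_probs(sequence, cell, W_out):
    """Feed an ordered mutation sequence through the cell; return per-gene probabilities."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    for gene in sequence:
        h, c = cell.step(one_hot(gene), h, c)
    # Independent sigmoid per gene (an assumption): probability each gene is mutated next.
    return {g: sigmoid(s) for g, s in zip(GENES, matvec(W_out, h))}


rng = random.Random(0)
cell = LSTMCell(len(GENES), HIDDEN, rng)
W_out = rand_matrix(len(GENES), HIDDEN, rng)
probs = next_mutation_probs(["APC", "KRAS"], cell, W_out)
print(probs)
```

With trained rather than random weights, sampling from such per-step probabilities is one way a model of this kind could be used to simulate mutational sequences.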

2019 ◽  
Author(s):  
Noam Auslander ◽  
Yuri I. Wolf ◽  
Eugene V. Koonin

Abstract Cancer arises through the accumulation of somatic mutations over time. Understanding the sequence of mutation occurrence during cancer progression can assist early and accurate diagnosis and improve clinical decision-making. Here we employ Long Short-Term Memory networks (LSTMs), a class of recurrent neural network, to learn the evolution of a tumor through an ordered sequence of mutations. We demonstrate the capacity of LSTMs to learn complex dynamics of the mutational time series governing tumor progression, allowing accurate prediction of the mutational burden and the occurrence of mutations in the sequence. Using the probabilities learned by the LSTM, we simulate mutational data and show that the simulation results are statistically indistinguishable from the empirical data. We identify passenger mutations that are significantly associated with established cancer drivers in the sequence and demonstrate that the genes carrying these mutations are substantially enriched in interactions with the corresponding driver genes. Breaking the network into modules consisting of driver genes and their interactors, we show that these interactions are associated with poor patient prognosis, thus likely conferring growth advantage for tumor progression. Thus, application of LSTM provides for prediction of numerous additional conditional drivers and reveals hitherto unknown aspects of cancer evolution.
Significance Cancer is caused by the effects of somatic mutations known as drivers. Although a number of major cancer drivers have been identified, it is suspected that many more comparatively rare and conditional drivers exist, and the interactions between different cancer-associated mutations that might be relevant for tumor progression are not well understood. We applied an advanced neural network approach to learn the sequence of mutations and the mutational burden in colon and lung cancers, and to identify mutations that are associated with individual drivers. A significant ordering of driver mutations is demonstrated, and numerous previously undetected conditional drivers are identified. These findings broaden the existing understanding of the mechanisms of tumor progression and have implications for therapeutic strategies.
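A common, simple way to ask whether a passenger mutation co-occurs with a given driver more often than chance is a one-sided Fisher's exact test on a 2×2 table of tumors. This is a swapped-in illustration, not the LSTM-based association procedure described above, and the counts are hypothetical. In practice, testing many passenger genes would also require a multiple-testing correction.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null.

    a: tumors with driver and passenger; b: driver only;
    c: passenger only; d: neither.
    """
    row1 = a + b  # tumors carrying the driver
    col1 = a + c  # tumors carrying the passenger
    n = a + b + c + d
    hi = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as the observed one.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, hi + 1))
    return tail / comb(n, col1)


# Hypothetical counts for one driver/passenger pair in 20 tumors.
p = fisher_one_sided(a=8, b=2, c=2, d=8)
print(f"one-sided P = {p:.4f}")
```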


Cancers ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 3396
Author(s):  
Lorena Incorvaia ◽  
Daniele Fanale ◽  
Giuseppe Badalamenti ◽  
Chiara Brando ◽  
Marco Bono ◽  
...  

Introduction of checkpoint inhibitors resulted in durable responses and improvements in overall survival in patients with advanced RCC, but treatment efficacy is highly variable, and a considerable number of patients are resistant to PD-1/PD-L1 inhibition. This variability in clinical response makes the discovery of predictive biomarkers for patient selection necessary. Previous findings showed that epigenetic modifications, including extensive microRNA-mediated regulation of tumor suppressor genes, are key features of RCC. Based on this biological background, we hypothesized that a miRNA expression profile identified directly in the peripheral lymphocytes of patients before and after nivolumab administration could represent a step toward real-time monitoring of the dynamic changes during cancer evolution and treatment. We found a specific subset of miRNAs, called the “lymphocyte miRNA signature”, specifically induced in long-responder patients (CR, PR, or SD to nivolumab >18 months). Focusing on the clinical translational potential of miRNAs in controlling the expression of immune checkpoints, we identified an association between plasma levels of soluble PD-1/PD-L1 and the expression of some lymphocyte miRNAs. These findings could aid the development of the novel dynamic predictive biomarkers urgently needed to predict response to immunotherapy and to guide clinical decision-making in RCC patients.
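An association between a plasma marker and miRNA expression across patients is often summarized with a rank correlation. The abstract does not name the statistic used, so this is a minimal sketch assuming Spearman's rho; the per-patient values are hypothetical and in arbitrary units.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical per-patient values: plasma soluble PD-L1 and one miRNA's relative expression.
spd_l1 = [110.0, 95.0, 150.0, 80.0, 130.0]
mirna = [2.1, 1.8, 3.0, 1.2, 2.4]
print(spearman_rho(spd_l1, mirna))
```

Rank-based correlation is a reasonable default for small clinical cohorts because it is insensitive to monotone rescaling and to a few extreme values.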


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e14684-e14684
Author(s):  
James R. Cunningham ◽  
Jon Rittenbach ◽  
Mitch Clemens ◽  
Cheryl Dodd ◽  
Ashley Wilson ◽  
...  

Background: Cancer progression through clonal evolution and emergent phenotypic heterogeneity is thought to reflect stochastic events such as genetic drift. This divergence over time in the character of a neoplasm might also reflect genetic selection, analogous to that in other natural populations, to maximize niche resource utilization. We hypothesized that selection pressures operate in patients with cancer to drive cancer evolution, that they are clinically identifiable, and that their influence is measurable. Methods: To develop a system for cancer ecology staging, a feasibility study recruited 15 patients with active cancer at any site, an expected survival of more than 6 months, and informed consent. A set of clinical parameters obtained from a patient questionnaire, physical examination, and laboratory testing was used to generate eight separate ecological profiles: tumor microenvironment, chronic inflammation, energy balance, psychosocial stress, GI microbiome, endocrine environment, skeletal remodeling, and environmental mutagenesis. A scoring system based on evidence of positive selection was designed to quantify the individual profiles. Profile scores were then aggregated in a 2-D radar plot to generate a polygon, an ‘ecogram’, whose area is hypothesized to correspond to the net level of selection pressure influencing tumor evolution. Results: Ecological profiles were obtained from all 15 patients, allowing determination of the ecogram area (EA) bounded by each polygon. EA ranged widely among the 15 patients, from 0 to 12.7 arbitrary units (au; mean 5.01 ± 0.80). Ecograms from individual patients had unique shapes, suggesting specificity for individual patient ecology. EA measurements were then used to inform an ecological staging system based on a simplified low/high dichotomization of ecosystem resources and threats. Of the 15 patients, 6 were considered to have high resources (EA > 5 au) available to support tumor evolution.
High anti-tumor threat, measured by CD3 lymphocyte immunohistochemical scoring, was detected in 11 patients. Conclusions: An ecological assessment of patients with active cancer appears feasible. Inter-patient variation in ecogram area and morphology suggests potentially important differences in genetic selection between patients; these should be correlated with survival outcomes in future studies and, if validated, could offer a target for ecosystem ‘restoration’.
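The abstract does not give the scoring scale or the exact area formula. Assuming each of the eight profile scores is plotted as a radius on equally spaced axes, the ecogram area is the sum of the triangles between adjacent axes, A = ½ sin(2π/n) Σ r_k r_{k+1} (indices cyclic). The scores below are hypothetical; only the EA > 5 au "high resources" cut-off comes from the abstract.

```python
import math

PROFILES = [
    "tumor microenvironment", "chronic inflammation", "energy balance",
    "psychosocial stress", "GI microbiome", "endocrine environment",
    "skeletal remodeling", "environmental mutagenesis",
]


def radar_area(scores):
    """Area of the polygon with vertices at radius scores[k] on n equally spaced axes."""
    n = len(scores)
    if n < 3:
        raise ValueError("a radar polygon needs at least three axes")
    wedge = 0.5 * math.sin(2 * math.pi / n)  # area factor of each triangle between adjacent axes
    return wedge * sum(scores[k] * scores[(k + 1) % n] for k in range(n))


def resource_level(area, threshold=5.0):
    """Dichotomize ecogram area using the EA > 5 au cut-off reported in the abstract."""
    return "high" if area > threshold else "low"


# Hypothetical scores (au), one per profile, in PROFILES order.
scores = dict(zip(PROFILES, [2.0, 1.5, 1.0, 2.5, 0.5, 1.0, 0.5, 1.5]))
ea = radar_area(list(scores.values()))
print(round(ea, 2), resource_level(ea))
```

One design consequence worth noting: a radar-plot area depends on the order in which the axes are arranged (high scores on adjacent axes enclose more area than the same scores interleaved with zeros), so the profile order has to be fixed for areas to be comparable across patients.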


2021 ◽  
Vol 4 ◽  
Author(s):  
Arjun Bhatt ◽  
Ruth Roberts ◽  
Xi Chen ◽  
Ting Li ◽  
Skylar Connor ◽  
...  

Drug labeling contains an ‘INDICATIONS AND USAGE’ section that provides vital information to support clinical decision-making and regulatory management. Effective extraction of drug indication information from free-text resources could facilitate drug repositioning projects and help collect real-world evidence in support of secondary use of approved medicines. To enable AI-powered language models to extract drug indication information, we used manual reading and curation to develop a Drug Indication Classification and Encyclopedia (DICE) based on FDA-approved human prescription drug labeling. A DICE scheme with 7,231 sentences categorized into five classes (indications, contraindications, side effects, usage instructions, and clinical observations) was developed. To further elucidate the utility of the DICE, we developed nine different AI-based classifiers for predicting indications from the DICE and comprehensively assessed their performance. We found that the transformer-based language models yielded an average MCC of 0.887, outperforming the word embedding-based bidirectional long short-term memory (BiLSTM) models (0.862) with a 2.82% improvement on the test set. The best classifiers were also used to extract drug indication information in DrugBank and achieved a high enrichment rate (>0.930) for this task. We found that domain-specific training could provide more explainable models without sacrificing performance and with better generalization to external validation datasets. Altogether, the proposed DICE could be a standard resource for the development and evaluation of task-specific AI-powered natural language processing (NLP) models.
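The Matthews correlation coefficient (MCC) reported above can be computed for more than two classes with Gorodkin's generalization, which reduces to the familiar (TP·TN − FP·FN)/√(…) form in the binary case. The abstract does not say whether the authors used this multiclass form or a per-class average, so treat this as a sketch; the labels and predictions are hypothetical.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary or multiclass labels (Gorodkin's R_K)."""
    labels = sorted(set(y_true) | set(y_pred))
    s = len(y_true)                                        # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))        # correctly classified
    t_k = [sum(t == k for t in y_true) for k in labels]    # true count per class
    p_k = [sum(p == k for p in y_pred) for k in labels]    # predicted count per class
    num = c * s - sum(a * b for a, b in zip(p_k, t_k))
    den = math.sqrt((s * s - sum(b * b for b in p_k)) * (s * s - sum(a * a for a in t_k)))
    return num / den if den else 0.0


# Hypothetical sentence labels and classifier predictions.
y_true = ["indication", "indication", "side effect", "usage instruction", "side effect", "indication"]
y_pred = ["indication", "side effect", "side effect", "usage instruction", "side effect", "indication"]
print(round(mcc(y_true, y_pred), 3))
```

MCC is a common choice for unbalanced label sets because, unlike accuracy, it stays near zero for a classifier that always predicts the majority class.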


2017 ◽  
Author(s):  
Vincent L. Cannataro ◽  
Stephen G. Gaffney ◽  
Jeffrey P. Townsend

ABSTRACT A major goal of cancer biology is determination of the relative importance of the genomic alterations that confer selective advantage to cancer cells. Tumor sequence surveys have frequently ranked the importance of substitutions to cancer growth by P value or a false-discovery conversion thereof. However, P values are thresholds for belief, not metrics of effect. Their frequent misuse as metrics of effect has often been vociferously decried. Here, we estimate the effect sizes of all recurrent single nucleotide variants in 23 cancer types, quantifying relative importance within and between driver genes. Some of the variants with the highest effect size, such as EGFR L858R in lung adenocarcinoma and BRAF V600E in primary skin cutaneous melanoma, have yielded remarkable therapeutic responses. Quantification of cancer effect sizes has immediate importance to the prioritization of clinical decision-making by tumor boards, selection and design of clinical trials, pharmacological targeting, and basic research prioritization.
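Why a P value is not a measure of effect can be shown with a toy binomial example. A variant observed at twice its expected neutral rate has the same observed/expected ratio in a small and a large cohort, yet its P value shrinks by many orders of magnitude as the cohort grows. This generic illustration uses hypothetical rates and cohort sizes; the simple ratio here is not the authors' selection-based effect-size estimate.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log1p(-p)
        for j in range(k, n + 1)
    ]
    top = max(log_terms)
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)


def effect_and_p(observed, n, expected_rate):
    """Return (observed / expected ratio, one-sided binomial P value)."""
    ratio = observed / (n * expected_rate)
    return ratio, binom_upper_tail(observed, n, expected_rate)


# Hypothetical: a variant seen at twice a 1% neutral expectation in cohorts of growing size.
for n in (100, 1000, 10000):
    ratio, p = effect_and_p(observed=2 * n // 100, n=n, expected_rate=0.01)
    print(f"n={n:>6}  ratio={ratio:.1f}  P={p:.3g}")
```

Ranking variants by P value would therefore favor variants that are merely well sampled, which is the concern the abstract raises.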


2021 ◽  
Author(s):  
Daria Kurz ◽  
Carlos Salort Sánchez ◽
Cristian Axenie

For decades, researchers have used the concepts of rate of change and differential equations to model and forecast neoplastic processes. This expressive mathematical apparatus brought significant insights in oncology by describing the unregulated proliferation and host interactions of cancer cells, as well as their response to treatments. Now, these theories have been given new life and found new applications. With the advent of routine cancer genome sequencing and the resulting abundance of data, oncology is now building an "arsenal" of new modeling and analysis tools. Models describing the governing physical laws of tumor-host-drug interactions can now be challenged with biological data to make predictions about cancer progression. Our study joins the efforts of the mathematical and computational oncology community by introducing a novel machine learning system for data-driven discovery of mathematical and physical relations in oncology. The system uses computational mechanisms such as competition, cooperation, and adaptation in neural networks to simultaneously learn the statistics of, and the governing relations between, multiple clinical data covariates. Designed for easy adoption in clinical oncology, our system produces solutions that reveal human-understandable properties and features hidden in the data. As our experiments demonstrate, our system can describe nonlinear conservation laws in cancer kinetics and growth curves, symmetries in tumor phenotypic staging transitions, the pre-operative spatial tumor distribution, and even the nonlinear intracellular and extracellular pharmacokinetics of neoadjuvant therapies. The primary goal of our work is to improve the mechanistic understanding of cancer dynamics by exploiting heterogeneous clinical data.
We demonstrate through multiple instantiations that our system extracts an accurate, human-understandable representation of the underlying dynamics of physical interactions central to typical oncology problems. Our results and evaluation demonstrate that, using simple yet powerful computational mechanisms, such a machine learning system can support clinical decision making. To this end, our system is a representative tool of the field of mathematical and computational oncology and offers a bridge between the data, the modeler, the data scientist, and the practising clinician.
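As a concrete instance of the rate-of-change models mentioned above, the Gompertz law is a classic differential equation for tumor growth: dV/dt = a·V·ln(K/V), with closed-form solution V(t) = K·exp(ln(V0/K)·e^(−a·t)). This minimal sketch integrates it with forward Euler and compares against the exact solution. The Gompertz model and its parameter values are illustrative choices, not taken from the study.

```python
import math


def gompertz_rhs(v, a, k):
    """Gompertz growth rate dV/dt = a * V * ln(K / V)."""
    return a * v * math.log(k / v)


def gompertz_exact(t, v0, a, k):
    """Closed-form Gompertz solution V(t) = K * exp(ln(V0/K) * exp(-a t))."""
    return k * math.exp(math.log(v0 / k) * math.exp(-a * t))


def euler(v0, a, k, t_end, dt):
    """Integrate the Gompertz ODE from t = 0 to t_end with forward Euler steps of size dt."""
    v = v0
    for _ in range(int(round(t_end / dt))):
        v += dt * gompertz_rhs(v, a, k)
    return v


# Hypothetical parameters: initial volume 0.1, carrying capacity 10 (arbitrary units), rate 0.3 per day.
v0, a, k = 0.1, 0.3, 10.0
approx = euler(v0, a, k, t_end=20.0, dt=0.001)
exact = gompertz_exact(20.0, v0, a, k)
print(approx, exact)
```

Fitting a, K, and V0 to measured tumor volumes is the kind of step where such models are "challenged with biological data".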


Author(s):  
Rawan AlSaad ◽  
Qutaibah Malluhi ◽  
Ibrahim Janahi ◽  
Sabri Boughorbel

Abstract Background Predictive modeling with longitudinal electronic health record (EHR) data offers great promise for accelerating personalized medicine and better informing clinical decision-making. Recently, deep learning models have achieved state-of-the-art performance for many healthcare prediction tasks. However, deep models lack interpretability, which is integral to successful decision-making and can lead to better patient care. In this paper, we build upon the contextual decomposition (CD) method, an algorithm for producing importance scores from long short-term memory networks (LSTMs). We extend the method to bidirectional LSTMs (BiLSTMs) and use it in the context of predicting future clinical outcomes from patients’ historical EHR visits. Methods We use a real EHR dataset comprising 11,071 patients to evaluate and compare CD interpretations from LSTM and BiLSTM models. First, we train LSTM and BiLSTM models for the task of predicting which pre-school children with respiratory system-related complications will have asthma at school age. We then conduct quantitative and qualitative analyses to evaluate the interpretations produced by the contextual decomposition of the trained models. In addition, we develop an interactive visualization to demonstrate the utility of CD scores in explaining predicted outcomes. Results Our experimental evaluation demonstrates that whenever a clear visit-level pattern exists, the models learn that pattern and contextual decomposition can appropriately attribute the prediction to the correct pattern. In addition, the results confirm that the CD scores agree to a large extent with the importance scores generated using logistic regression coefficients. Our main insight was that rather than interpreting the attribution of individual visits to the predicted outcome, we could instead attribute a model’s prediction to a group of visits.
Conclusion We presented quantitative and qualitative evidence that CD interpretations can explain patient-specific predictions using CD attributions of individual visits or a group of visits.
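Contextual decomposition itself propagates "relevant" and "irrelevant" contributions through every LSTM gate, which is beyond a short example. The logistic-regression baseline that the CD scores are compared with already shows what attributing a prediction to a group of visits means: for a linear model the logit is a sum of per-visit terms, so any subset of visits has an exact additive contribution. This is a simplified linear analogue, not CD; the diagnosis codes, weights, and visits are hypothetical.

```python
import math

# Hypothetical logistic-regression weights over per-visit diagnosis codes.
WEIGHTS = {"wheeze": 0.9, "bronchiolitis": 0.6, "otitis": 0.1, "eczema": 0.4}
BIAS = -2.0


def visit_contribution(visit):
    """Logit contribution of one visit: sum of the weights of the codes recorded in it."""
    return sum(WEIGHTS.get(code, 0.0) for code in visit)


def group_attribution(visits, group):
    """Additive logit contribution of a group of visit indices."""
    return sum(visit_contribution(visits[i]) for i in group)


def predict_proba(visits):
    """Predicted probability of the outcome from all visits."""
    logit = BIAS + group_attribution(visits, range(len(visits)))
    return 1.0 / (1.0 + math.exp(-logit))


visits = [["otitis"], ["wheeze", "eczema"], ["wheeze"], ["bronchiolitis"]]
print(group_attribution(visits, [1, 2]))  # contribution of the second and third visits together
print(predict_proba(visits))
```

For an LSTM the per-visit terms interact through the gates, so group scores are no longer simple sums of single-visit scores; that non-additivity is what makes group-level attribution informative for recurrent models.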


2021 ◽  
Author(s):  
Herty Liany ◽  
Anand Jeyasekharan ◽  
Vaibhav Rajan

Advances in next-generation sequencing technologies have led to personalized genomic profiles in diagnostic panels that inform oncologists of alterations in clinically relevant genes. While targeted therapies may exist for some alterations, an effective therapeutic strategy should consider multiple, interdependent genetic interactions that affect cancer progression, a task that remains challenging. There are ongoing efforts to profile cancer cells in vitro, both to catalog their genomic information and to study their sensitivity to various drugs. Tools are needed that can interpret a patient's personalized genomic profile in light of information from these biological and pre-clinical studies and recommend potentially useful drugs. To address this need, we develop a new algorithmic framework, DruID, that combines drug efficacy predictions from a deep neural network model with information such as drug sensitivity, drug-drug interactions, and genetic dependencies from multiple publicly available databases. We empirically evaluate DruID on cancer cell line data for which the efficacy of many drugs has been experimentally determined. We find that DruID outperforms competing approaches and promises to be a useful tool in clinical decision-making.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2592
Author(s):  
Martin D. King ◽  
Suresh Pujar ◽  
Rod C. Scott

Background The seizure-count time series data acquired from three children with refractory epilepsy were used in a statistical modelling analysis designed to explain the marked variation in seizure frequency that often occurs over time (over-dispersed Poisson behaviour). This was motivated by an expectation that a better understanding of the spontaneous shifts in seizure activity observed in some cases should reduce the risk of over-treatment caused by inappropriate changes in medication. Methods The analyses were performed using Poisson hidden Markov models (HMMs), both Bayesian and non-Bayesian, implemented using Markov chain Monte Carlo and the expectation-maximisation algorithm, respectively. A defining feature of the models, as applied to epilepsy, is the assumed existence of two or more pathological states, with state-specific Poisson rates and random transitions between the states. Posterior predictive simulation was used to assess the validity of the Bayesian HMMs. Results The results are presented as state transition probability and Poisson rate estimates (i.e., the primary HMM parameters), together with information derived from these primary parameters. State-specific mean-duration (sojourn time) estimates and sojourn-time complementary cumulative probability distributions are the main focus. HMM analyses are presented for three children who differed markedly in their seizure behaviour. The first is characterised by an extreme seizure count on one occasion; the second underwent a spontaneous decrease in seizure activity during the observation period; and the third has a seizure-count trajectory characterised by a gradual change in mean seizure activity. We show that, despite their considerable differences, each of the observed seizure-count trajectories can be modelled adequately using an HMM.
Conclusions The study demonstrates that clinically relevant information can be obtained using hidden Markov modelling in three cases with markedly different seizure behaviour. The resulting subject-specific statistics provide useful clinical insights that should aid those engaged in clinical decision making.
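The two quantities emphasized above can be sketched for a discrete-time Poisson HMM. The first is the likelihood of a seizure-count series under state-specific Poisson rates, computed with the scaled forward algorithm. The second is the geometric sojourn time in each state: with self-transition probability γ_ii, the mean duration is 1/(1 − γ_ii) and the complementary cumulative distribution is P(D > d) = γ_ii^d. The two-state parameters and counts are hypothetical, not estimates for the three children, and the sketch only evaluates a likelihood. It does not perform the EM or MCMC fitting described in the Methods.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def forward_loglik(counts, delta, gamma, lambdas):
    """Log-likelihood of a count series under a Poisson HMM (scaled forward algorithm).

    delta: initial state distribution; gamma: transition matrix (rows sum to 1);
    lambdas: state-specific Poisson rates.
    """
    m = len(lambdas)
    alpha = [delta[j] * poisson_pmf(counts[0], lambdas[j]) for j in range(m)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for x in counts[1:]:
        # Propagate state probabilities one step, then weight by the emission probability.
        alpha = [sum(alpha[i] * gamma[i][j] for i in range(m)) * poisson_pmf(x, lambdas[j])
                 for j in range(m)]
        scale = sum(alpha)
        loglik += math.log(scale)  # product of the scale factors is the likelihood
        alpha = [a / scale for a in alpha]
    return loglik


def mean_sojourn(gamma):
    """Mean number of consecutive time steps spent in each state: 1 / (1 - gamma_ii)."""
    return [1.0 / (1.0 - gamma[i][i]) for i in range(len(gamma))]


def sojourn_ccdf(g_ii, d):
    """P(sojourn > d) for a geometric sojourn with self-transition probability g_ii."""
    return g_ii ** d


counts = [0, 1, 0, 5, 7, 4, 6, 1, 0, 0]  # hypothetical daily seizure counts
delta = [0.5, 0.5]
gamma = [[0.9, 0.1], [0.2, 0.8]]
lambdas = [0.5, 5.0]
print(forward_loglik(counts, delta, gamma, lambdas))
print(mean_sojourn(gamma))
print([round(sojourn_ccdf(gamma[0][0], d), 3) for d in (1, 5, 10)])
```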


2021 ◽  
Vol 4 ◽  
Author(s):  
Daria Kurz ◽  
Carlos Salort Sánchez ◽  
Cristian Axenie

For decades, researchers have used the concepts of rate of change and differential equations to model and forecast neoplastic processes. This expressive mathematical apparatus brought significant insights in oncology by describing the unregulated proliferation and host interactions of cancer cells, as well as their response to treatments. Now, these theories have been given new life and found new applications. With the advent of routine cancer genome sequencing and the resulting abundance of data, oncology is now building an “arsenal” of new modeling and analysis tools. Models describing the governing physical laws of tumor–host–drug interactions can now be challenged with biological data to make predictions about cancer progression. Our study joins the efforts of the mathematical and computational oncology community by introducing a novel machine learning system for data-driven discovery of mathematical and physical relations in oncology. The system uses computational mechanisms such as competition, cooperation, and adaptation in neural networks to simultaneously learn the statistics of, and the governing relations between, multiple clinical data covariates. Designed for easy adoption in clinical oncology, our system produces solutions that reveal human-understandable properties and features hidden in the data. As our experiments demonstrate, our system can describe nonlinear conservation laws in cancer kinetics and growth curves, symmetries in tumor phenotypic staging transitions, the preoperative spatial tumor distribution, and even the nonlinear intracellular and extracellular pharmacokinetics of neoadjuvant therapies. The primary goal of our work is to improve the mechanistic understanding of cancer dynamics by exploiting heterogeneous clinical data.
We demonstrate through multiple instantiations that our system extracts an accurate, human-understandable representation of the underlying dynamics of physical interactions central to typical oncology problems. Our results and evaluation demonstrate that, using simple yet powerful computational mechanisms, such a machine learning system can support clinical decision-making. To this end, our system is a representative tool of the field of mathematical and computational oncology and offers a bridge between the data, the modeler, the data scientist, and the practicing clinician.
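Competition, cooperation, and adaptation are the three defining steps of Kohonen's self-organizing map (SOM). The sketch below trains a small one-dimensional SOM on scalar inputs to make each step explicit. It illustrates those mechanisms under assumed settings (map size, learning-rate and neighborhood schedules, hypothetical data). It is not a reproduction of the authors' system, whose architecture the abstract does not specify.

```python
import math
import random


def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D SOM on scalar data; returns the unit weights (prototypes)."""
    rng = random.Random(seed)
    weights = [rng.uniform(min(data), max(data)) for _ in range(n_units)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood width
            # Competition: the unit whose weight is closest to x wins.
            winner = min(range(n_units), key=lambda j: abs(weights[j] - x))
            for j in range(n_units):
                # Cooperation: units near the winner on the map share the update.
                h = math.exp(-((j - winner) ** 2) / (2 * sigma ** 2))
                # Adaptation: move each weight toward x, scaled by lr and h.
                weights[j] += lr * h * (x - weights[j])
            step += 1
    return weights


# Hypothetical scalar covariate values (arbitrary units) forming three loose clusters.
data = [0.1, 0.3, 0.2, 4.8, 5.1, 5.0, 9.7, 9.9, 10.2]
prototypes = train_som(data)
print([round(w, 2) for w in prototypes])
```

Because each update moves a weight at most halfway toward a data point, the prototypes always stay within the range of the data; the shrinking neighborhood lets neighboring units first move together and then specialize.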

