Data-Driven Discovery of Mathematical and Physical Relations in Oncology Data Using Human-Understandable Machine Learning

For decades, researchers have used the concepts of rate of change and differential equations to model and forecast neoplastic processes. This expressive mathematical apparatus brought significant insights in oncology by describing the unregulated proliferation and host interactions of cancer cells, as well as their response to treatments. Now, these theories have been given a new life and found new applications. With the advent of routine cancer genome sequencing and the resulting abundance of data, oncology now builds an “arsenal” of new modeling and analysis tools. Models describing the governing physical laws of tumor–host–drug interactions can be now challenged with biological data to make predictions about cancer progression. Our study joins the efforts of the mathematical and computational oncology community by introducing a novel machine learning system for data-driven discovery of mathematical and physical relations in oncology. The system utilizes computational mechanisms such as competition, cooperation, and adaptation in neural networks to simultaneously learn the statistics and the governing relations between multiple clinical data covariates. Targeting an easy adoption in clinical oncology, the solutions of our system reveal human-understandable properties and features hidden in the data. As our experiments demonstrate, our system can describe nonlinear conservation laws in cancer kinetics and growth curves, symmetries in tumor’s phenotypic staging transitions, the preoperative spatial tumor distribution, and up to the nonlinear intracellular and extracellular pharmacokinetics of neoadjuvant therapies. The primary goal of our work is to enhance or improve the mechanistic understanding of cancer dynamics by exploiting heterogeneous clinical data. 
We demonstrate through multiple instantiations that our system is extracting an accurate human-understandable representation of the underlying dynamics of physical interactions central to typical oncology problems. Our results and evaluation demonstrate that, using simple—yet powerful—computational mechanisms, such a machine learning system can support clinical decision-making. To this end, our system is a representative tool of the field of mathematical and computational oncology and offers a bridge between the data, the modeler, the data scientist, and the practicing clinician.

10.1101/2021.08.13.456200 ◽

2021 ◽

Author(s):

Daria Kurz ◽

Carlos Salort Sánchez ◽

Cristian Axenie

Keyword(s):

Machine Learning ◽

Cancer Progression ◽

Clinical Data ◽

Clinical Decision Making ◽

Clinical Decision ◽

Biological Data ◽

Rate Of Change ◽

Learning System ◽

Data Driven ◽

Computational Oncology

Gaussian noise up-sampling is better suited than SMOTE and ADASYN for clinical decision making

BioData Mining ◽

10.1186/s13040-021-00283-6 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Jacqueline Beinecke ◽

Dominik Heider

Keyword(s):

Machine Learning ◽

Clinical Data ◽

Gaussian Noise ◽

Missing Values ◽

Clinical Decision Making ◽

Big Data Analytics ◽

Class Imbalance ◽

Clinical Decision ◽

Data Sets ◽

Augmentation Techniques

Abstract Clinical data sets have very special properties and suffer from many caveats in machine learning. They typically show a high class imbalance, have a small number of samples and a large number of parameters, and have missing values. While feature selection approaches and imputation techniques address the latter two problems, the class imbalance is typically addressed using augmentation techniques. However, these techniques have been developed for big data analytics, and their suitability for clinical data sets is unclear. This study analyzed different augmentation techniques for use in clinical data sets and subsequent employment of machine learning-based classification. It turns out that Gaussian Noise Up-Sampling (GNUS) is not always, but generally, as good as SMOTE and ADASYN and even outperforms them on some datasets. However, it has also been shown that augmentation does not improve classification at all in some cases.
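The idea behind Gaussian noise up-sampling can be illustrated with a minimal sketch: minority-class rows are copied and perturbed with zero-mean Gaussian noise until the class reaches a target size. The function name `gaussian_noise_upsample`, the `noise_scale` parameter, and the choice to scale the noise by each feature's standard deviation are assumptions made for illustration; the abstract does not specify the exact procedure used in the study.

```python
import random
import statistics


def gaussian_noise_upsample(minority, target_size, noise_scale=0.1, seed=0):
    """Return the minority rows plus noisy copies, `target_size` rows in total.

    Each synthetic row is a randomly chosen minority row with zero-mean
    Gaussian noise added to every feature. The noise standard deviation is
    `noise_scale` times the feature's standard deviation within the minority
    class (an illustrative assumption, not the study's documented setting).
    """
    rng = random.Random(seed)
    n_features = len(minority[0])
    # Per-feature spread of the minority class (0.0 for a constant feature).
    stds = [
        statistics.pstdev([row[j] for row in minority])
        for j in range(n_features)
    ]
    augmented = [list(row) for row in minority]  # keep the original rows
    while len(augmented) < target_size:
        base = rng.choice(minority)
        augmented.append(
            [x + rng.gauss(0.0, noise_scale * s) for x, s in zip(base, stds)]
        )
    return augmented


# Hypothetical minority class with three samples and two features.
minority_rows = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0]]
balanced = gaussian_noise_upsample(minority_rows, target_size=8)
print(len(balanced))
```

Unlike SMOTE and ADASYN, which interpolate between neighbouring minority samples, this sketch perturbs individual samples, so it needs no neighbour search.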

Machine learning in clinical decision making

Med ◽

10.1016/j.medj.2021.04.006 ◽

2021 ◽

Author(s):

Lorenz Adlung ◽

Yotam Cohen ◽

Uria Mor ◽

Eran Elinav

Keyword(s):

Machine Learning ◽

Decision Making ◽

Clinical Decision Making ◽

Clinical Decision

Cancer Grade Model: a multi-gene machine learning-based risk classification for improving prognosis in breast cancer

British Journal of Cancer ◽

10.1038/s41416-021-01455-1 ◽

2021 ◽

Author(s):

E. Amiri Souri ◽

A. Chenoweth ◽

A. Cheung ◽

S. N. Karagiannis ◽

S. Tsoka

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Clinical Decision Making ◽

Histological Grade ◽

Tumour Size ◽

Clinical Decision ◽

Gene Signature ◽

Risk Classification ◽

Breast Cancers ◽

Grade 3

Abstract Background: Prognostic stratification of breast cancers remains a challenge to improve clinical decision making. We employ machine learning on breast cancer transcriptomics from multiple studies to link the expression of specific genes to histological grade and classify tumours into a more or less aggressive prognostic type. Materials and methods: Microarray data of 5031 untreated breast tumours spanning 33 published datasets and corresponding clinical data were integrated. A machine learning model based on gradient boosted trees was trained on histological grade-1 and grade-3 samples. The resulting predictive model (Cancer Grade Model, CGM) was applied on samples of grade-2 and unknown-grade (3029) for prognostic risk classification. Results: A 70-gene signature for assessing clinical risk was identified and was shown to be 90% accurate when tested on known histological-grade samples. The predictive framework was validated through survival analysis and showed robust prognostic performance. CGM was cross-referenced with existing genomic tests and demonstrated the competitive predictive power of tumour risk. Conclusions: CGM is able to classify tumours into better-defined prognostic categories without employing information on tumour size, stage, or subgroups. The model offers means to improve prognosis and support the clinical decision and precision treatments, thereby potentially contributing to preventing underdiagnosis of high-risk tumours and minimising over-treatment of low-risk disease.

Clinician checklist for assessing suitability of machine learning applications in healthcare

BMJ Health & Care Informatics ◽

10.1136/bmjhci-2020-100251 ◽

2021 ◽

Vol 28 (1) ◽

pp. e100251

Author(s):

Ian Scott ◽

Stacey Carter ◽

Enrico Coiera

Keyword(s):

Machine Learning ◽

Large Scale ◽

Clinical Decision Making ◽

Improve Patient Care ◽

Clinical Decision ◽

Routine Care ◽

Machine Learning Algorithms ◽

Clinical Settings ◽

Machine Learning Applications ◽

Key Issues

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but which do not expect clinicians, as non-experts, to demonstrate mastery over what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and identify situations where further refinement and evaluation is required prior to large-scale use.

Machine-learning based prediction of Cushing’s syndrome in dogs attending UK primary-care veterinary practice

Scientific Reports ◽

10.1038/s41598-021-88440-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Imogen Schofield ◽

David C. Brodbelt ◽

Noel Kennedy ◽

Stijn J. M. Niessen ◽

David B. Church ◽

...

Keyword(s):

Machine Learning ◽

Clinical Decision Making ◽

Predictive Performance ◽

Clinical Decision ◽

Cushing's Syndrome ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods ◽

Clinical Records

Abstract Cushing’s syndrome is an endocrine disease in dogs that negatively impacts the quality of life of affected animals. Cushing’s syndrome can be a challenging diagnosis to confirm; therefore, new methods to aid diagnosis are warranted. Four machine-learning algorithms were applied to predict a future diagnosis of Cushing’s syndrome, using structured clinical data from the VetCompass programme in the UK. Dogs suspected of having Cushing’s syndrome were included in the analysis and classified based on their final reported diagnosis within their clinical records. Demographic and clinical features available at the point of first suspicion by the attending veterinarian were included within the models. The machine-learning methods were able to classify the recorded Cushing’s syndrome diagnoses with good predictive performance. The LASSO penalised regression model showed the best overall performance when applied to the test set, with an AUROC = 0.85 (95% CI 0.80–0.89), sensitivity = 0.71, specificity = 0.82, PPV = 0.75 and NPV = 0.78. The findings of our study indicate that machine-learning methods could predict the future diagnosis made by a practicing veterinarian. New approaches using these methods could support clinical decision-making and contribute to improved diagnosis of Cushing’s syndrome in dogs.
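For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, PPV and NPV follow from the four cells of a confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute four standard diagnostic metrics from confusion-matrix counts.

    tp: true positives, fn: false negatives,
    fp: false positives, tn: true negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased cases detected
        "specificity": tn / (tn + fp),  # share of healthy cases cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=40, fn=10, fp=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity describe the classifier itself, whereas PPV and NPV also depend on how common the diagnosis is in the tested population.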

Machine learning for dose-volume histogram based clinical decision-making support system in radiation therapy plans for brain tumors

Clinical and Translational Radiation Oncology ◽

10.1016/j.ctro.2021.09.001 ◽

2021 ◽

Author(s):

Pawel Siciarz ◽

Salem Alfaifi ◽

Eric Van Uytven ◽

Shrinivas Rathod ◽

Rashmi Koul ◽

...

Keyword(s):

Machine Learning ◽

Decision Making ◽

Radiation Therapy ◽

Brain Tumors ◽

Support System ◽

Clinical Decision Making ◽

Clinical Decision ◽

Dose Volume Histogram ◽

Dose Volume ◽

Decision Making Support

Combined data mining techniques based patient data outlier detection for healthcare safety

International Journal of Intelligent Computing and Cybernetics ◽

10.1108/ijicc-07-2015-0024 ◽

2016 ◽

Vol 9 (1) ◽

pp. 42-68 ◽

Cited By ~ 10

Author(s):

Gebeyehu Belay Gebremeskel ◽

Chai Yi ◽

Zhongshi He ◽

Dawit Haile

Keyword(s):

Data Mining ◽

Decision Making ◽

Patient Safety ◽

Outlier Detection ◽

Clinical Data ◽

Clinical Decision Making ◽

Clinical Decision ◽

Healthcare Services ◽

Content Type ◽

Outliers Detection

Purpose – Among the growing number of data mining (DM) techniques, outlier detection has gained importance in many applications and attracted much attention in recent times. In the past, outlier detection in safety care has been viewed as searching for needles in a haystack. However, outliers are not always erroneous. Therefore, the purpose of this paper is to investigate the role of outliers in healthcare services in general and in patient safety care in particular. Design/methodology/approach – The approach is a combined DM technique (clustering and nearest neighbor) for outlier detection, which provides a clear understanding of, and meaningful insights into, data behaviors for healthcare safety. The outcomes, or the implicit knowledge, are essential to a proper clinical decision-making process. The method takes into account the semantics of patients’ events and situations, which play a significant role in patient care safety and medication. Findings – The paper discusses a novel, integrated methodology that can be applied to the analysis of different biological data. It integrates DM techniques to optimize performance in the field of health and medical science. The integrated outlier detection method can be extended to search for valuable information and implicit knowledge based on selected patient factors. Outliers are detected as clusters and point events, and novel ideas are proposed to empower clinical services with customer satisfaction in mind. The method can also serve as a baseline for further healthcare strategy development and research. Research limitations/implications – This paper mainly focussed on outlier detection. Outlier isolation, which is essential for investigating why an outlier occurred and for communicating how to mitigate it, was not addressed. Therefore, the research can be extended to the hierarchy of patient problems. Originality/value – DM is a dynamic and successful gateway for discovering useful knowledge to enhance healthcare performance and patient safety. Outlier detection based on clinical data is a basic task in achieving a healthcare strategy. Therefore, in this paper, the authors focussed on combined DM techniques for a deep analysis of clinical data, which provide an optimal level of support for clinical decision-making processes. Proper clinical decisions can be reached through attribute selection, which is important for identifying the influential factors or parameters of healthcare services. Integrated clustering and nearest-neighbor techniques therefore give a more acceptable search for outliers in such complex data, which could be fundamental to further analysis of healthcare and patient safety.
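The nearest-neighbor half of such an approach can be sketched as follows. This is a generic k-nearest-neighbor distance score, not the authors' combined clustering method, which the abstract does not describe in detail; the function name `knn_outlier_scores`, the parameter `k`, and the sample points are assumptions for illustration.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbors.

    A larger score means the point lies farther from its neighbors and is
    therefore a stronger outlier candidate. Requires len(points) > k.
    """
    scores = []
    for i, p in enumerate(points):
        # Euclidean distances from p to every other point, smallest first.
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical two-feature records: four close together, one far away.
records = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(records, k=2)
most_unusual = scores.index(max(scores))
print(most_unusual, round(scores[most_unusual], 2))
```

A high score only flags a record for review. As the abstract notes, outliers are not always erroneous, so a flagged record may be a genuine but unusual patient event.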

Critical Care, Critical Data

Biomedical Engineering and Computational Biology ◽

10.1177/1179597219856564 ◽

2019 ◽

Vol 10 ◽

pp. 117959721985656 ◽

Cited By ~ 10

Author(s):

Christopher V Cosgriff ◽

Leo Anthony Celi ◽

David J Stone

Keyword(s):

Artificial Intelligence ◽

Critical Care ◽

Clinical Data ◽

Continuous Monitoring ◽

Data Science ◽

Clinical Decision ◽

Critical Care Medicine ◽

Data Driven ◽

Critical Data

As big data, machine learning, and artificial intelligence continue to penetrate into and transform many facets of our lives, we are witnessing the emergence of these powerful technologies within health care. The use and growth of these technologies has been contingent on the availability of reliable and usable data, a particularly robust resource in critical care medicine where continuous monitoring forms a key component of the infrastructure of care. The response to this opportunity has included the development of open databases for research and other purposes; the development of a collaborative form of clinical data science intended to fully leverage these data resources, and the creation of data-driven applications for purposes such as clinical decision support. Most recently, data levels have reached the thresholds required for the development of robust artificial intelligence features for clinical purposes. The systematic capture and analysis of clinical data in both individuals and populations allows us to begin to move toward precision medicine in the intensive care unit (ICU). In this perspective review, we examine the fundamental role of data as we present the current progress that has been made toward an artificial intelligence (AI)-supported, data-driven precision critical care medicine.

Vital signs assessed in initial clinical encounters predict COVID-19 mortality in an NYC hospital system

Scientific Reports ◽

10.1038/s41598-020-78392-1 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Elza Rechtman ◽

Paul Curtin ◽

Esmeralda Navarro ◽

Sharon Nirenberg ◽

Megan K. Horton

Keyword(s):

Machine Learning ◽

New York ◽

Clinical Decision Making ◽

Care Delivery ◽

Vital Signs ◽

Clinical Decision ◽

Rapid Identification ◽

Gradient Boosting ◽

Hospital System ◽

Response Strategies

Abstract Timely and effective clinical decision-making for COVID-19 requires rapid identification of risk factors for disease outcomes. Our objective was to identify characteristics available immediately upon first clinical evaluation that are related to COVID-19 mortality. We conducted a retrospective study of 8770 laboratory-confirmed cases of SARS-CoV-2 from a network of 53 facilities in New York City. We analysed 3 classes of variables (demographic, clinical, and comorbid factors) in a two-tiered analysis that included traditional regression strategies and machine learning. COVID-19 mortality was 12.7%. Logistic regression showed that older age (OR, 1.69 [95% CI 1.66–1.92]), male sex (OR, 1.57 [95% CI 1.30–1.90]), higher BMI (OR, 1.03 [95% CI 1.102–1.05]), higher heart rate (OR, 1.01 [95% CI 1.00–1.01]), higher respiratory rate (OR, 1.05 [95% CI 1.03–1.07]), lower oxygen saturation (OR, 0.94 [95% CI 0.93–0.96]), and chronic kidney disease (OR, 1.53 [95% CI 1.20–1.95]) were associated with COVID-19 mortality. Using gradient-boosting machine learning, these factors predicted COVID-19 related mortality (AUC = 0.86) following cross-validation in a training set. Immediate, objective and culturally generalizable measures accessible upon clinical presentation are effective predictors of COVID-19 outcome. These findings may inform rapid response strategies to optimize health care delivery in parts of the world that have not yet confronted this epidemic, as well as in those forecasting a possible second outbreak.
