A machine learning approach to predicting short-term mortality risk in patients starting chemotherapy

2017 ◽  
Author(s):  
Aymen A. Elfiky ◽  
Maximilian J. Pany ◽  
Ravi B. Parikh ◽  
Ziad Obermeyer

Abstract. Background: Cancer patients who die soon after starting chemotherapy incur costs of treatment without benefits. Accurately predicting mortality risk from chemotherapy is important, but few patient data-driven tools exist. We sought to create and validate a machine learning model predicting mortality for patients starting new chemotherapy. Methods: We obtained electronic health records for patients treated at a large cancer center (26,946 patients; 51,774 new regimens) over 2004-14, linked to Social Security data for date of death. The model was derived using 2004-11 data, and performance measured on non-overlapping 2012-14 data. Findings: 30-day mortality from chemotherapy start was 2.1%. Common cancers included breast (21.1%), colorectal (19.3%), and lung (18.0%). Model predictions were accurate for all patients (AUC 0.94). Predictions for patients starting palliative chemotherapy (46.6% of regimens), for whom prognosis is particularly important, remained highly accurate (AUC 0.92). To illustrate model discrimination, we ranked patients initiating palliative chemotherapy by model-predicted mortality risk, and calculated observed mortality by risk decile. 30-day mortality in the highest-risk decile was 22.6%; in the lowest-risk decile, no patients died. Predictions remained accurate across all primary cancers, stages, and chemotherapies, even for clinical trial regimens that first appeared in years after the model was trained (AUC 0.94). The model also performed well for prediction of 180-day mortality (AUC 0.87; mortality 74.8% in the highest risk decile vs. 0.2% in the lowest). Predictions were more accurate than data from randomized trials of individual chemotherapies, or SEER estimates. Interpretation: A machine learning algorithm accurately predicted short-term mortality in patients starting chemotherapy using EHR data. Further research is necessary to determine generalizability and the feasibility of applying this algorithm in clinical settings.
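The two evaluation steps named in the abstract, AUC and observed mortality by predicted-risk decile, can be sketched as follows. This is a minimal illustration only, not the authors' code: the function names `auc` and `mortality_by_decile` are hypothetical, the AUC uses the standard rank-sum identity, and the inputs are assumed to be plain lists of predicted risks and 0/1 death indicators.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mortality_by_decile(scores, labels):
    """Observed event rate per predicted-risk decile, lowest risk first.

    Assumes at least 10 observations so that no decile is empty.
    """
    paired = sorted(zip(scores, labels))
    n = len(paired)
    rates = []
    for d in range(10):
        chunk = paired[d * n // 10:(d + 1) * n // 10]
        rates.append(sum(y for _, y in chunk) / len(chunk))
    return rates


# Toy data (invented for illustration): 20 patients, the two highest-risk die.
toy_scores = [i / 20 for i in range(20)]
toy_labels = [0] * 18 + [1] * 2
toy_rates = mortality_by_decile(toy_scores, toy_labels)
toy_auc = auc(toy_scores, toy_labels)
```

On real data the decile table is what makes discrimination tangible: a well-ranked model concentrates deaths in the top deciles, as in the 22.6% vs. 0% contrast reported above.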

2017 ◽  
Vol 35 (15_suppl) ◽  
pp. 6538-6538
Author(s):  
Ravi Bharat Parikh ◽  
Aymen Elfiky ◽  
Maximilian J. Pany ◽  
Ziad Obermeyer

6538 Background: Patients who die soon after starting chemotherapy incur symptoms and financial costs without survival benefit. Prognostic uncertainty may contribute to increasing chemotherapy use near the end of life, but few prognostic aids exist to guide physicians and patients in the decision to initiate chemotherapy. Methods: We obtained all electronic health record (EHR) data from 2004-14 from a large national cancer center, linked to Social Security data to determine date of death. Using EHR data before treatment initiation, we created a machine learning (ML) model to predict 180-day mortality from the start of chemotherapy. We derived the model using data from 2004-11 and report predictive performance on data from 2012-14. Results: 26,946 patients initiated 51,774 discrete chemotherapy regimens over the study period; 49% received multiple lines of chemotherapy. The most common cancers were breast (23.6%), colorectal (17.6%), and lung (16.6%). 18.4% of patients died within 180 days after chemotherapy initiation. Model predictions were used to rank patients in the validation cohort by predicted risk. Patients in the highest decile of predicted risk had a 180-day mortality of 74.8%, vs. 0.2% in the lowest decile (area under the receiver-operating characteristic curve [AUC] 0.87). Predictions were accurate for patients with metastatic disease (AUC 0.85) and for individual primary cancers and chemotherapy regimens—including experimental regimens not present in the derivation sample. Model predictions were valid for 30- and 90-day mortality (AUC 0.94 and 0.89, respectively). ML predictions outperformed regimen-based mortality estimates from randomized trials (RT) (AUC 0.77 [ML] vs. 0.56 [RT]), and National Cancer Institute Surveillance, Epidemiology, and End Results Program (SEER) estimates (AUC 0.81 [ML] vs. 0.40 [SEER]). 
Conclusions: Using EHR data from a single cancer center, we derived a machine learning algorithm that accurately predicted short-term mortality after chemotherapy initiation. Further research is necessary to determine applications of this algorithm in clinical settings and whether this tool can improve shared decision making leading up to chemotherapy initiation.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Ravi B. Parikh ◽  
Manqing Liu ◽  
Eric Li ◽  
Runze Li ◽  
Jinbo Chen

Abstract: Machine learning algorithms may address prognostic inaccuracy among clinicians by identifying patients at risk of short-term mortality and facilitating earlier discussions about hospice enrollment, discontinuation of therapy, or other management decisions. In the present study, we used prospective predictions from a real-time machine learning prognostic algorithm to identify two trajectories of all-cause mortality risk for decedents with cancer. We show that patients with an unpredictable trajectory, where mortality risk rises only close to death, are significantly less likely to receive guideline-based end-of-life care and may not benefit from the integration of prognostic algorithms in practice.


Entropy ◽  
2021 ◽  
Vol 23 (3) ◽  
pp. 300
Author(s):  
Mark Lokanan ◽  
Susan Liu

Protecting financial consumers from investment fraud has been a recurring problem in Canada. The purpose of this paper is to predict the demographic characteristics of investors who are likely to be victims of investment fraud. Data for this paper came from the Investment Industry Regulatory Organization of Canada’s (IIROC) database between January of 2009 and December of 2019. In total, 4575 investors were coded as victims of investment fraud. The study employed a machine-learning algorithm to predict the probability of fraud victimization. The machine learning model deployed in this paper predicted the typical demographic profile of fraud victims as investors who classify as female, have poor financial knowledge, know the advisor from the past, and are retired. Investors who are characterized as having limited financial literacy but a long-time relationship with their advisor have reduced probabilities of being victimized. However, male investors with low or moderate-level investment knowledge were more likely to be preyed upon by their investment advisors. While not statistically significant, older adults, in general, are at greater risk of being victimized. The findings from this paper can be used by Canadian self-regulatory organizations and securities commissions to inform their investors’ protection mandates.


Friction ◽  
2021 ◽  
Author(s):  
Vigneashwara Pandiyan ◽  
Josef Prost ◽  
Georg Vorlaufer ◽  
Markus Varga ◽  
Kilian Wasmer

Abstract: Functional surfaces in relative contact and motion are prone to wear and tear, resulting in loss of efficiency and performance of the workpieces/machines. Wear occurs in the form of adhesion, abrasion, scuffing, galling, and scoring between contacts. However, the rate of the wear phenomenon depends primarily on the physical properties and the surrounding environment. Monitoring the integrity of surfaces by offline inspections leads to significant wasted machine time. A potential alternative to the offline inspection currently practiced in industry is the analysis of sensor signatures capable of capturing the wear state and correlating it with the wear phenomenon, followed by in situ classification using a state-of-the-art machine learning (ML) algorithm. Though this technique is better than offline inspection, it possesses inherent disadvantages for training the ML models. Ideally, supervised training of ML models requires the datasets considered for the classification to be of equal weightage to avoid biasing. The collection of such a dataset is very cumbersome and expensive in practice, as in real industrial applications, the malfunction period is minimal compared to normal operation. Furthermore, classification models would not classify new wear phenomena from the normal regime if they are unfamiliar. As a promising alternative, in this work, we propose a methodology able to differentiate the abnormal regimes, i.e., wear phenomenon regimes, from the normal regime. This is carried out by familiarizing the ML algorithms only with the distribution of the acoustic emission (AE) signals captured using a microphone related to the normal regime. As a result, the ML algorithms would be able to detect whether some overlaps exist with the learnt distributions when a new, unseen signal arrives. To achieve this goal, a generative convolutional neural network (CNN) architecture based on a variational autoencoder (VAE) is built and trained. During the validation procedure of the proposed CNN architectures, we were able to identify acoustic signals corresponding to the normal and abnormal wear regimes with accuracies of 97% and 80%, respectively. Hence, our approach shows very promising results for in situ and real-time condition monitoring or even wear prediction in tribological applications.
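The core idea of the abstract, learning only the distribution of normal-regime signals and flagging anything that falls outside it, can be illustrated with a deliberately simplified stand-in. The sketch below is not the paper's VAE-based CNN: it swaps in a one-dimensional Gaussian model of signal RMS energy, and the class name `NormalRegimeDetector`, the RMS feature, and the threshold of 3 standard deviations are all assumptions made for illustration.

```python
import math
import statistics


def rms(signal):
    """Root-mean-square amplitude of one signal window."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


class NormalRegimeDetector:
    """Fits a Gaussian to RMS values of normal signals only (hypothetical helper)."""

    def fit(self, normal_signals):
        feats = [rms(s) for s in normal_signals]
        self.mean = statistics.fmean(feats)
        self.std = statistics.stdev(feats)
        return self

    def score(self, signal):
        # Distance from the learnt normal distribution, in standard deviations.
        return abs(rms(signal) - self.mean) / self.std

    def is_abnormal(self, signal, threshold=3.0):
        return self.score(signal) > threshold


def sine(amplitude, n=200, period=50):
    """Synthetic stand-in for an acoustic emission window."""
    return [amplitude * math.sin(2 * math.pi * k / period) for k in range(n)]


# Train on normal-regime windows only; no abnormal examples are needed.
detector = NormalRegimeDetector().fit(
    [sine(a) for a in (1.0, 1.1, 0.9, 1.05, 0.95)]
)
normal_flag = detector.is_abnormal(sine(1.02))
wear_flag = detector.is_abnormal(sine(2.0))
```

A generative model such as a VAE plays the same role as the Gaussian here but learns the normal distribution over the full signal rather than over a single hand-picked feature.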


Water ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1217
Author(s):  
Nicolò Bellin ◽  
Erica Racchetti ◽  
Catia Maurone ◽  
Marco Bartoli ◽  
Valeria Rossi

Machine Learning (ML) is an increasingly accessible discipline in computer science that develops dynamic algorithms capable of data-driven decisions and whose use in ecology is growing. Fuzzy sets are suitable descriptors of ecological communities as compared to other standard algorithms and allow the description of decisions that include elements of uncertainty and vagueness. However, fuzzy sets are scarcely applied in ecology. In this work, an unsupervised machine learning algorithm (fuzzy c-means) and association rule mining were applied to assess the factors influencing the assemblage composition and distribution patterns of 12 zooplankton taxa in 24 shallow ponds in northern Italy. The fuzzy c-means algorithm was implemented to classify the ponds in terms of the taxa they support, and to identify the influence of chemical and physical environmental features on the assemblage patterns. Data retrieved during 2014 and 2015 were compared, taking into account that 2014 late spring and summer air temperatures were much lower than historical records, whereas 2015 mean monthly air temperatures were much warmer than historical averages. In both years, fuzzy c-means shows a strong clustering of ponds in two groups, contrasting sites characterized by different physico-chemical and biological features. Climatic anomalies, affecting the temperature regime, together with the main water supply to shallow ponds (e.g., surface runoff vs. groundwater) represent disturbance factors producing large interannual differences in the chemistry, biology and short-term dynamics of small aquatic ecosystems. Unsupervised machine learning algorithms and fuzzy sets may help in catching such apparently erratic differences.
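For readers unfamiliar with fuzzy c-means, the textbook update rules can be sketched in a few lines. This is a generic illustration, not the authors' implementation: the function names, the fuzzifier `m=2.0`, the fixed iteration count, and the toy two-feature "pond" data are all assumptions. Unlike hard clustering, each point receives a degree of membership in every cluster, and the degrees sum to 1.

```python
import math


def _memberships(point, centers, m):
    """Membership of one point in each cluster (standard fuzzy c-means rule)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:
        # Point coincides with one or more centers: split membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


def fuzzy_c_means(points, initial_centers, m=2.0, iters=50):
    """Return (centers, memberships); memberships[k][i] is point k's degree in cluster i."""
    centers = [list(c) for c in initial_centers]
    dims = len(points[0])
    for _ in range(iters):
        memberships = [_memberships(p, centers, m) for p in points]
        for i in range(len(centers)):
            # Each center is the mean of all points, weighted by membership^m.
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ]
    memberships = [_memberships(p, centers, m) for p in points]
    return centers, memberships


# Toy data (invented): two well-separated groups of sites in a 2-feature space.
ponds = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(ponds, initial_centers=[(0, 0), (10, 10)])
```

Initial centers are passed explicitly here to keep the example deterministic; in practice they are often chosen at random and the algorithm is stopped once memberships change by less than a tolerance.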

