A machine learning approach to predicting short-term mortality risk for patients starting chemotherapy.

6538 Background: Patients who die soon after starting chemotherapy incur symptoms and financial costs without survival benefit. Prognostic uncertainty may contribute to increasing chemotherapy use near the end of life, but few prognostic aids exist to guide physicians and patients in the decision to initiate chemotherapy. Methods: We obtained all electronic health record (EHR) data from 2004-14 from a large national cancer center, linked to Social Security data to determine date of death. Using EHR data before treatment initiation, we created a machine learning (ML) model to predict 180-day mortality from the start of chemotherapy. We derived the model using data from 2004-11 and report predictive performance on data from 2012-14. Results: 26,946 patients initiated 51,774 discrete chemotherapy regimens over the study period; 49% received multiple lines of chemotherapy. The most common cancers were breast (23.6%), colorectal (17.6%), and lung (16.6%). 18.4% of patients died within 180 days after chemotherapy initiation. Model predictions were used to rank patients in the validation cohort by predicted risk. Patients in the highest decile of predicted risk had a 180-day mortality of 74.8%, vs. 0.2% in the lowest decile (area under the receiver-operating characteristic curve [AUC] 0.87). Predictions were accurate for patients with metastatic disease (AUC 0.85) and for individual primary cancers and chemotherapy regimens—including experimental regimens not present in the derivation sample. Model predictions were valid for 30- and 90-day mortality (AUC 0.94 and 0.89, respectively). ML predictions outperformed regimen-based mortality estimates from randomized trials (RT) (AUC 0.77 [ML] vs. 0.56 [RT]), and National Cancer Institute Surveillance, Epidemiology, and End Results Program (SEER) estimates (AUC 0.81 [ML] vs. 0.40 [SEER]). 
Conclusions: Using EHR data from a single cancer center, we derived a machine learning algorithm that accurately predicted short-term mortality after chemotherapy initiation. Further research is necessary to determine applications of this algorithm in clinical settings and whether this tool can improve shared decision making leading up to chemotherapy initiation.

A machine learning approach to predicting short-term mortality risk in patients starting chemotherapy

10.1101/204081 ◽

2017 ◽

Cited By ~ 2

Author(s):

Aymen A. Elfiky ◽

Maximilian J. Pany ◽

Ravi B. Parikh ◽

Ziad Obermeyer

Keyword(s):

Machine Learning ◽

Mortality Risk ◽

Palliative Chemotherapy ◽

Learning Algorithm ◽

Cancer Center ◽

Short Term ◽

Machine Learning Model ◽

Machine Learning Approach ◽

Short Term Mortality ◽

And Performance

Background: Cancer patients who die soon after starting chemotherapy incur costs of treatment without benefits. Accurately predicting mortality risk from chemotherapy is important, but few patient data-driven tools exist. We sought to create and validate a machine learning model predicting mortality for patients starting new chemotherapy. Methods: We obtained electronic health records for patients treated at a large cancer center (26,946 patients; 51,774 new regimens) over 2004-14, linked to Social Security data for date of death. The model was derived using 2004-11 data, and performance measured on non-overlapping 2012-14 data. Findings: 30-day mortality from chemotherapy start was 2.1%. Common cancers included breast (21.1%), colorectal (19.3%), and lung (18.0%). Model predictions were accurate for all patients (AUC 0.94). Predictions for patients starting palliative chemotherapy (46.6% of regimens), for whom prognosis is particularly important, remained highly accurate (AUC 0.92). To illustrate model discrimination, we ranked patients initiating palliative chemotherapy by model-predicted mortality risk, and calculated observed mortality by risk decile. 30-day mortality in the highest-risk decile was 22.6%; in the lowest-risk decile, no patients died. Predictions remained accurate across all primary cancers, stages, and chemotherapies, even for clinical trial regimens that first appeared in years after the model was trained (AUC 0.94). The model also performed well for prediction of 180-day mortality (AUC 0.87; mortality 74.8% in the highest risk decile vs. 0.2% in the lowest). Predictions were more accurate than data from randomized trials of individual chemotherapies, or SEER estimates. Interpretation: A machine learning algorithm accurately predicted short-term mortality in patients starting chemotherapy using EHR data. Further research is necessary to determine generalizability and the feasibility of applying this algorithm in clinical settings.
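The two evaluation steps this abstract describes, computing an AUC and tabulating observed mortality by decile of predicted risk, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `auc` and `mortality_by_decile`, the pairwise AUC computation, and the near-equal split into ten groups are all assumptions made for the example.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], died: Sequence[int]) -> float:
    """Probability that a randomly chosen patient who died received a higher
    predicted risk than a randomly chosen survivor (ties count as one half)."""
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mortality_by_decile(scores: Sequence[float], died: Sequence[int]) -> List[float]:
    """Rank patients by predicted risk (ascending), split them into ten
    near-equal groups, and return the observed death rate in each group.
    Index 0 is the lowest-risk decile, index 9 the highest."""
    ranked = sorted(zip(scores, died), key=lambda pair: pair[0])
    n = len(ranked)
    if n < 10:
        raise ValueError("need at least ten patients to form deciles")
    rates = []
    for k in range(10):
        group = ranked[k * n // 10:(k + 1) * n // 10]
        rates.append(sum(d for _, d in group) / len(group))
    return rates
```

A well-discriminating model shows observed mortality rising steeply from the first to the last entry of the list returned by `mortality_by_decile`, which is the pattern the abstract reports (no deaths in the lowest decile, 22.6% in the highest for 30-day mortality). The pairwise AUC loop is quadratic in the number of patients and is meant for illustration only.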

Development of Machine Learning Models to Predict Probabilities and Types of Stroke at Prehospital Stage: the Japan Urgent Stroke Triage Score Using Machine Learning (JUST-ML)

Translational Stroke Research ◽

10.1007/s12975-021-00937-x ◽

2021 ◽

Author(s):

Kazutaka Uchida ◽

Junichi Kouno ◽

Shinichi Yoshimura ◽

Norito Kinjo ◽

Fumihiro Sakakibara ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forests ◽

Prediction Models ◽

Characteristic Curve ◽

Predictive Performance ◽

Vessel Occlusion ◽

Predictive Values ◽

Training Cohort ◽

Sensitivity Specificity

In conjunction with recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We sought to develop a prehospital stroke scale with ML. We conducted a multi-center retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop models. The main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. The predictive abilities were validated in the test cohort with accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracies were 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and types of stroke at the prehospital stage had superior predictive abilities.
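The threshold-based metrics listed in this abstract (accuracy, positive predictive value, sensitivity, specificity, F score) all derive from the four cells of a confusion matrix. The sketch below shows that derivation for a single outcome treated as binary, such as LVO versus not LVO. It is an illustration only: the names `BinaryMetrics` and `binary_metrics` are hypothetical, and the F score is assumed to be F1 (the harmonic mean of positive predictive value and sensitivity), which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    ppv: float          # positive predictive value (precision)
    sensitivity: float  # recall, true positive rate
    specificity: float  # true negative rate
    f1: float           # assumed F1; the source says only "F score"


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute confusion-matrix metrics for 0/1 labels and 0/1 predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Ratios with an empty denominator are reported as 0.0 by convention here.
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity) if ppv + sensitivity else 0.0
    accuracy = (tp + tn) / len(y_true)
    return BinaryMetrics(accuracy, ppv, sensitivity, specificity, f1)
```

For a four-class problem like the one in the study, one common approach is to compute these metrics per class in a one-versus-rest fashion; whether the authors did so is not stated in the abstract.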

Unsupervised Machine Learning and Data Mining Procedures Reveal Short Term, Climate Driven Patterns Linking Physico-Chemical Features and Zooplankton Diversity in Small Ponds

Water ◽

10.3390/w13091217 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1217

Author(s):

Nicolò Bellin ◽

Erica Racchetti ◽

Catia Maurone ◽

Marco Bartoli ◽

Valeria Rossi

Keyword(s):

Machine Learning ◽

Fuzzy Sets ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Short Term ◽

Unsupervised Machine Learning ◽

Fuzzy C Means ◽

Air Temperatures ◽

Physico Chemical ◽

Shallow Ponds

Machine Learning (ML) is an increasingly accessible discipline in computer science that develops dynamic algorithms capable of data-driven decisions and whose use in ecology is growing. Fuzzy sets are suitable descriptors of ecological communities as compared to other standard algorithms and allow the description of decisions that include elements of uncertainty and vagueness. However, fuzzy sets are scarcely applied in ecology. In this work, an unsupervised machine learning algorithm, fuzzy c-means, and association rule mining were applied to assess the factors influencing the assemblage composition and distribution patterns of 12 zooplankton taxa in 24 shallow ponds in northern Italy. The fuzzy c-means algorithm was implemented to classify the ponds in terms of the taxa they support, and to identify the influence of chemical and physical environmental features on the assemblage patterns. Data retrieved during 2014 and 2015 were compared, taking into account that 2014 late spring and summer air temperatures were much lower than historical records, whereas 2015 mean monthly air temperatures were much warmer than historical averages. In both years, fuzzy c-means showed a strong clustering of ponds into two groups, contrasting sites characterized by different physico-chemical and biological features. Climatic anomalies, affecting the temperature regime, together with the main water supply to shallow ponds (e.g., surface runoff vs. groundwater) represent disturbance factors producing large interannual differences in the chemistry, biology and short-term dynamics of small aquatic ecosystems. Unsupervised machine learning algorithms and fuzzy sets may help in capturing such apparently erratic differences.
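For readers unfamiliar with fuzzy c-means, the sketch below implements the standard algorithm: each point receives a degree of membership in every cluster rather than a single hard label. It is not the authors' pipeline. The function name `fuzzy_c_means`, the fuzzifier value `m=2.0`, the random initialisation, and the stopping tolerance are assumptions chosen for illustration, and the input would in practice be a table of pond features rather than the toy points used when trying it out.

```python
import math
import random
from typing import List, Sequence, Tuple


def fuzzy_c_means(
    points: Sequence[Sequence[float]],
    n_clusters: int = 2,
    m: float = 2.0,        # fuzzifier (> 1); larger values give softer memberships
    n_iter: int = 100,
    tol: float = 1e-6,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers: List[List[float]] = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / wsum for d in range(dim)]
            )

        # Membership update: proportional to distance ** (-2 / (m - 1)).
        shift = 0.0
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Point coincides with a center: give it full membership there.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [z / sum(zeros) for z in zeros]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

Assigning each point to the cluster with its largest membership recovers a hard partition, while intermediate memberships flag sites that sit between groups, which is the property that makes fuzzy sets attractive for describing ecological communities.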

Nowcasting heavy precipitation over the Netherlands using a 13-year radar archive: a machine learning approach

10.5194/egusphere-egu21-12814 ◽

2021 ◽

Author(s):

Eva van der Kooij ◽

Marc Schleiss ◽

Riccardo Taormina ◽

Francesco Fioranelli ◽

Dorien Lugt ◽

...

Keyword(s):

Machine Learning ◽

The Netherlands ◽

Heavy Rainfall ◽

Predictive Performance ◽

Heavy Precipitation ◽

Early Warning Systems ◽

Training Data ◽

Short Term ◽

Data Set ◽

Radar Images

Accurate short-term forecasts, also known as nowcasts, of heavy precipitation are desirable for creating early warning systems for extreme weather and its consequences, e.g. urban flooding. In this research, we explore the use of machine learning for short-term prediction of heavy rainfall showers in the Netherlands. We assess the performance of a recurrent, convolutional neural network (TrajGRU) with lead times of 0 to 2 hours. The network is trained on a 13-year archive of radar images with 5-min temporal and 1-km spatial resolution from the precipitation radars of the Royal Netherlands Meteorological Institute (KNMI). We aim to train the model to predict the formation and dissipation of dynamic, heavy, localized rain events, a task for which traditional Lagrangian nowcasting methods still come up short. We report on different ways to optimize predictive performance for heavy rainfall intensities through several experiments. The large dataset available provides many possible configurations for training. To focus on heavy rainfall intensities, we select different subsets of this dataset by applying different conditions for event selection and by varying the ratio of light to heavy precipitation events in the training data set, and we change the loss function used to train the model. To assess the performance of the model, we compare our method to a current state-of-the-art Lagrangian nowcasting system from the pySTEPS library, such as S-PROG, a deterministic approximation of an ensemble mean forecast. The results of the experiments are used to discuss the pros and cons of machine-learning based methods for precipitation nowcasting and possible ways to further increase performance.

Machine learning approaches revealed metabolic signatures of incident chronic kidney disease in persons with pre-and type 2 diabetes

10.2337/figshare.13022624.v2 ◽

2020 ◽

Author(s):

Ada Admin ◽

Jialing Huang ◽

Cornelia Huth ◽

Marcela Covic ◽

Martina Troll ◽

...

Keyword(s):

Machine Learning ◽

Type 2 Diabetes ◽

Chronic Kidney Disease ◽

Kidney Disease ◽

Characteristic Curve ◽

Predictive Performance ◽

Population Based ◽

Learning Approaches ◽

Clinical Variables

Early and precise identification of individuals with pre-diabetes and type 2 diabetes (T2D) at risk of progressing to chronic kidney disease (CKD) is essential to prevent complications of diabetes. Here, we identify and evaluate prospective metabolite biomarkers and the best set of predictors of CKD in the longitudinal, population-based Cooperative Health Research in the Region of Augsburg (KORA) cohort by targeted metabolomics and machine learning approaches. Out of 125 targeted metabolites, sphingomyelin (SM) C18:1 and phosphatidylcholine diacyl (PC aa) C38:0 were identified as candidate metabolite biomarkers of incident CKD specifically in hyperglycemic individuals followed during 6.5 years. Sets of predictors for incident CKD developed from 125 metabolites and 14 clinical variables showed highly stable performances in all three machine learning approaches and outperformed the currently established clinical algorithm for CKD. The two metabolites in combination with five clinical variables were identified as the best set of predictors and their predictive performance yielded a mean area value under the receiver operating characteristic curve of 0.857. The inclusion of metabolite variables in the clinical prediction of future CKD may thus improve the risk prediction in persons with pre- and T2D. The metabolite link with hyperglycemia-related early kidney dysfunction warrants further investigation.

Using machine learning approach to predict short-term mortality risk of acute myocardial infarction after emergency admission

10.1109/bibm52615.2021.9669473 ◽

2021 ◽

Author(s):

Po-Cheng Peng ◽

Hsu-Chun Chien ◽

Prasenjit Mitra ◽

Tun-Wen Pai ◽

Chao-Hung Wang ◽

...

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Mortality Risk ◽

Emergency Admission ◽

Learning Approach ◽

Short Term ◽

Machine Learning Approach ◽

Short Term Mortality

A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems

mBio ◽

10.1128/mbio.00434-20 ◽

2020 ◽

Vol 11 (3) ◽

Cited By ~ 9

Author(s):

Begüm D. Topçuoğlu ◽

Nicholas A. Lesniak ◽

Mack T. Ruffin ◽

Jenna Wiens ◽

Patrick D. Schloss

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Sequence Data ◽

Characteristic Curve ◽

Predictive Performance ◽

Model Complexity ◽

Support Vector ◽

Classification Problems ◽

Microbial Biomarkers

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739) but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability.
IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more-reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.

Machine learning model predicts short-term mortality among prehospital patients: A prospective development study from Finland

Resuscitation Plus ◽

10.1016/j.resplu.2021.100089 ◽

2021 ◽

Vol 5 ◽

pp. 100089

Author(s):

Joonas Tamminen ◽

Antti Kallonen ◽

Sanna Hoppu ◽

Jari Kalliomäki

Keyword(s):

Machine Learning ◽

Learning Model ◽

Short Term ◽

Development Study ◽

Prospective Development ◽

Machine Learning Model ◽

Short Term Mortality

Novel radiomics signature predicts lymph node metastasis in T1 colorectal carcinoma.

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.4_suppl.506 ◽

2019 ◽

Vol 37 (4_suppl) ◽

pp. 506-506

Author(s):

Yanlei Ma ◽

Sheng Zhang ◽

Xinxiang Li ◽

Tianye Niu

Keyword(s):

Lymph Node ◽

Colorectal Carcinoma ◽

Receiver Operating Characteristic ◽

Operating Characteristic ◽

Characteristic Curve ◽

Predictive Performance ◽

Cancer Center ◽

T1 Colorectal Carcinoma ◽

Auc Value ◽

Receiver Operating

506 Background: This study evaluates the predictive performance of radiomic features for metastasis of T1 colorectal carcinoma (CRC) to lymph nodes. Methods: A total of 10 200 CRC patients from our clinical cancer center were included in this analysis. 225 eligible cases diagnosed with T1 CRC were included and divided into two groups: a computed tomography (CT) image group (n = 82) and a magnetic resonance image (MRI) group (n = 143), based on the preoperative image data available. A total of 548 radiomic features were extracted from each case and analyzed, and then a panel of radiomic features associated with lymph node metastases (LNM) was selected using the Mann-Whitney U test. Combining these selected radiomic features and clinical data, the predictive performance for LNM was calculated using receiver operating characteristic (ROC) curves. Results: In the CT group, integrating one radiomic feature and three clinical indicators improved the prediction of LNM in T1 CRC to an area under the receiver operating characteristic curve (AUC) of 0.88. In the contrast-enhanced T1-weighted MRI (T1w-MRI) group, the combination of two radiomic features and three clinical parameters yielded an AUC of 0.85. In the T2-weighted MRI (T2w-MRI) group, the combination of four radiomic features and five clinical characteristics identified T1 tumors with LNM with an AUC of 0.87. Conclusions: The current study shows that combining radiomic features with clinical characteristics gives good predictive performance in identifying T1 CRC with LNM, which may help inform clinical treatment decisions for T1 CRC patients.
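The feature-screening step mentioned in the Methods, a Mann-Whitney U test comparing a radiomic feature between cases with and without LNM, can be sketched as below. This is an assumption-laden illustration rather than the study's procedure: the function name `mann_whitney_u` is hypothetical, the p-value uses the large-sample normal approximation without tie or continuity correction (which is rough for small groups), and the abstract does not state which significance threshold was used.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, approximate two-sided p-value).

    U counts the pairs (a in x, b in y) with a > b, with ties counted as one
    half, computed here from rank sums. U / (len(x) * len(y)) equals the AUC
    obtained when the feature alone is used to separate x from y."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])

    # Sum the ranks of group x, giving tied values their average rank.
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2.0

    # Normal approximation to the null distribution of U (no tie correction).
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value
```

In a screening loop, each of the extracted features would be tested this way and only those with a small p-value retained. With hundreds of candidate features, some correction for multiple comparisons is usually advisable; the abstract does not say whether one was applied.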

Short-Term Electric Load Forecasting Using Standardized Load Profile (SLP) And Support Vector Regression (SVR)

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.2929 ◽

2019 ◽

Vol 9 (4) ◽

pp. 4548-4553

Author(s):

N. T. Dung ◽

N. T. Phuong

Keyword(s):

Machine Learning ◽

Electricity Market ◽

Business Strategy ◽

Learning Algorithm ◽

Load Forecasting ◽

Support Vector ◽

Forecast Errors ◽

Short Term ◽

Load Profile ◽

Short Term Forecasting

Short-term load forecasting (STLF) plays an important role in building business strategy and in ensuring the reliability and safe operation of any electrical system. Many different methods are used for short-term forecasts, including regression models, time series, neural networks, expert systems, fuzzy logic, machine learning, and statistical algorithms. The practical requirement is to minimize forecast errors, avoid waste, prevent shortages, and limit risks in the electricity market. This paper proposes a method of STLF that constructs a standardized load profile (SLP) from past electrical load data and uses the Support Vector Regression (SVR) machine learning algorithm to improve the accuracy of short-term forecasting.
