Evaluating Machine-Learning Models for Predicting Hospital Transfers in Administrative Data: A Study of Admissions for Myocardial Infarction

Author(s):  
Derrick Lopez ◽  
Juan Lu ◽  
Frank Sanfilippo ◽  
Tom Briffa ◽  
Joe Hung ◽  
...  

Introduction: Hospital administrative data are a valuable source for measuring myocardial infarction (MI) rates. However, admission counts are susceptible to over-inflation if the patient is transferred multiple times during a single episode of care, and variables denoting transfers may not be reliable. To obtain an accurate count of events, hospital transfers need to be correctly identified. Objectives and Approach: We assessed multivariable logistic regression and various machine-learning models for predicting transfers in hospital administrative data. Using Western Australian linked hospital data, we identified records from 2000-2016 with a principal discharge diagnosis of MI. Our reference method was a 24-hour look-back that identifies a transfer using only the admission and separation dates of the current and previous records for the same patient. Multivariable logistic regression and decision trees with various boosting algorithms were used to predict whether a single record was a transfer, using variables recorded at admission (e.g. age, sex, type of hospital, source of admission, emergency/elective status). The performance of each model was measured with metrics including the area under the curve (AUC). Results: Records in the training, validation and testing samples had similar characteristics: mean age was 68.9 years, 66% were male and 58% were admitted to tertiary hospitals. The Gradient Boosting Decision Tree (AUC=0.887; 95% CI: 0.886-0.887) outperformed multivariable logistic regression (AUC=0.875; 95% CI: 0.869-0.881) and random forest models (AUC=0.859; 95% CI: 0.853-0.865). Conclusion/Implications: Multivariable logistic regression and machine-learning models can identify transfers in a single record from existing variables. They can be used on unlinked hospital administrative data, where records belonging to the same patient cannot be identified.

2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Cheng Qu ◽  
Lin Gao ◽  
Xian-qiang Yu ◽  
Mei Wei ◽  
Guo-quan Fang ◽  
...  

Background. Acute kidney injury (AKI) has long been recognized as a common and important complication of acute pancreatitis (AP). In this study, machine learning (ML) techniques were used to establish predictive models for AKI in AP patients during hospitalization. This is a retrospective review of prospectively collected data on AP patients admitted to our department within one week of the onset of abdominal pain, from January 2014 to January 2019. Eighty patients developed AKI after admission (AKI group) and 254 did not (non-AKI group). Using information available at admission, such as demographic characteristics and laboratory data, support vector machine (SVM), random forest (RF), classification and regression tree (CART), and extreme gradient boosting (XGBoost) models were built to predict AKI, and their performance was compared with that of the classic logistic regression (LR) model. XGBoost performed best among the machine learning models, predicting AKI with an AUC of 91.93%; the AUC of the logistic regression analysis was 87.28%. These findings suggest that, compared with the classical logistic regression model, machine learning models using features that can be easily obtained at admission performed better in predicting AKI in AP patients.
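The model comparison above rests on the AUC, which can be computed directly from predicted risk scores as a Mann-Whitney statistic: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. A minimal pure-Python sketch (quadratic in the sample size, so only for illustration):

```python
def auc_score(labels, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs where the positive case scores higher
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:           # O(len(pos) * len(neg)) pairwise comparison
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random scoring and 1.0 to perfect separation, which is why the 91.93% vs. 87.28% gap reported above is meaningful.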


2020 ◽  
Vol 25 (5) ◽  
pp. 637-644
Author(s):  
Shiva Prasad Satla ◽  
Manchala Sadanandam ◽  
Buradagunta Suvarna

Many heinous acts against vulnerable people, especially women, occur on roads, and they have become more frequent in recent days. Although new technologies are developing day by day, the fatality rate remains uncontrolled to date. Without proper guidance about the particular places where accidents are most likely to occur, this menace cannot be regulated. It is therefore necessary to highlight, district by district, the roads where accidents and fatalities are most frequent. Such data would help policymakers put in place focused initiatives on the most dangerous roads to address the menace of rising road accidents and resultant fatalities. We created a dataset for Andhra Pradesh that includes the attributes needed for our analysis of which roads are the most dangerous. We applied various machine learning models, namely logistic regression, random forest, gradient boosting, Gaussian naive Bayes, decision tree, k-nearest neighbour and SVM classifiers, to predict the dangerous roads. We observed that logistic regression provides good accuracy at 87.14%.


2020 ◽  
Vol 10 (3) ◽  
pp. 1151
Author(s):  
Hanna Kim ◽  
Young-Seob Jeong ◽  
Ah Reum Kang ◽  
Woohyun Jung ◽  
Yang Hoon Chung ◽  
...  

Tachycardia is defined as a heart rate greater than 100 bpm sustained for more than 1 min. Tachycardia often occurs after endotracheal intubation and can cause serious complications in patients with cardiovascular disease. The ability to predict post-intubation tachycardia would help clinicians by flagging a potential event for pre-treatment. In this paper, we predict potential post-intubation tachycardia: given electronic medical records and vital signs collected before tracheal intubation, we predict whether tachycardia will occur within 10 min after intubation. Of 1931 available patient datasets, 257 remained after filtering out those with inappropriate data such as outliers and inappropriate annotations. Three feature sets were designed using feature selection algorithms, and two additional feature sets were defined by statistical inspection or manual examination. The five feature sets were compared across various machine learning models: naïve Bayes classifiers, logistic regression, random forest, support vector machines, extreme gradient boosting, and artificial neural networks. Model parameters were optimized for each feature set. By 10-fold cross-validation, we found that a logistic regression model with eight-dimensional hand-crafted features achieved an accuracy of 80.5%, recall of 85.1%, precision of 79.9%, an F1 score of 79.9%, and an area under the receiver operating characteristic curve of 0.85.
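The 10-fold cross-validation and the reported metrics (accuracy, recall, precision, F1) follow standard definitions, sketched below; the classifier itself is omitted, so this is only the evaluation scaffolding, not the study's pipeline.

```python
def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous, near-equal (train, test) folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}
```

In practice each metric is computed on every held-out fold and the ten values are averaged; shuffling or stratifying the indices before splitting is usual but omitted here for brevity.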


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jin Youp Kim ◽  
Hyoun-Joong Kong ◽  
Su Hwan Kim ◽  
Sangjun Lee ◽  
Seung Heon Kang ◽  
...  

Abstract: Increasing recognition of anatomical obstruction has produced a large variety of sleep surgeries to improve the anatomic collapse of obstructive sleep apnea (OSA), so predicting whether sleep surgery will have a successful outcome is very important. The aim of this study was to assess a machine learning-based clinical model that predicts the success rate of sleep surgery in OSA subjects. The success rate predicted by machine learning and the subjective surgical outcome predicted by the physician were compared with the actual success rate in 163 male-dominated OSA subjects. The success rates predicted by machine learning models based on sleep parameters and endoscopic findings of the upper airway were more accurate than the subjective predictions of the sleep surgeon. Among logistic regression and three machine learning models, the gradient boosting model performed best in predicting surgical success, as evaluated by pre- and post-operative polysomnography or home sleep apnea testing; the accuracy of the gradient boosting model (0.708) was significantly higher than that of the logistic regression model (0.542). Our data demonstrate that data mining-driven predictions such as gradient boosting achieve higher accuracy for surgical outcomes, so machine learning models can provide OSA subjects with accurate information on surgical outcomes before surgery.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moojung Kim ◽  
Young Jae Kim ◽  
Sung Jin Park ◽  
Kwang Gi Kim ◽  
Pyung Chun Oh ◽  
...  

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially during the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study was to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset, the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V), were analyzed. Among these adults, 500 (61.4%) answered "yes" when asked whether they had received a seasonal influenza vaccination in the past 12 months. Classification was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracies of machine learning models using 16 variables as predictors of low influenza vaccination adherence were compared. For the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Martine De Cock ◽  
Rafael Dowsley ◽  
Anderson C. A. Nascimento ◽  
Davis Railsback ◽  
Jianwei Shen ◽  
...  

Abstract Background In biomedical applications, valuable data is often split between owners who cannot openly share it because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technological challenge that can be addressed by combining techniques from machine learning and cryptography. When machine learning models are trained collaboratively with the cryptographic technique named secure multi-party computation, the price paid for keeping the owners' data private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is a necessity to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand. Methods Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient descent based algorithm for training a logistic regression like model with a clipped ReLU activation function, and we break the algorithm down into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations that improve performance. Results For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition.
Conclusions In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
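The clipped ReLU mentioned in the Methods is popular in secure computation because, unlike the sigmoid, it involves only additions, multiplications and comparisons with constants. The plaintext sketch below is an illustration only: it omits the secret sharing and cryptographic protocols entirely, and the squared-error objective and synthetic data are assumptions, not details from the paper.

```python
import numpy as np

def clipped_relu(z):
    # min(max(z + 1/2, 0), 1): an MPC-friendly surrogate for the sigmoid
    return np.clip(z + 0.5, 0.0, 1.0)

def train(X, y, lr=0.1, epochs=300):
    """Full-batch gradient descent on squared error with the clipped
    ReLU activation. In the secure protocol, each arithmetic step here
    would instead run as a two-party computation on secret shares."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        z = X @ w + b
        # Derivative of the clip: 1 inside (-1/2, 1/2), 0 outside.
        active = ((z > -0.5) & (z < 0.5)).astype(float)
        grad = (clipped_relu(z) - y) * active
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b
```

On linearly separable toy data this drives the predictions toward 0 and 1, which is the behaviour the secure protocol reproduces on secret-shared inputs.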


2021 ◽  
Vol 10 (1) ◽  
pp. 99
Author(s):  
Sajad Yousefi

Introduction: Heart disease is often associated with conditions such as clogged arteries due to the sediment accumulation which causes chest pain and heart attack. Many people die due to the heart disease annually. Most countries have a shortage of cardiovascular specialists and thus, a significant percentage of misdiagnosis occurs. Hence, predicting this disease is a serious issue. Using machine learning models performed on multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.Material and Methods: Several algorithms were utilized to predict heart disease among which Decision Tree, Random Forest and KNN supervised machine learning are highly mentioned. The algorithms are applied to the dataset taken from the UCI repository including 294 samples. The dataset includes heart disease features. To enhance the algorithm performance, these features are analyzed, the feature importance scores and cross validation are considered.Results: The algorithm performance is compared with each other, so that performance based on ROC curve and some criteria such as accuracy, precision, sensitivity and F1 score were evaluated for each model. As a result of evaluation, Accuracy, AUC ROC are 83% and 99% respectively for Decision Tree algorithm. Logistic Regression algorithm with accuracy and AUC ROC are 88% and 91% respectively has better performance than other algorithms. Therefore, these techniques can be useful for physicians to predict heart disease patients and prescribe them correctly.Conclusion: Machine learning technique can be used in medicine for analyzing the related data collections to a disease and its prediction. The area under the ROC curve and evaluating criteria related to a number of classifying algorithms of machine learning to evaluate heart disease and indeed, the prediction of heart disease is compared to determine the most appropriate classification. 
As a result of evaluation, better performance was observed in both Decision Tree and Logistic Regression models.


Energies ◽  
2021 ◽  
Vol 14 (23) ◽  
pp. 7834
Author(s):  
Christopher Hecht ◽  
Jan Figgener ◽  
Dirk Uwe Sauer

Electric vehicles may reduce greenhouse gas emissions from individual mobility. Because of the long charging times, accurate planning is necessary, for which the availability of charging infrastructure must be known. In this paper, we show how the occupation status of charging infrastructure can be predicted for the next day using two machine learning models, the Gradient Boosting Classifier and the Random Forest Classifier. Since both are ensemble models, binary training data (occupied vs. available) can be used to provide a certainty measure for predictions. The prediction may be used to adapt prices in a high-load scenario, predict grid stress, or forecast available power for smart or bidirectional charging. The models were chosen based on an evaluation of 13 different, typically used machine learning models. We show that it is necessary to know past charging station usage in order to predict future usage, while other features, such as traffic density or weather, have a limited effect. We show that a Gradient Boosting Classifier achieves 94.8% accuracy and a Matthews correlation coefficient of 0.838, making ensemble models a suitable tool. We further demonstrate how a model trained on binary data can make non-binary predictions in categories ranging from “low likelihood” to “high likelihood”.
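The Matthews correlation coefficient reported above, and the idea of turning ensemble vote shares into likelihood categories, can be sketched as follows. The four band edges are illustrative assumptions, not values from the paper.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC from the binary confusion matrix; unlike accuracy, it stays
    near 0 for uninformative predictions on imbalanced data."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return ((tp * tn) - (fp * fn)) / denom if denom else 0.0

def likelihood_band(occupied_share):
    """Map the share of ensemble trees voting 'occupied' to a coarse
    category; the thresholds here are hypothetical."""
    if occupied_share < 0.25:
        return "low likelihood"
    if occupied_share < 0.5:
        return "rather low likelihood"
    if occupied_share < 0.75:
        return "rather high likelihood"
    return "high likelihood"
```

The vote share is exactly the "certainty measure" that ensembles trained on binary labels provide for free: the fraction of trees predicting each class.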


2022 ◽  
Vol 14 (1) ◽  
pp. 229
Author(s):  
Jiarui Shi ◽  
Qian Shen ◽  
Yue Yao ◽  
Junsheng Li ◽  
Fu Chen ◽  
...  

Chlorophyll-a concentration in water bodies is one of the most important environmental indicators in monitoring the water environment. Small water bodies include headwater streams, springs, ditches, flushes, small lakes, and ponds, which represent important freshwater resources. However, the relatively narrow and fragmented nature of small water bodies makes it difficult to monitor chlorophyll-a via medium-resolution remote sensing. In the present study, we first fused Gaofen-6 (a new Chinese satellite) images to obtain 2 m resolution images with 8 bands, which proved to be as good a data source for chlorophyll-a monitoring in small water bodies as Sentinel-2. We then compared five semi-empirical and four machine learning models for estimating chlorophyll-a concentrations from reflectance simulated with the fused Gaofen-6 and Sentinel-2 spectral response functions. The results showed that the extreme gradient boosting tree model (one of the machine learning models) was the most accurate: the mean relative error (MRE) was 9.03% and the root-mean-square error (RMSE) was 4.5 mg/m3 for the Sentinel-2 sensor, while for the fused Gaofen-6 image the MRE was 6.73% and the RMSE was 3.26 mg/m3. Thus, both fused Gaofen-6 and Sentinel-2 can estimate the chlorophyll-a concentrations in small water bodies, with fused Gaofen-6 offering higher spatial resolution and Sentinel-2 higher temporal resolution.
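The two error metrics reported above have standard definitions, sketched here; the observed values would be in-situ chlorophyll-a measurements (mg/m3) and the predicted values the model's retrievals.

```python
import math

def mean_relative_error(observed, predicted):
    """MRE: mean of |pred - obs| / obs, reported as a percentage."""
    return 100.0 * sum(abs(p - o) / o
                       for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observations."""
    return math.sqrt(sum((p - o) ** 2
                         for o, p in zip(observed, predicted)) / len(observed))
```

MRE is scale-free and so comparable across sensors, while RMSE keeps the physical units, which is why the study reports both.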


Author(s):  
Maicon Herverton Lino Ferreira da Silva Barros ◽  
Geovanne Oliveira Alves ◽  
Lubnnia Morais Florêncio Souza ◽  
Élisson da Silva Rocha ◽  
João Fausto Lorenzato de Oliveira ◽  
...  

Tuberculosis (TB) is an airborne infectious disease caused by organisms of the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can lead to unsatisfactory results, including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thereby aiding TB prognosis and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result in predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, while the ensemble model composed of the Random Forest (RF), GB and Multi-layer Perceptron (MLP) models is the best at predicting the cure class.
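One common way to produce a balanced data set like the one in experiment (2) above is random oversampling of the minority class. The abstract does not state which balancing method the authors used, so the sketch below is an assumption, shown only to make the imbalanced-vs-balanced contrast concrete (here roughly 22,876 cured vs. 1,139 deaths).

```python
import random

def oversample_minority(records, label_key="outcome", seed=42):
    """Balance a data set by resampling each smaller class with
    replacement until all classes match the majority-class count.
    `label_key` is a hypothetical field name for the class label."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    majority_size = max(len(v) for v in by_class.values())
    balanced = list(records)
    for recs in by_class.values():
        balanced.extend(rng.choice(recs)
                        for _ in range(majority_size - len(recs)))
    return balanced
```

Alternatives such as undersampling the majority class or synthetic oversampling (e.g. SMOTE) trade off information loss against duplicated minority records.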

