Root for a Phishing Page using Machine Learning

Introduction: Preterm babies are a vulnerable population that experience significant short and long-term morbidity. Rehospitalisations constitute an important, potentially modifiable adverse event in this population. Improving the ability of clinicians to identify those patients at the greatest risk of rehospitalisation has the potential to improve outcomes and reduce costs. Machine-learning algorithms can provide potentially advantageous methods of prediction compared to conventional approaches like logistic regression.Objective: To compare two machine-learning methods (least absolute shrinkage and selection operator (LASSO) and random forest) to expert-opinion driven logistic regression modelling for predicting unplanned rehospitalisation within 30 days in a large French cohort of preterm babies.Design, Setting and Participants: This study used data derived exclusively from the population-based prospective cohort study of French preterm babies, EPIPAGE 2. Only those babies discharged home alive and whose parents completed the 1-year survey were eligible for inclusion in our study. All predictive models used a binary outcome, denoting a baby's status for an unplanned rehospitalisation within 30 days of discharge. Predictors included those quantifying clinical, treatment, maternal and socio-demographic factors. The predictive abilities of models constructed using LASSO and random forest algorithms were compared with a traditional logistic regression model. The logistic regression model comprised 10 predictors, selected by expert clinicians, while the LASSO and random forest included 75 predictors. Performance measures were derived using 10-fold cross-validation. Performance was quantified using area under the receiver operator characteristic curve, sensitivity, specificity, Tjur's coefficient of determination and calibration measures.Results: The rate of 30-day unplanned rehospitalisation in the eligible population used to construct the models was 9.1% (95% CI 8.2–10.1) (350/3,841). The random forest model demonstrated both an improved AUROC (0.65; 95% CI 0.59–0.7; p = 0.03) and specificity vs. logistic regression (AUROC 0.57; 95% CI 0.51–0.62, p = 0.04). The LASSO performed similarly (AUROC 0.59; 95% CI 0.53–0.65; p = 0.68) to logistic regression.Conclusions: Compared to an expert-specified logistic regression model, random forest offered improved prediction of 30-day unplanned rehospitalisation in preterm babies. However, all models offered relatively low levels of predictive ability, regardless of modelling method.

Download Full-text

Prediction of perinatal death using machine learning models: a birth registry-based cohort study in northern Tanzania

BMJ Open ◽

10.1136/bmjopen-2020-040132 ◽

2020 ◽

Vol 10 (10) ◽

pp. e040132

Author(s):

Innocent B Mboya ◽

Michael J Mahande ◽

Mohanad Mohammed ◽

Joseph Obure ◽

Henry G Mwambi

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Perinatal Death ◽

Learning Models ◽

Net Benefit ◽

Birth Registry ◽

Perinatal Deaths ◽

Machine Learning Models

ObjectiveWe aimed to determine the key predictors of perinatal deaths using machine learning models compared with the logistic regression model.DesignA secondary data analysis using the Kilimanjaro Christian Medical Centre (KCMC) Medical Birth Registry cohort from 2000 to 2015. We assessed the discriminative ability of models using the area under the receiver operating characteristics curve (AUC) and the net benefit using decision curve analysis.SettingThe KCMC is a zonal referral hospital located in Moshi Municipality, Kilimanjaro region, Northern Tanzania. The Medical Birth Registry is within the hospital grounds at the Reproductive and Child Health Centre.ParticipantsSingleton deliveries (n=42 319) with complete records from 2000 to 2015.Primary outcome measuresPerinatal death (composite of stillbirths and early neonatal deaths). These outcomes were only captured before mothers were discharged from the hospital.ResultsThe proportion of perinatal deaths was 3.7%. There were no statistically significant differences in the predictive performance of four machine learning models except for bagging, which had a significantly lower performance (AUC 0.76, 95% CI 0.74 to 0.79, p=0.006) compared with the logistic regression model (AUC 0.78, 95% CI 0.76 to 0.81). However, in the decision curve analysis, the machine learning models had a higher net benefit (ie, the correct classification of perinatal deaths considering a trade-off between false-negatives and false-positives)—over the logistic regression model across a range of threshold probability values.ConclusionsIn this cohort, there was no significant difference in the prediction of perinatal deaths between machine learning and logistic regression models, except for bagging. The machine learning models had a higher net benefit, as its predictive ability of perinatal death was considerably superior over the logistic regression model. The machine learning models, as demonstrated by our study, can be used to improve the prediction of perinatal deaths and triage for women at risk.

Download Full-text

P5710Clinical applications of machine learning for prediction of incident atrial fibrillation from the general population: a nationwide cohort study

European Heart Journal ◽

10.1093/eurheartj/ehz746.0651 ◽

2019 ◽

Vol 40 (Supplement_1) ◽

Author(s):

I.-S Kim ◽

P S Yang ◽

H T Yu ◽

T H Kim ◽

J S Uhm ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Logistic Regression ◽

General Population ◽

Regression Model ◽

National Health ◽

Logistic Regression Model ◽

Learning System ◽

Health Examination ◽

Clinical Variables

Abstract Background To evaluate the ability of machine learning algorithms to predict incident atrial fibrillation (AF) from the general population using health examination items. Methods We included 483,343 subjects who received national health examinations from the Korean National Health Insurance Service-based National Sample Cohort (NHIS-NSC). We trained deep neural network model (DNN) of a deep learning system and decision tree model (DT) of a machine learning system using clinical variables and health examination items (including age, sex, body mass index, history of heart failure, hypertension or diabetes, baseline creatinine, and smoking and alcohol intake habits) to predict incident AF using a training dataset of 341,771 subjects constructed from the NHIS-NSC database. The DNN and DT were validated using an independent test dataset of 141,572 remaining subjects. C-indices of DNN and DT for prediction of incident AF were compared with that of conventional logistic regression model. Results During 1,874,789 person·years (mean±standard-deviation age 47.7±14.4 years, 49.6% male), 3,282 subjects with incident AF were observed. In the validation dataset, 1,139 subjects with incident AF were observed. The c-indices of the DNN and DT for incident AF prediction were 0.828 [0.819–0.836] and 0.835 [0.825–0.844], and were significantly higher (p<0.01) than conventional logistic regression model (c-index=0.789 [0.784–0.794]). Conclusions Application of machine learning using simple clinical variables and health examination items was helpful to predict incident AF in the general population. Prospective study is warranted to construct an individualized precision medicine.

Download Full-text

Early Prediction of the Carbapenem Resistance Gram-negative Bacteria Carriage in Intensive Care Unit using Machine-Learning

10.21203/rs.3.rs-60222/v1 ◽

2020 ◽

Author(s):

Qiqiang Liang ◽

Qinyu Zhao ◽

Xin Xu ◽

Yu Zhou ◽

Man Huang

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Carbapenem Resistance ◽

Multivariate Logistic Regression Model ◽

Gram Negative Bacteria ◽

Multivariate Logistic Regression ◽

Gram Negative ◽

Better Than

Abstract Background The prevention and control of carbapenem-resistance gram-negative bacteria (CR-GNB) is the difficulty and focus for clinicians in the intensive care unit (ICU). This study construct a CR-GNB carriage prediction model in order to predict the CR-GNB incidence in one week. Methods The database is comprised of nearly 10,000 patients. the model is constructed by the multivariate logistic regression model and three machine learning algorithms. Then we choose the optimal model and verify the accuracy by daily predicted and recorded the occurrence of CR-GNB of all patients admitted for 4 months. Results There are 1385 patients with positive CR-GNB cultures and 1535 negative patients in this study. Forty-five variables have statistical significant differences. We include the 17 variables in the multivariate logistic regression model and build three machine learning models for all variables. In terms of accuracy and the area under the receiver operating characteristic (AUROC) curve, the random forest is better than XGBoost and multivariate logistic regression model, and better than decision tree model (accuracy: 84% >82%>81%>72%), (AUROC: 0.9089 > 0.8947 ≈ 0.8987 > 0.7845). In the 4-month prospective study, 81 cases were predicted to be positive in CR-GNB culture within 7 days, 146 cases were predicted to be negative, 86 cases were positive, and 120 cases were negative, with an overall accuracy of 84% and AUROC of 91.98%. Conclusions Prediction models by machine learning can predict the occurrence of CR-GNB colonization or infection within a week period, and can real-time predict and guide medical staff to identify high-risk groups more accurately.

Download Full-text

Machine learning-based prediction model for responses of bDMARDs in patients with rheumatoid arthritis and ankylosing spondylitis

Arthritis Research & Therapy ◽

10.1186/s13075-021-02635-3 ◽

2021 ◽

Vol 23 (1) ◽

Author(s):

Seulkee Lee ◽

Seonyoung Kang ◽

Yeonghee Eun ◽

Hong-Hee Won ◽

Hyungjin Kim ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Logistic Regression ◽

Ankylosing Spondylitis ◽

Regression Model ◽

Logistic Regression Model ◽

Prediction Performance ◽

Learning Models ◽

Independent Test ◽

Machine Learning Models

Abstract Background Few studies on rheumatoid arthritis (RA) have generated machine learning models to predict biologic disease-modifying antirheumatic drugs (bDMARDs) responses; however, these studies included insufficient analysis on important features. Moreover, machine learning is yet to be used to predict bDMARD responses in ankylosing spondylitis (AS). Thus, in this study, machine learning was used to predict such responses in RA and AS patients. Methods Data were retrieved from the Korean College of Rheumatology Biologics therapy (KOBIO) registry. The number of RA and AS patients in the training dataset were 625 and 611, respectively. We prepared independent test datasets that did not participate in any process of generating machine learning models. Baseline clinical characteristics were used as input features. Responders were defined as those who met the ACR 20% improvement response criteria (ACR20) and ASAS 20% improvement response criteria (ASAS20) in RA and AS, respectively, at the first follow-up. Multiple machine learning methods, including random forest (RF-method), were used to generate models to predict bDMARD responses, and we compared them with the logistic regression model. Results The RF-method model had superior prediction performance to logistic regression model (accuracy: 0.726 [95% confidence interval (CI): 0.725–0.730] vs. 0.689 [0.606–0.717], area under curve (AUC) of the receiver operating characteristic curve (ROC) 0.638 [0.576–0.658] vs. 0.565 [0.493–0.605], F1 score 0.841 [0.837–0.843] vs. 0.803 [0.732–0.828], AUC of the precision-recall curve 0.808 [0.763–0.829] vs. 0.754 [0.714–0.789]) with independent test datasets in patients with RA. However, machine learning and logistic regression exhibited similar prediction performance in AS patients. Furthermore, the patient self-reporting scales, which are patient global assessment of disease activity (PtGA) in RA and Bath Ankylosing Spondylitis Functional Index (BASFI) in AS, were revealed as the most important features in both diseases. Conclusions RF-method exhibited superior prediction performance for responses of bDMARDs to a conventional statistical method, i.e., logistic regression, in RA patients. In contrast, despite the comparable size of the dataset, machine learning did not outperform in AS patients. The most important features of both diseases, according to feature importance analysis were patient self-reporting scales.

Download Full-text

Prediction of all-cause mortality in coronary artery disease patients with atrial fibrillation based on machine learning models

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-02314-w ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Xinyun Liu ◽

Jicheng Jiang ◽

Lili Wei ◽

Wenlu Xing ◽

Hailong Shang ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Coronary Artery Disease ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Support Vector ◽

Vector Machines ◽

All Cause Mortality ◽

Artery Disease

Abstract Background Machine learning (ML) can include more diverse and more complex variables to construct models. This study aimed to develop models based on ML methods to predict the all-cause mortality in coronary artery disease (CAD) patients with atrial fibrillation (AF). Methods A total of 2037 CAD patients with AF were included in this study. Three ML methods were used, including the regularization logistic regression, random forest, and support vector machines. The fivefold cross-validation was used to evaluate model performance. The performance was quantified by calculating the area under the curve (AUC) with 95% confidence intervals (CI), sensitivity, specificity, and accuracy. Results After univariate analysis, 24 variables with statistical differences were included into the models. The AUC of regularization logistic regression model, random forest model, and support vector machines model was 0.732 (95% CI 0.649–0.816), 0.728 (95% CI 0.642–0.813), and 0.712 (95% CI 0.630–0.794), respectively. The regularization logistic regression model presented the highest AUC value (0.732 vs 0.728 vs 0.712), specificity (0.699 vs 0.663 vs 0.668), and accuracy (0.936 vs 0.935 vs 0.935) among the three models. However, no statistical differences were observed in the receiver operating characteristic (ROC) curve of the three models (all P > 0.05). Conclusion Combining the performance of all aspects of the models, the regularization logistic regression model was recommended to be used in clinical practice.

Download Full-text

Estimation of Prediction for Getting Heart Disease Using Logistic Regression Model of Machine Learning

2020 International Conference on Computer Communication and Informatics (ICCCI) ◽

10.1109/iccci48352.2020.9104210 ◽

2020 ◽

Author(s):

Montu Saw ◽

Tarun Saxena ◽

Sanjana Kaithwas ◽

Rahul Yadav ◽

Nidhi Lal

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Heart Disease ◽

Regression Model ◽

Logistic Regression Model

Download Full-text

Predicting the risk of inpatient hypoglycemia with machine learning using electronic health records

10.2337/figshare.12091953.v1 ◽

2020 ◽

Author(s):

Yue Ruan ◽

Alexis Bellot ◽

Zuzana Moysova ◽

Garry D. Tan ◽

Alistair Lumb ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Operating Characteristics ◽

Oxygen Saturation Level ◽

Electronic Health ◽

Clinically Significant

Objective We analyzed data from inpatients with diabetes admitted to a large university hospital to predict the risk of hypoglycemia through the use of machine learning algorithms. Research Design and Methods Four years of data was extracted from a hospital electronic health record system. This included laboratory and point-of-care blood glucose (BG) values to identify biochemical and clinically significant hypoglycaemic episodes (BG < 3.9 and < 2.9mmol/L respectively). We used patient demographics, administered medications, vital signs, laboratory results and procedures performed during the hospital stays to inform the model. Two iterations of the dataset included the doses of insulin administered and the past history of inpatient hypoglycaemia. Eighteen different prediction models were compared using the area under curve of the receiver operating characteristics (AUC_ROC) through a ten-fold cross validation. Results We analyzed data obtained from 17,658 inpatients with diabetes who underwent 32,758 admissions between July 2014 and August 2018. The predictive factors from the logistic regression model included people undergoing procedures, weight, type of diabetes, oxygen saturation level, use of medications (insulin, sulfonylurea, metformin) and albumin levels. The machine learning model with the best performance was the XGBoost model (AUC_ROC 0.96. This outperformed the logistic regression model which had an AUC_ROC of 0.75 for the estimation of the risk of clinically significant hypoglycaemia. Conclusions Advanced machine learning models are superior to logistic regression models in predicting the risk of hypoglycemia in inpatients with diabetes. Trials of such models should be conducted in real time to evaluate their utility to reduce inpatient hypoglycaemia.

Download Full-text

Predicting the risk of inpatient hypoglycemia with machine learning using electronic health records

10.2337/figshare.12091953 ◽

2020 ◽

Author(s):

Yue Ruan ◽

Alexis Bellot ◽

Zuzana Moysova ◽

Garry D. Tan ◽

Alistair Lumb ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Operating Characteristics ◽

Oxygen Saturation Level ◽

Electronic Health ◽

Clinically Significant

Objective We analyzed data from inpatients with diabetes admitted to a large university hospital to predict the risk of hypoglycemia through the use of machine learning algorithms. Research Design and Methods Four years of data was extracted from a hospital electronic health record system. This included laboratory and point-of-care blood glucose (BG) values to identify biochemical and clinically significant hypoglycaemic episodes (BG < 3.9 and < 2.9mmol/L respectively). We used patient demographics, administered medications, vital signs, laboratory results and procedures performed during the hospital stays to inform the model. Two iterations of the dataset included the doses of insulin administered and the past history of inpatient hypoglycaemia. Eighteen different prediction models were compared using the area under curve of the receiver operating characteristics (AUC_ROC) through a ten-fold cross validation. Results We analyzed data obtained from 17,658 inpatients with diabetes who underwent 32,758 admissions between July 2014 and August 2018. The predictive factors from the logistic regression model included people undergoing procedures, weight, type of diabetes, oxygen saturation level, use of medications (insulin, sulfonylurea, metformin) and albumin levels. The machine learning model with the best performance was the XGBoost model (AUC_ROC 0.96. This outperformed the logistic regression model which had an AUC_ROC of 0.75 for the estimation of the risk of clinically significant hypoglycaemia. Conclusions Advanced machine learning models are superior to logistic regression models in predicting the risk of hypoglycemia in inpatients with diabetes. Trials of such models should be conducted in real time to evaluate their utility to reduce inpatient hypoglycaemia.

Download Full-text