Application of Machine Learning Techniques to Predict the Occurrence of Distraction-affected Crashes with Phone-Use Data

Distraction occurs when a driver’s attention is diverted from driving to a secondary task. The number of distraction-affected crashes has been increasing in recent years. Accurately predicting distraction-affected crashes is critical for roadway agencies seeking to reduce distracted driving behaviors and distraction-affected crashes. Emerging phone-use data and machine learning techniques are increasingly available to safety researchers and can potentially improve the prediction of distraction-affected crashes. This study therefore first examines whether phone-use events provide essential information for predicting distraction-affected crashes. The authors apply a machine learning technique (XGBoost) under two scenarios, with and without phone-use events, and compare its performance with two conventional statistical models: a logistic regression model and a mixed-effects logistic regression model. The comparison demonstrates the superiority of XGBoost over logistic regression on a high-dimensional, unbalanced dataset. Further, this study implements SHAP (SHapley Additive exPlanations) to interpret the results, analyze the importance of individual features related to distraction-affected crashes, and test its ability to improve prediction accuracy. The trained XGBoost model achieves a sensitivity of 91.59%, a specificity of 85.92%, and an accuracy of 88.72%. The XGBoost and SHAP results suggest that (1) phone-use information is an important factor associated with the occurrence of distraction-affected crashes; and (2) distraction-affected crashes are more likely to occur on roadway segments with higher exposure (i.e., length and traffic volume), uneven traffic flow conditions, or medium truck volume.
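
The sensitivity, specificity, and accuracy figures quoted in this abstract are all derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives correctly flagged
    specificity = tn / (tn + fp)  # share of actual negatives correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only (not the study's data).
sens, spec, acc = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

On an unbalanced dataset, accuracy alone can be misleading, which is presumably why sensitivity and specificity are reported alongside it.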

Factors That Can Trigger Depression

P2P E INOVAÇÃO ◽

10.21721/p2p.2021v7n2.p164-185 ◽

2021 ◽

Vol 7 (2) ◽

pp. 164-185

Author(s):

Haydée Maria Correia da Batista ◽

Andrea Borges Paim ◽

Brenda Santos Siqueira ◽

Nelson Francisco Favilla Ebecken ◽

Ana Claudia Dias

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Binary Logistic Regression ◽

National Health Survey ◽

Machine Learning Techniques ◽

Binary Logistic Regression Model ◽

Explanatory Variables ◽

Learning Techniques

According to data from the last National Health Survey (PNS), conducted in 2013 by the Brazilian Institute of Geography and Statistics (IBGE) in partnership with the Ministry of Health, 7.6% of people aged 18 and over had received a diagnosis of depression. Based on this survey, the purpose of this study was to identify factors that may be relevant to a possible diagnosis of depression, using machine learning techniques. The binary logistic regression model was chosen as the machine learning technique, with forward and backward methods for selecting variables plus a model built by the researcher, generating seven different models. Model performance was evaluated by comparing metrics such as the Cox-Snell R² and Nagelkerke R², which gave remarkably close results. Based on these models, 37 explanatory variables were selected and applied to a new logistic regression model. The results showed that some variables significantly increased the chance of a positive diagnosis of depression, while others were indicative of a reduced chance of this diagnosis.
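
The Cox-Snell and Nagelkerke statistics mentioned above are pseudo-R² measures computed from the log-likelihoods of a null (intercept-only) model and the fitted model. A minimal sketch of the standard formulas follows; the function name and the input values are hypothetical and unrelated to the survey data.

```python
import math


def pseudo_r2(ll_null, ll_model, n):
    """Cox-Snell and Nagelkerke pseudo-R2 from null and fitted log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Cox-Snell cannot reach 1; Nagelkerke rescales it by its upper bound.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke


# Hypothetical log-likelihoods and sample size, for illustration only.
cs, nk = pseudo_r2(ll_null=-270.0, ll_model=-230.0, n=1000)
print(f"Cox-Snell={cs:.4f} Nagelkerke={nk:.4f}")
```

Because Nagelkerke is a rescaling of Cox-Snell, the two statistics rank candidate models fitted to the same data in the same order.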

Logistic Regression Model for Loan Prediction: A Machine Learning Approach

10.1109/eti4.051663.2021.9619201 ◽

2021 ◽

Author(s):

Richa Manglani ◽

Anuja Bokhare

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Learning Approach ◽

Machine Learning Approach

Malicious URL Detection using Logistic Regression

10.36227/techrxiv.14790381 ◽

2021 ◽

Author(s):

Rohit Rayala ◽

Sashank Pasumarthi ◽

Rohith Kuppa ◽

S R KARTHIK

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Machine Learning Techniques ◽

Learning Techniques

This paper presents a model built to detect malicious URLs using machine learning techniques.

Machine-Learning vs. Expert-Opinion Driven Logistic Regression Modelling for Predicting 30-Day Unplanned Rehospitalisation in Preterm Babies: A Prospective, Population-Based Study (EPIPAGE 2)

Frontiers in Pediatrics ◽

10.3389/fped.2020.585868 ◽

2021 ◽

Vol 8 ◽

Author(s):

Robert A. Reed ◽

Andrei S. Morgan ◽

Jennifer Zeitlin ◽

Pierre-Henri Jarreau ◽

Héloïse Torchin ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Regression Model ◽

Expert Opinion ◽

Logistic Regression Model ◽

Population Based ◽

Regression Modelling ◽

Preterm Babies ◽

Logistic Regression Modelling

Introduction: Preterm babies are a vulnerable population that experience significant short and long-term morbidity. Rehospitalisations constitute an important, potentially modifiable adverse event in this population. Improving the ability of clinicians to identify those patients at the greatest risk of rehospitalisation has the potential to improve outcomes and reduce costs. Machine-learning algorithms can provide potentially advantageous methods of prediction compared to conventional approaches like logistic regression. Objective: To compare two machine-learning methods (least absolute shrinkage and selection operator (LASSO) and random forest) to expert-opinion driven logistic regression modelling for predicting unplanned rehospitalisation within 30 days in a large French cohort of preterm babies. Design, Setting and Participants: This study used data derived exclusively from the population-based prospective cohort study of French preterm babies, EPIPAGE 2. Only those babies discharged home alive and whose parents completed the 1-year survey were eligible for inclusion in our study. All predictive models used a binary outcome, denoting a baby's status for an unplanned rehospitalisation within 30 days of discharge. Predictors included those quantifying clinical, treatment, maternal and socio-demographic factors. The predictive abilities of models constructed using LASSO and random forest algorithms were compared with a traditional logistic regression model. The logistic regression model comprised 10 predictors, selected by expert clinicians, while the LASSO and random forest included 75 predictors. Performance measures were derived using 10-fold cross-validation. Performance was quantified using area under the receiver operator characteristic curve, sensitivity, specificity, Tjur's coefficient of determination and calibration measures. Results: The rate of 30-day unplanned rehospitalisation in the eligible population used to construct the models was 9.1% (95% CI 8.2–10.1) (350/3,841). The random forest model demonstrated both an improved AUROC (0.65; 95% CI 0.59–0.7; p = 0.03) and specificity vs. logistic regression (AUROC 0.57; 95% CI 0.51–0.62, p = 0.04). The LASSO performed similarly (AUROC 0.59; 95% CI 0.53–0.65; p = 0.68) to logistic regression. Conclusions: Compared to an expert-specified logistic regression model, random forest offered improved prediction of 30-day unplanned rehospitalisation in preterm babies. However, all models offered relatively low levels of predictive ability, regardless of modelling method.
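
The evaluation described in this abstract combines 10-fold cross-validation with AUROC. A minimal, dependency-free sketch of both pieces is given below; the function names, the seed, and the toy labels are assumptions for illustration, not the study's code or data.

```python
import random


def auroc(labels, scores):
    """AUROC as the chance a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal held-out folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy example: perfectly separated scores and a 10-fold split of 25 records.
print(auroc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
print([len(fold) for fold in k_fold_indices(25, k=10)])
```

In a typical setup, a model is fitted on nine folds, scored on the held-out fold, and the per-fold AUROC values are then summarised; whether the study pooled or averaged its folds is not stated in the abstract.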

Roundness prediction in centreless grinding using physics-enhanced machine learning techniques

The International Journal of Advanced Manufacturing Technology ◽

10.1007/s00170-020-06407-2 ◽

2020 ◽

Author(s):

Hossein Safarzadeh ◽

Marco Leonesio ◽

Giacomo Bianchi ◽

Michele Monno

Keyword(s):

Neural Network ◽

Machine Learning ◽

Regression Model ◽

Hybrid Models ◽

Physical Models ◽

Machine Learning Techniques ◽

Support Vector ◽

Grinding Wheel ◽

Wheel Wear ◽

Learning Techniques

This work proposes a model for suggesting optimal process configuration in plunge centreless grinding operations. Seven different approaches were implemented and compared: a first-principles model, a neural network model with one hidden layer, a support vector regression (SVR) model with a polynomial kernel function, a Gaussian process regression (GPR) model, and hybrid versions of those three models. The first approach is based on an enhancement of the well-known numerical process simulation of geometrical instability. The model takes into account the raw workpiece profile and possible wheel-workpiece loss of contact, which introduces an inherent limitation on the resulting profile waviness. Physical models, because of epistemic errors due to neglected or oversimplified functional relationships, can be too approximate to be considered for industrial applications. Moreover, in deterministic models, uncertainties affecting the various parameters are not explicitly considered. Complexity in centreless grinding models arises from phenomena such as the dependency of contact length on local compliance, contact force and grinding wheel roughness, unpredicted material properties of the grinding wheel and workpiece, the precision of the manual setup done by the operator, and the extent and nature of wheel wear. To improve the overall model prediction accuracy and allow automated continuous learning, several machine learning techniques have been investigated: a Bayesian regularized neural network, an SVR model and a GPR model. To exploit the a priori knowledge embedded in physical models, hybrid models are proposed, in which the neural network, SVR and GPR models are fed the nominal process parameters enriched with the roundness predicted by the first-principles model. These hybrid models show improved prediction capability.
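
The hybrid idea in this abstract, feeding a learner the nominal process parameters plus the physics model's own prediction, amounts to feature augmentation. The sketch below illustrates only that step; `hybrid_features` and `toy_physics_model` are hypothetical names, and the toy formula is a stand-in, not the paper's first-principles simulation.

```python
def hybrid_features(process_params, physics_model):
    """Append the physics-model prediction to the nominal process parameters."""
    return list(process_params) + [physics_model(process_params)]


def toy_physics_model(params):
    """Hypothetical stand-in for a first-principles roundness prediction."""
    return 0.1 * sum(params)


# The augmented vector is what a neural network, SVR or GPR model would be trained on.
x = hybrid_features([1.0, 2.0, 3.0], toy_physics_model)
print(x)
```

The learner can then concentrate on correcting the residual error of the physics prediction rather than learning the whole process from scratch.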

Machine Learning Techniques Applied to Profile Mobile Banking Users in India

International Journal of Information Systems in the Service Sector ◽

10.4018/jisss.2013010105 ◽

2013 ◽

Vol 5 (1) ◽

pp. 82-92 ◽

Cited By ~ 8

Author(s):

M. Carr ◽

V. Ravi ◽

G. Sridharan Reddy ◽

D. Veranna

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Tree ◽

Decision Trees ◽

Multilayer Perceptron ◽

Machine Learning Techniques ◽

Mobile Banking ◽

Classification Rules ◽

Learning Techniques ◽

Potential Customers

This paper profiles mobile banking users using machine learning techniques, namely Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM, to test a research model with fourteen independent variables and one dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Using Decision Trees, the profile of the mobile banking adopter was identified. A comparison of the techniques found that Decision Trees outperformed Logistic Regression, Multilayer Perceptron, and SVM. Of all the techniques, Decision Tree is recommended for profiling studies because, apart from producing highly accurate results, it also yields ‘if–then’ classification rules. The classification rules provided here can be used to target potential customers and encourage them to adopt mobile banking by offering appropriate incentives.

Prediction of perinatal death using machine learning models: a birth registry-based cohort study in northern Tanzania

BMJ Open ◽

10.1136/bmjopen-2020-040132 ◽

2020 ◽

Vol 10 (10) ◽

pp. e040132

Author(s):

Innocent B Mboya ◽

Michael J Mahande ◽

Mohanad Mohammed ◽

Joseph Obure ◽

Henry G Mwambi

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Perinatal Death ◽

Learning Models ◽

Net Benefit ◽

Birth Registry ◽

Perinatal Deaths ◽

Machine Learning Models

Objective: We aimed to determine the key predictors of perinatal deaths using machine learning models compared with the logistic regression model. Design: A secondary data analysis using the Kilimanjaro Christian Medical Centre (KCMC) Medical Birth Registry cohort from 2000 to 2015. We assessed the discriminative ability of models using the area under the receiver operating characteristics curve (AUC) and the net benefit using decision curve analysis. Setting: The KCMC is a zonal referral hospital located in Moshi Municipality, Kilimanjaro region, Northern Tanzania. The Medical Birth Registry is within the hospital grounds at the Reproductive and Child Health Centre. Participants: Singleton deliveries (n=42 319) with complete records from 2000 to 2015. Primary outcome measures: Perinatal death (composite of stillbirths and early neonatal deaths). These outcomes were only captured before mothers were discharged from the hospital. Results: The proportion of perinatal deaths was 3.7%. There were no statistically significant differences in the predictive performance of four machine learning models except for bagging, which had a significantly lower performance (AUC 0.76, 95% CI 0.74 to 0.79, p=0.006) compared with the logistic regression model (AUC 0.78, 95% CI 0.76 to 0.81). However, in the decision curve analysis, the machine learning models had a higher net benefit (ie, the correct classification of perinatal deaths considering a trade-off between false-negatives and false-positives) than the logistic regression model across a range of threshold probability values. Conclusions: In this cohort, there was no significant difference in the prediction of perinatal deaths between machine learning and logistic regression models, except for bagging. The machine learning models had a higher net benefit, as their ability to predict perinatal death was considerably superior to that of the logistic regression model. The machine learning models, as demonstrated by our study, can be used to improve the prediction of perinatal deaths and triage for women at risk.
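
Net benefit in decision curve analysis weighs true positives against false positives at a chosen threshold probability. A minimal sketch of the standard formula follows; the function name and the counts are hypothetical and not drawn from the registry data.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit at threshold probability pt: TP/n - (FP/n) * pt / (1 - pt)."""
    # Odds of the threshold: the harm of one false positive relative to one true positive.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


# Hypothetical counts at a 10% threshold probability, for illustration only.
print(round(net_benefit(tp=30, fp=100, n=1000, threshold=0.10), 4))
```

A decision curve plots this quantity over a range of thresholds, which is how two models with similar AUC can still differ in net benefit.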

P5710 Clinical applications of machine learning for prediction of incident atrial fibrillation from the general population: a nationwide cohort study

European Heart Journal ◽

10.1093/eurheartj/ehz746.0651 ◽

2019 ◽

Vol 40 (Supplement_1) ◽

Author(s):

I.-S Kim ◽

P S Yang ◽

H T Yu ◽

T H Kim ◽

J S Uhm ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Logistic Regression ◽

General Population ◽

Regression Model ◽

National Health ◽

Logistic Regression Model ◽

Learning System ◽

Health Examination ◽

Clinical Variables

Background: To evaluate the ability of machine learning algorithms to predict incident atrial fibrillation (AF) from the general population using health examination items. Methods: We included 483,343 subjects who received national health examinations from the Korean National Health Insurance Service-based National Sample Cohort (NHIS-NSC). We trained a deep neural network (DNN) model of a deep learning system and a decision tree (DT) model of a machine learning system using clinical variables and health examination items (including age, sex, body mass index, history of heart failure, hypertension or diabetes, baseline creatinine, and smoking and alcohol intake habits) to predict incident AF, using a training dataset of 341,771 subjects constructed from the NHIS-NSC database. The DNN and DT were validated using an independent test dataset of the 141,572 remaining subjects. The c-indices of the DNN and DT for prediction of incident AF were compared with that of a conventional logistic regression model. Results: During 1,874,789 person-years (mean ± standard deviation age 47.7 ± 14.4 years, 49.6% male), 3,282 subjects with incident AF were observed. In the validation dataset, 1,139 subjects with incident AF were observed. The c-indices of the DNN and DT for incident AF prediction were 0.828 [0.819–0.836] and 0.835 [0.825–0.844], and were significantly higher (p<0.01) than that of the conventional logistic regression model (c-index=0.789 [0.784–0.794]). Conclusions: Application of machine learning using simple clinical variables and health examination items was helpful in predicting incident AF in the general population. A prospective study is warranted to develop individualized precision medicine.

Machine learning versus logistic regression methods for 2-year mortality prognostication in a small, heterogeneous glioma database

10.1101/472555 ◽

2018 ◽

Cited By ~ 2

Author(s):

Sandip S Panesar ◽

Rhett N D’Souza ◽

Fang-Cheng Yeh ◽

Juan C Fernandez-Miranda

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Machine Learning Techniques ◽

World Health ◽

Support Vector ◽

Molecular Characteristics ◽

Regression Methods ◽

Learning Techniques ◽

The World ◽

Health Organization

Background: Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization or prediction. ML techniques have traditionally been applied to large, high-dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditionally graded using histopathological features. Recently the World Health Organization proposed a novel grading system for gliomas incorporating molecular characteristics. We aimed to study whether ML could achieve accurate prognostication of 2-year mortality in a small, high-dimensional database of glioma patients. Methods: We applied three machine learning techniques, artificial neural networks (ANN), decision trees (DT) and support vector machine (SVM), together with classical logistic regression (LR), to a dataset consisting of 76 glioma patients of all grades. We compared the effect of applying the algorithms to the raw database versus a database in which only statistically significant features were included in the algorithmic inputs (feature selection). Results: Raw input consisted of 21 variables and achieved performance (accuracy/AUC) of 70.7%/0.70 for ANN, 68%/0.72 for SVM, 66.7%/0.64 for LR and 65%/0.70 for DT. Feature-selected input consisted of 14 variables and achieved performance of 73.4%/0.75 for ANN, 73.3%/0.74 for SVM, 69.3%/0.73 for LR and 65.2%/0.63 for DT. Conclusions: We demonstrate that these techniques can also be applied to small, yet high-dimensional datasets. Our ML techniques achieved reasonable performance compared to similar studies in the literature. Though local databases may be small compared with larger cancer repositories, we demonstrate that ML techniques can still be applied to their analysis, though traditional statistical methods are of similar benefit.

Early Prediction of the Carbapenem Resistance Gram-negative Bacteria Carriage in Intensive Care Unit using Machine-Learning

10.21203/rs.3.rs-60222/v1 ◽

2020 ◽

Author(s):

Qiqiang Liang ◽

Qinyu Zhao ◽

Xin Xu ◽

Yu Zhou ◽

Man Huang

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Carbapenem Resistance ◽

Multivariate Logistic Regression Model ◽

Gram Negative Bacteria ◽

Multivariate Logistic Regression ◽

Gram Negative ◽

Better Than

Background: The prevention and control of carbapenem-resistant gram-negative bacteria (CR-GNB) is a difficult and central concern for clinicians in the intensive care unit (ICU). This study constructs a CR-GNB carriage prediction model to predict CR-GNB incidence within one week. Methods: The database comprises nearly 10,000 patients. The model is constructed using a multivariate logistic regression model and three machine learning algorithms. We then choose the optimal model and verify its accuracy by predicting and recording, day by day, the occurrence of CR-GNB in all patients admitted over 4 months. Results: There are 1385 patients with positive CR-GNB cultures and 1535 negative patients in this study. Forty-five variables show statistically significant differences. We include 17 variables in the multivariate logistic regression model and build three machine learning models on all variables. In terms of accuracy and area under the receiver operating characteristic (AUROC) curve, the random forest is better than XGBoost and the multivariate logistic regression model, which are in turn better than the decision tree model (accuracy: 84% > 82% > 81% > 72%; AUROC: 0.9089 > 0.8947 ≈ 0.8987 > 0.7845). In the 4-month prospective study, 81 cases were predicted to be positive in CR-GNB culture within 7 days, 146 cases were predicted to be negative, 86 cases were positive, and 120 cases were negative, with an overall accuracy of 84% and an AUROC of 91.98%. Conclusions: Machine learning prediction models can predict the occurrence of CR-GNB colonization or infection within a one-week period, and can provide real-time predictions that guide medical staff in identifying high-risk groups more accurately.
