Machine Learning Outperforms Logistic Regression Analysis to Predict Next-Season NHL Player Injury: An Analysis of 2322 Players From 2007 to 2017

2020 · Vol 8 (9) · pp. 232596712095340
Author(s): Bryan C. Luu, Audrey L. Wright, Heather S. Haeberle, Jaret M. Karnuta, Mark S. Schickendantz, ...

Background: The opportunity to quantitatively predict next-season injury risk in the National Hockey League (NHL) has become a reality with the advent of advanced computational processors and machine learning (ML) architecture. Unlike static regression analyses that provide a momentary prediction, ML algorithms are dynamic in that they are readily capable of imbibing historical data to build a framework that improves with additive data. Purpose: To (1) characterize the epidemiology of publicly reported NHL injuries from 2007 to 2017, (2) determine the validity of a machine learning model in predicting next-season injury risk for both goalies and position players, and (3) compare the performance of modern ML algorithms versus logistic regression (LR) analyses. Study Design: Descriptive epidemiology study. Methods: Professional NHL player data were compiled for the years 2007 to 2017 from 2 publicly reported databases in the absence of an official NHL-approved database. Attributes acquired from each NHL player from each professional year included age, 85 performance metrics, and injury history. A total of 5 ML algorithms were created for both position player and goalie data: random forest, K-Nearest Neighbors, Naïve Bayes, XGBoost, and Top 3 Ensemble. LR was also performed for both position player and goalie data. Area under the receiver operating characteristic curve (AUC) primarily determined validation. Results: Player data were generated from 2109 position players and 213 goalies. For models predicting next-season injury risk for position players, XGBoost performed the best with an AUC of 0.948, compared with an AUC of 0.937 for LR (P < .0001). For models predicting next-season injury risk for goalies, XGBoost had the highest AUC with 0.956, compared with an AUC of 0.947 for LR (P < .0001). Conclusion: Advanced ML models such as XGBoost outperformed LR and demonstrated good to excellent capability of predicting whether a publicly reportable injury is likely to occur the next season.
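To make the comparison concrete, here is a minimal sketch of the XGBoost-versus-LR evaluation the abstract describes, using scikit-learn and xgboost on synthetic stand-in data (the 86 features approximate age plus the 85 performance metrics; the actual NHL dataset and model settings are not public in this listing, so every specific below is an assumption):

    # Synthetic stand-in: one row per player-season, labeled 1 if a publicly
    # reportable injury occurred the following season.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2109, n_features=86, n_informative=20,
                               weights=[0.6], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "XGBoost": XGBClassifier(n_estimators=300, max_depth=4,
                                 learning_rate=0.05, eval_metric="logloss"),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        probs = model.predict_proba(X_test)[:, 1]
        print(f"{name}: AUC = {roc_auc_score(y_test, probs):.3f}")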

2020 · Vol 8 (7_suppl6) · pp. 2325967120S0036
Author(s): Audrey Wright, Jaret Karnuta, Bryan Luu, Heather Haeberle, Eric Makhni, ...

Objectives: With the accumulation of big data surrounding the National Hockey League (NHL) and the advent of advanced computational processors, machine learning (ML) is ideally suited to develop a predictive algorithm capable of imbibing historical data to accurately project a future player’s availability to play based on prior injury and performance. To the end of leveraging available analytics to permit data-driven injury prevention strategies and informed decisions for NHL franchises beyond static logistic regression (LR) analysis, the objective of this study of NHL players was to (1) characterize the epidemiology of publicly reported NHL injuries from 2007 to 2017, (2) determine the validity of a machine learning model in predicting next-season injury risk for both goalies and non-goalies, and (3) compare the performance of modern ML algorithms versus LR analyses. Methods: Hockey player data were compiled for the years 2007 to 2017 from two publicly reported databases in the absence of an official NHL-approved database. Attributes acquired from each NHL player from each professional year included age, 85 player metrics, and injury history. A total of 5 ML algorithms were created for both non-goalie and goalie data: Random Forest, K-Nearest Neighbors, Naive Bayes, XGBoost, and Top 3 Ensemble. Logistic regression was also performed for both non-goalie and goalie data. Area under the receiver operating characteristic curve (AUC) primarily determined validation. Results: Player data were generated from 2,109 non-goalies and 213 goalies with an average follow-up of 4.5 years. The results are shown in Table 1. For models predicting following-season injury risk for non-goalies, XGBoost performed the best with an AUC of 0.948, compared with an AUC of 0.937 for logistic regression. For models predicting following-season injury risk for goalies, XGBoost had the highest AUC with 0.956, compared with an AUC of 0.947 for LR. Conclusion: Advanced ML models such as XGBoost outperformed LR and demonstrated good to excellent capability of predicting whether a publicly reportable injury is likely to occur the next season. As more player-specific data become available, algorithm refinement may be possible to strengthen predictive insights and allow ML to offer quantitative risk management for franchises, present opportunities for targeted preventive intervention by medical personnel, and replace regression analysis as the new gold standard for predictive modeling.
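The "Top 3 Ensemble" is not specified further in the abstract; one plausible reading is a soft-voting combination of the three best base classifiers. The sketch below illustrates that construction with scikit-learn's VotingClassifier on synthetic data; the choice of members and settings is an assumption, not the authors' published recipe:

    # Hypothetical "Top 3 Ensemble": soft voting averages the predicted
    # probabilities of three base classifiers.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2109, n_features=86, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    ensemble = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200, random_state=1)),
            ("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
            ("nb", GaussianNB()),
        ],
        voting="soft",  # average predicted probabilities, then threshold
    )
    ensemble.fit(X_train, y_train)
    auc = roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1])
    print(f"ensemble AUC = {auc:.3f}")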


2020 · Vol 8 (11) · pp. 232596712096304
Author(s): Jaret M. Karnuta, Bryan C. Luu, Heather S. Haeberle, Paul M. Saluan, Salvatore J. Frangiamore, ...

Background: Machine learning (ML) allows for the development of a predictive algorithm capable of imbibing historical data on a Major League Baseball (MLB) player to accurately project the player's future availability. Purpose: To determine the validity of an ML model in predicting the next-season injury risk and anatomic injury location for both position players and pitchers in the MLB. Study Design: Descriptive epidemiology study. Methods: Using 4 online baseball databases, we compiled MLB player data, including age, performance metrics, and injury history. A total of 84 ML algorithms were developed. The output of each algorithm reported whether the player would sustain an injury the following season as well as the injury’s anatomic site. The area under the receiver operating characteristic curve (AUC) primarily determined validation. Results: Player data were generated from 1931 position players and 1245 pitchers, with a mean follow-up of 4.40 years (13,982 player-years) between the years of 2000 and 2017. Injured players spent a total of 108,656 days on the disabled list, with a mean of 34.21 total days per player. The mean AUC for predicting next-season injuries was 0.76 among position players and 0.65 among pitchers using the top 3 ensemble classification. Back injuries had the highest AUC among both position players and pitchers, at 0.73. Advanced ML models outperformed logistic regression in 13 of 14 cases. Conclusion: Advanced ML models generally outperformed logistic regression and demonstrated fair capability in predicting publicly reportable next-season injuries, including the anatomic region for position players, although not for pitchers.
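The per-site outputs suggest a bank of binary classifiers, one per anatomic region. The following sketch shows that structure with synthetic data; the site list, feature count, and gradient boosting choice are illustrative assumptions (with random labels, the printed AUCs hover around 0.5):

    # One independent binary classifier per anatomic site, each scored by AUC.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_players, n_features = 1931, 40               # stand-in dimensions
    X = rng.normal(size=(n_players, n_features))
    sites = ["back", "shoulder", "elbow", "knee"]  # illustrative site list
    # Stand-in labels: injury at this site during the following season?
    labels = {site: (rng.random(n_players) < 0.1).astype(int) for site in sites}

    for site in sites:
        X_tr, X_te, y_tr, y_te = train_test_split(X, labels[site], random_state=0)
        clf = GradientBoostingClassifier().fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        print(f"{site}: AUC = {auc:.3f}")  # ~0.5 here because labels are random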


2020
Author(s): Jun Ke, Yiwei Chen, Xiaoping Wang, Zhiyong Wu, Qiongyao Zhang, ...

Abstract: Background: The purpose of this study was to identify the risk factors for in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods: The data of ACS patients who presented to the emergency department of Fujian Provincial Hospital with chest pain from January 1, 2017 to March 31, 2020 were retrospectively collected. Univariate and multivariate logistic regression analyses were used to identify risk factors for in-hospital mortality of ACS patients. Traditional regression and machine learning algorithms were used to develop predictive models, and sensitivity, specificity, and the receiver operating characteristic curve were used to evaluate the performance of each model. Results: A total of 7810 ACS patients were included in the study, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age; levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), and high-density lipoprotein (HDL) cholesterol; and calcium channel blocker use were independent predictors of in-hospital mortality. The areas under the receiver operating characteristic curve of the models developed by logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the three variables that contributed the most to the prediction performance of the GBDT and random forest models. Conclusions: The predictive models developed using the logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death of ACS patients. Based on our findings, we recommend that clinicians focus on monitoring changes in NT-proBNP, LDH, and HDL cholesterol, as this may improve the clinical outcomes of ACS patients.
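As an illustration of the feature-importance step, the sketch below fits a gradient boosting model on synthetic stand-ins for the reported predictors and ranks them; the data generation ties the outcome loosely to NT-proBNP, LDH, and HDL cholesterol so the ranking is non-trivial, which is an assumption for demonstration only:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(42)
    features = ["age", "d_dimer", "troponin_i", "nt_probnp", "ldh",
                "hdl_cholesterol", "ccb_use"]
    X = pd.DataFrame(rng.normal(size=(7810, len(features))), columns=features)
    # Synthetic outcome loosely driven by NT-proBNP, LDH, and HDL cholesterol;
    # the -4 offset keeps mortality near the reported 1.75%.
    logit = 0.8 * X["nt_probnp"] + 0.5 * X["ldh"] - 0.4 * X["hdl_cholesterol"]
    y = (rng.random(len(X)) < 1 / (1 + np.exp(-(logit - 4)))).astype(int)

    gbdt = GradientBoostingClassifier().fit(X, y)
    for name, importance in sorted(zip(features, gbdt.feature_importances_),
                                   key=lambda kv: kv[1], reverse=True):
        print(f"{name:16s} {importance:.3f}")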


2018 · Vol 26 (1) · pp. 34-44
Author(s): Muhammad Faisal, Andy Scally, Robin Howes, Kevin Beatson, Donald Richardson, ...

We compare the performance of logistic regression with several alternative machine learning methods for estimating the risk of death for patients following an emergency admission to hospital, based on the patients' first blood test results and physiological measurements, using an external validation approach. We trained and tested each model using data from one hospital (n = 24,696) and compared the performance of these models in data from another hospital (n = 13,477). We used two performance measures: the calibration slope and the area under the receiver operating characteristic curve. The logistic model performed reasonably well compared with the other machine learning methods, with a calibration slope of 0.90 and an area under the receiver operating characteristic curve of 0.847. Given the complexity of choosing tuning parameters for these methods, the performance of logistic regression with transformations for in-hospital mortality prediction was competitive with the best-performing alternative machine learning methods, with no evidence of overfitting.
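A minimal sketch of the two validation measures, assuming the standard construction of the calibration slope (the coefficient from refitting a logistic model on the logit of the predicted risks in the external data; the paper's exact recipe may differ):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    # One synthetic population split into two "hospitals" of the reported sizes.
    X, y = make_classification(n_samples=38173, n_features=20, random_state=0)
    X_dev, y_dev = X[:24696], y[:24696]   # development hospital
    X_ext, y_ext = X[24696:], y[24696:]   # external validation hospital

    model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
    p_ext = model.predict_proba(X_ext)[:, 1].clip(1e-6, 1 - 1e-6)
    logit_p = np.log(p_ext / (1 - p_ext))

    # Calibration slope: 1.0 is ideal; < 1 means predicted risks are too extreme.
    slope = LogisticRegression(max_iter=1000).fit(
        logit_p.reshape(-1, 1), y_ext).coef_[0][0]
    print(f"AUC = {roc_auc_score(y_ext, p_ext):.3f}, calibration slope = {slope:.2f}")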


2019 · Vol 8 (10) · pp. 462
Author(s): Sadra Karimzadeh, Masashi Matsuoka, Jianming Kuang, Linlin Ge

Small earthquakes following a large event in the same area are typically aftershocks, which are usually less destructive than mainshocks. These aftershocks are considered mainshocks if they are larger than the previous mainshock. In this study, records of aftershocks (M > 2.5) of the Kermanshah Earthquake (M 7.3) in Iran were collected from the first second following the event to the end of September 2018. Different machine learning (ML) algorithms, including naive Bayes, k-nearest neighbors, a support vector machine, and random forests, were used in conjunction with the slip distribution, the Coulomb stress change on the source fault (deduced from synthetic aperture radar imagery), and the orientations of neighboring active faults to predict the aftershock patterns. Seventy percent of the aftershocks were used for training based on a binary (“yes” or “no”) logic to predict the locations of all aftershocks. While untested on independent datasets, receiver operating characteristic results on the same dataset indicate that the ML methods outperform routine Coulomb maps for the spatial prediction of aftershock patterns, especially when details of neighboring active faults are available. Logistic regression results, however, do not differ significantly from those of the ML methods, as the hidden information is likely discovered just as well by logistic regression analysis.
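A hedged sketch of the binary grid-cell framing with a 70/30 split, using synthetic stand-ins for the physics-derived covariates (slip, Coulomb stress change, distance to the nearest active fault); the feature set and random forest settings are assumptions:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(7)
    n_cells = 5000                                 # spatial grid cells
    coulomb = rng.normal(size=n_cells)             # Coulomb stress change
    slip = rng.normal(size=n_cells)                # slip distribution value
    fault_dist = rng.exponential(size=n_cells)     # distance to nearest fault
    X = np.column_stack([coulomb, slip, fault_dist])
    # Synthetic labels: aftershocks more likely where stress rose near faults.
    p = 1 / (1 + np.exp(-(1.5 * coulomb - 1.0 * fault_dist)))
    y = (rng.random(n_cells) < p).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=7)
    rf = RandomForestClassifier(n_estimators=300, random_state=7).fit(X_tr, y_tr)
    print(f"AUC = {roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]):.3f}")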


2019
Author(s): Yao Wang, Kangjun Zhu, Ya Li, Liding Zhao, Qingbo Lv, ...

Abstract: Background: A risk prediction model for cardiovascular conditions based on routine information has not been established. Machine learning (ML) models offer the opportunity to build a promising and accurate prediction system for the presence and severity of coronary artery disease (CAD). Methods: To compare the validity of ML models against the Framingham Risk Score (FRS), a total of 2608 inpatients (1669 men, 939 women; mean age 63.16 ± 10.72 years) at our hospital from January 2015 to July 2017 were extracted from the electronic medical system with 29 attributes. Four different ML algorithms (logistic regression (LR), random forest (RF), k-nearest neighbors (KNN), and artificial neural networks (ANN)) were applied to build models, based on eight core risk factors and on all factors, respectively. The area under the receiver operating characteristic curve (AUC) was used to compare the predictive power of the different models. Results: According to the AUC, all of the ML algorithms had better predictive validity than the FRS for the presence of CAD; specifically, FRS
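A sketch of the "eight core risk factors versus all attributes" comparison on synthetic data; which columns count as core, and the cross-validated AUC protocol, are assumptions for illustration:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2608, n_features=29, n_informative=12,
                               random_state=3)
    feature_sets = {"core (8)": X[:, :8], "all (29)": X}  # assumed core columns

    for set_name, features in feature_sets.items():
        for model_name, model in [("LR", LogisticRegression(max_iter=1000)),
                                  ("RF", RandomForestClassifier(n_estimators=200))]:
            auc = cross_val_score(model, features, y, cv=5,
                                  scoring="roc_auc").mean()
            print(f"{model_name} on {set_name}: mean AUC = {auc:.3f}")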


2020 · Vol 10 (15) · pp. 5261
Author(s): Emmanuel Vallance, Nicolas Sutton-Charani, Abdelhak Imoussaten, Jacky Montmain, Stéphane Perrey

The large number of features recorded by GPS and inertial sensors (external load) and by well-being questionnaires (internal load) can be used together in a multi-dimensional, non-linear, machine-learning-based model for better prediction of non-contact injuries. In this study we put forward the main hypothesis that such models can better inform about injury risk by considering the evolution of both internal and external loads over two horizons (one week and one month). Predictive models were trained with GPS and subjective questionnaire data and injury data from 40 elite male soccer players over one season. The classification algorithms that performed best on external- and internal-load features were compared using standard performance metrics such as accuracy, precision, recall, and the area under the receiver operating characteristic curve. In particular, tree-based algorithms, which are non-linear models with an important interpretability aspect, were privileged, as they can help explain the impact of internal- and external-load features on injury risk. For 1-week injury prediction, models using internal-load features were more accurate than those using external-load features, while for 1-month injury prediction, the best classifier performance was reached by combining internal- and external-load features.
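To show why shallow trees support this kind of interpretation, the sketch below fits one on synthetic internal- and external-load features and prints its rules; the feature names are illustrative assumptions:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(11)
    features = ["total_distance_wk", "high_speed_running_wk", "accelerations_wk",
                "sleep_quality", "muscle_soreness", "perceived_fatigue"]
    X = rng.normal(size=(400, len(features)))   # one row per player-week
    y = (rng.random(400) < 0.15).astype(int)    # non-contact injury in horizon?

    tree = DecisionTreeClassifier(max_depth=3, random_state=11).fit(X, y)
    # export_text prints the fitted rules, so staff can see which load
    # features drive the predicted risk.
    print(export_text(tree, feature_names=features))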


Author(s): Dhilsath Fathima.M, S. Justin Samuel, R. Hari Haran

Aim: To develop an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed model is intended to support medical professionals in predicting MI proficiently. Methods: The proposed model uses mean imputation to fill missing values in the dataset and then applies principal component analysis (PCA) to extract the most informative features and enhance the performance of the classifiers. After PCA, the reduced features are split 70/30 into training and test sets; the training set is used to fit four widely used classifiers (support vector machine, k-nearest neighbors, logistic regression, and decision tree), and the test set is used to evaluate the models with a confusion matrix, accuracy, precision, sensitivity, F1-score, and the AUC-ROC curve. Results: Evaluating the classifiers with these performance measures, we observed that logistic regression provided higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performed well as a feature extraction method for enhancing the proposed model. From these analyses, we conclude that logistic regression had better mean accuracy and accuracy standard deviation than the other three algorithms. The AUC-ROC curves of the proposed classifiers (Figures 4 and 5) show that logistic regression achieves a good AUC-ROC score, around 70%, compared with the k-NN and decision tree algorithms. Conclusion: From the result analysis, we infer that the proposed machine learning model can act as a decision-making system to predict acute MI at an earlier stage than existing machine-learning-based prediction models; it can predict the presence of acute MI from a person's heart disease risk factors, helping decide when to start lifestyle modification and medical treatment to prevent heart disease.
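A minimal sketch of the described pipeline (mean imputation, PCA, 70/30 split, one of the four classifiers) with scikit-learn on synthetic data; the feature count, PCA dimensionality, and outcome rate are assumptions:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline

    rng = np.random.default_rng(5)
    X = rng.normal(size=(4000, 15))             # Framingham-style feature matrix
    X[rng.random(X.shape) < 0.05] = np.nan      # inject missing values
    y = (rng.random(4000) < 0.15).astype(int)   # MI within follow-up?

    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),   # mean imputation
        ("pca", PCA(n_components=8)),                 # feature extraction
        ("clf", LogisticRegression(max_iter=1000)),   # best-reported classifier
    ])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=5)
    pipeline.fit(X_tr, y_tr)
    print(f"AUC = {roc_auc_score(y_te, pipeline.predict_proba(X_te)[:, 1]):.3f}")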


2020
Author(s): Thomas Tschoellitsch, Martin Dünser, Carl Böck, Karin Schwarzbauer, Jens Meier

Abstract: Objective: The diagnosis of COVID-19 is based on the detection of SARS-CoV-2 in respiratory secretions, blood, or stool. Currently, reverse transcription polymerase chain reaction (RT-PCR) is the most commonly used method to test for SARS-CoV-2. Methods: In this retrospective cohort analysis, we evaluated whether machine learning could exclude SARS-CoV-2 infection using routinely available laboratory values. A random forest algorithm with 1353 unique features was trained to predict the RT-PCR results. Results: Out of 12,848 patients undergoing SARS-CoV-2 testing, routine blood tests were simultaneously performed in 1528 patients. The machine learning model could predict SARS-CoV-2 test results with an accuracy of 86% and an area under the receiver operating characteristic curve of 0.90. Conclusion: Machine learning methods can reliably predict a negative SARS-CoV-2 RT-PCR test result using standard blood tests.
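A sketch of the rule-out setup, assuming a standard random forest over routine laboratory values with a held-out test set; the 30 lab features and 10% positivity rate are synthetic assumptions, so the printed metrics will not reproduce the reported 86% accuracy or 0.90 AUC:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(9)
    X = rng.normal(size=(1528, 30))             # routine blood-test values
    y = (rng.random(1528) < 0.1).astype(int)    # RT-PCR positive (minority)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=9)
    rf = RandomForestClassifier(n_estimators=500, random_state=9).fit(X_tr, y_tr)
    print(f"accuracy = {accuracy_score(y_te, rf.predict(X_te)):.2f}")
    print(f"AUC = {roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]):.2f}")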


2021 · Vol 21 (1)
Author(s): Yang Mi, Pengfei Qu, Na Guo, Ruimiao Bai, Jiayi Gao, ...

Abstract: Background: For most women who have had a previous cesarean section, vaginal birth after cesarean section (VBAC) is a reasonable and safe choice, but it increases the risk of adverse outcomes such as uterine rupture. In order to reduce this risk, we evaluated the factors that may affect VBAC and established a model for predicting the success rate of a trial of labor after cesarean section (TOLAC). Methods: All patients who gave birth at Northwest Women’s and Children’s Hospital from January 2016 to December 2018, had a history of cesarean section, and voluntarily chose TOLAC were recruited. Among them, 80% of the population was randomly assigned to the training set, while the remaining 20% were assigned to the external validation set. In the training set, univariate and multivariate logistic regression models were used to identify indicators related to successful TOLAC. A nomogram was constructed based on the results of the multivariate logistic regression analysis, and the selected variables included in the nomogram were used to predict the probability of successful TOLAC. The area under the receiver operating characteristic curve was used to judge the predictive ability of the model. Results: A total of 778 pregnant women were included in this study. Among them, 595 (76.48%) successfully underwent TOLAC, whereas 183 (23.52%) failed and switched to cesarean section. In the multivariate logistic regression, parity = 1, pre-pregnancy BMI < 24 kg/m2, cervical score ≥ 5, a history of previous vaginal delivery, and neonatal birthweight < 3300 g were associated with the success of TOLAC. The areas under the receiver operating characteristic curve in the prediction and validation models were 0.815 (95% CI: 0.762–0.854) and 0.730 (95% CI: 0.652–0.808), respectively, indicating that the nomogram prediction model had medium discriminative power. Conclusion: TOLAC was useful for reducing the cesarean section rate. Being primiparous, not being overweight or obese, having a cervical score ≥ 5, a history of previous vaginal delivery, or a neonatal birthweight < 3300 g were protective indicators. In this study, the validated model had acceptable predictive ability.
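A nomogram is essentially a graphical rendering of the logistic model's linear predictor. The sketch below shows the underlying arithmetic for the five reported predictors; the intercept and coefficients are placeholders, not the fitted values from the paper:

    import numpy as np

    def tolac_success_probability(parity_is_1, bmi_under_24, cervical_score_ge_5,
                                  prior_vaginal_delivery, birthweight_under_3300):
        # Placeholder intercept and weights; the paper's fitted values differ.
        intercept = -0.5
        coefs = np.array([0.6, 0.5, 0.9, 1.2, 0.7])
        x = np.array([parity_is_1, bmi_under_24, cervical_score_ge_5,
                      prior_vaginal_delivery, birthweight_under_3300], float)
        return 1 / (1 + np.exp(-(intercept + coefs @ x)))

    # Example: parity 1, BMI < 24, cervical score >= 5, no prior vaginal
    # delivery, estimated birthweight < 3300 g.
    print(f"predicted TOLAC success: {tolac_success_probability(1, 1, 1, 0, 1):.0%}")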

