Abstract 1122‐000047: Machine Learning to Predict Stroke Outcomes after Mechanical Thrombectomy

Introduction: Prognostication is an integral part of clinical decision‐making in stroke care. Machine learning (ML) methods have gained increasing popularity in the medical field due to their flexibility and high performance. Using a large comprehensive stroke center registry, we sought to apply various ML techniques for 90‐day stroke outcome predictions after thrombectomy.
Methods: We used individual patient data from our prospectively collected thrombectomy database between 09/2010 and 03/2020. Patients with anterior circulation strokes (Internal Carotid Artery, Middle Cerebral Artery M1, M2, or M3 segments and Anterior Cerebral Artery) and complete records were included. Our primary outcome was 90‐day functional independence (defined as modified Rankin Scale score 0–2). Pre‐ and post‐procedure models were developed. Four known ML algorithms (support vector machine, random forest, gradient boosting, and artificial neural network) were implemented using a 70/30 training‐test data split and 10‐fold cross‐validation on the training data for model calibration. Discriminative performance was evaluated using the area under the receiver operating characteristic curve (AUC).
Results: Among 1248 patients with anterior circulation large vessel occlusion stroke undergoing thrombectomy during the study period, 1020 had complete records and were included in the analysis. In the training data (n = 714), 49.3% of the patients achieved independence at 90 days. Fifteen baseline clinical, laboratory and neuroimaging features were used to develop the pre‐procedure models, with four additional parameters included in the post‐procedure models. For the pre‐procedure models, the highest AUC was 0.797 (95% CI [0.75‐0.85]) for the gradient boosting model. The same ML technique also performed best on post‐procedure data, with improved discriminative performance compared to the pre‐procedure model (AUC 0.82, 95% CI [0.77‐0.87]).
Conclusions: Our pre‐ and post‐procedure models reliably estimated outcomes in stroke patients undergoing thrombectomy. They represent a step forward in creating simple and efficient prognostication tools to aid treatment decision‐making. A web‐based platform and related mobile app are underway.
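
The evaluation described in this abstract (a 70/30 split and the AUC metric) can be illustrated with a minimal sketch. It is not the authors' code: the function name `roc_auc`, the labels, and the predicted probabilities are invented for illustration, and only the cohort size of 1020 comes from the abstract. The AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# 70/30 split sizes for the 1020 included patients reported in the abstract.
n_total = 1020
n_train = round(0.70 * n_total)
n_test = n_total - n_train

# Hypothetical test labels (1 = mRS 0-2 at 90 days) and model probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.3, 0.6, 0.4, 0.5, 0.2]
auc = roc_auc(y_true, y_prob)
print(n_train, n_test, round(auc, 3))
```

The pairwise form is quadratic in the number of cases, which is acceptable at this scale; a rank-based implementation would likely be preferred for larger cohorts.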

Exploiting Rules to Enhance Machine Learning in Extracting Information From Multi-Institutional Prostate Pathology Reports

JCO Clinical Cancer Informatics ◽

10.1200/cci.20.00028 ◽

2020 ◽

pp. 865-874

Author(s):

Enrico Santus ◽

Tal Schuster ◽

Amir M. Tahmasebi ◽

Clara Li ◽

Adam Yala ◽

...

Keyword(s):

Machine Learning ◽

Hybrid Systems ◽

High Performance ◽

Feature Model ◽

Training Data ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Extreme Gradient Boosting ◽

Pathology Reports

PURPOSE Literature on clinical note mining has highlighted the superiority of machine learning (ML) over hand-crafted rules. Nevertheless, most studies assume the availability of large training sets, which is rarely the case. For this reason, in the clinical setting, rules are still common. We suggest 2 methods to leverage the knowledge encoded in pre-existing rules to inform ML decisions and obtain high performance, even with scarce annotations. METHODS We collected 501 prostate pathology reports from 6 American hospitals. Reports were split into 2,711 core segments, annotated with 20 attributes describing the histology, grade, extension, and location of tumors. The data set was split by institutions to generate a cross-institutional evaluation setting. We assessed 4 systems, namely a rule-based approach, an ML model, and 2 hybrid systems integrating the previous methods: a Rule as Feature model and a Classifier Confidence model. Several ML algorithms were tested, including logistic regression (LR), support vector machine (SVM), and eXtreme gradient boosting (XGB). RESULTS When training on data from a single institution, LR lags behind the rules by 3.5% (F1 score: 92.2% v 95.7%). Hybrid models, instead, obtain competitive results, with Classifier Confidence outperforming the rules by +0.5% (96.2%). When a larger amount of data from multiple institutions is used, LR improves by +1.5% over the rules (97.2%), whereas hybrid systems obtain +2.2% for Rule as Feature (97.7%) and +2.6% for Classifier Confidence (98.3%). Replacing LR with SVM or XGB yielded similar performance gains. CONCLUSION We developed methods to use pre-existing handcrafted rules to inform ML algorithms. These hybrid systems obtain better performance than either rules or ML models alone, even when training data are limited.
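
One plausible reading of the Rule as Feature idea is that the output of a handcrafted rule is appended to the text features given to the ML model. The sketch below illustrates that reading only; the abstract does not describe the implementation, and the rule `rule_gleason`, the toy vocabulary `VOCAB`, and the example segments are all hypothetical.

```python
import re

# Toy vocabulary for a bag-of-words representation (assumed, not from the paper).
VOCAB = ["adenocarcinoma", "benign", "gleason"]


def rule_gleason(segment):
    """Hypothetical handcrafted rule: 1 if the segment reports a Gleason pattern."""
    pattern = r"gleason\s*(score)?\s*\d\s*\+\s*\d"
    return 1 if re.search(pattern, segment, re.IGNORECASE) else 0


def bag_of_words(segment):
    """Count how often each vocabulary word occurs in the segment."""
    tokens = re.findall(r"[a-z]+", segment.lower())
    return [tokens.count(word) for word in VOCAB]


def rule_as_feature(segment):
    """Append the rule output to the text features so a classifier can weigh it."""
    return bag_of_words(segment) + [rule_gleason(segment)]


features_tumor = rule_as_feature("Prostatic adenocarcinoma, Gleason score 3 + 4")
features_benign = rule_as_feature("Benign prostatic tissue")
print(features_tumor, features_benign)
```

Any classifier that accepts numeric feature vectors, such as the LR, SVM, or XGB models named in the abstract, could then be trained on these extended vectors.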

An explainable supervised machine learning predictor of acute kidney injury after adult deceased donor liver transplantation

Journal of Translational Medicine ◽

10.1186/s12967-021-02990-4 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Yihan Zhang ◽

Dong Yang ◽

Zifeng Liu ◽

Chaojin Chen ◽

Mian Ge ◽

...

Keyword(s):

Machine Learning ◽

Decision Making ◽

Liver Transplantation ◽

Acute Kidney Injury ◽

Kidney Injury ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Support Vector ◽

Adaptive Boosting ◽

Validation Set

Abstract Background: Early prediction of acute kidney injury (AKI) after liver transplantation (LT) facilitates timely recognition and intervention. We aimed to build a risk predictor of post-LT AKI via supervised machine learning and to visualize the mechanism driving its predictions to assist clinical decision-making. Methods: Data from 894 cases that underwent liver transplantation from January 2015 to September 2019 were collected, covering demographics, donor characteristics, etiology, peri-operative laboratory results, co-morbidities and medications. The primary outcome was new-onset AKI after LT according to Kidney Disease Improving Global Outcomes guidelines. The predictive performance of five classifiers (logistic regression, support vector machine, random forest, gradient boosting machine (GBM) and adaptive boosting) was evaluated by the area under the receiver-operating characteristic curve (AUC), accuracy, F1-score, sensitivity and specificity. The model with the best performance was validated in an independent dataset of 195 adult LT cases from October 2019 to March 2021. The SHapley Additive exPlanations (SHAP) method was applied to evaluate feature importance and explain the predictions made by the ML algorithms. Results: AKI was diagnosed in 430 (55.1%) of the 780 included cases. The GBM model achieved the highest AUC (0.76, CI 0.70 to 0.82), F1-score (0.73, CI 0.66 to 0.79) and sensitivity (0.74, CI 0.66 to 0.8) in the internal validation set, and a comparable AUC (0.75, CI 0.67 to 0.81) in the external validation set. The SHAP method identified high preoperative indirect bilirubin, low intraoperative urine output, long anesthesia time, low preoperative platelets, and graft steatosis graded NASH CRN 1 and above as the top 5 variables contributing to the GBM model's prediction of post-LT AKI. Conclusions: Our GBM-based predictor of post-LT AKI provides a highly interoperable tool across institutions to assist decision-making after LT.
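
For readers less familiar with the threshold-based metrics listed in this abstract, the sketch below shows how accuracy, sensitivity, specificity, and F1-score follow from confusion-matrix counts. The counts are invented for illustration and are not the study's data; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: share of true cases detected
    specificity = tn / (tn + fp)          # share of non-cases correctly cleared
    precision = tp / (tp + fp)            # share of positive calls that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts: 45 true AKI cases and 55 non-cases in a validation set.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike the AUC, all four values depend on the probability cut-off used to call a case positive, which is why they are usually reported alongside it.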

Use of a Machine Learning Method in Predicting Refraction after Cataract Surgery

Journal of Clinical Medicine ◽

10.3390/jcm10051103 ◽

2021 ◽

Vol 10 (5) ◽

pp. 1103

Author(s):

Tomofusa Yamauchi ◽

Hitoshi Tabuchi ◽

Kosuke Takase ◽

Hiroki Masumoto

Keyword(s):

Machine Learning ◽

Cataract Surgery ◽

Test Data ◽

Training Data ◽

Gradient Boosting ◽

Support Vector ◽

Power Calculation ◽

Iol Power Calculation ◽

Significant Difference ◽

Iol Power

The present study describes the use of machine learning (ML) to predict postoperative refraction after cataract surgery and compares its accuracy with that of conventional intraocular lens (IOL) power calculation formulas. In total, 3331 eyes from 2010 patients were assessed. The eyes were divided into training data and test data. Using the training data, the constants for the IOL power calculation formulas were optimized and the ML models were trained. Postoperative refraction was then predicted for the test data using either the conventional formulas or the ML models. We evaluated the SRK/T, Haigis, Holladay 1, Hoffer Q, and Barrett Universal II (BU-II) formulas; as ML methods, we assessed support vector regression (SVR), random forest regression (RFR), gradient boosting regression (GBR), and neural network (NN). Among the conventional formulas, BU-II had the lowest mean and median absolute error of prediction. Therefore, we compared the accuracy of our method with that of BU-II. The absolute errors of some ML methods were lower than those of BU-II. However, no statistically significant difference was observed. Thus, the accuracy of our method was not inferior to that of BU-II.
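
The comparison in this abstract rests on the mean and median absolute error of the predicted refraction. A minimal sketch of those two statistics follows; the refraction values (in diopters) are invented for illustration and the helper name `absolute_errors` is an assumption.

```python
from statistics import mean, median


def absolute_errors(predicted, observed):
    """Absolute prediction error per eye, in the same unit as the inputs."""
    return [abs(p - o) for p, o in zip(predicted, observed)]


# Hypothetical predicted and observed postoperative refraction (diopters).
predicted = [-0.50, -0.25, 0.00, -1.00]
observed = [-0.25, -0.25, 0.50, -0.50]

errors = absolute_errors(predicted, observed)
mae = mean(errors)       # mean absolute error
medae = median(errors)   # median absolute error
print(errors, mae, medae)
```

Reporting both statistics is useful because the median is less sensitive to a few eyes with large prediction errors.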

An explainable supervised machine learning predictor of acute kidney injury after adult deceased donor liver transplantation

10.21203/rs.3.rs-442049/v1 ◽

2021 ◽

Author(s):

Yihan Zhang ◽

Dong Yang ◽

Zifeng Liu ◽

Chaojin Chen ◽

Mian Ge ◽

...

Keyword(s):

Machine Learning ◽

Decision Making ◽

Liver Transplantation ◽

Acute Kidney Injury ◽

Clinical Decision Making ◽

Kidney Injury ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Support Vector ◽

Adaptive Boosting

Abstract Background: Early prediction of acute kidney injury (AKI) after liver transplantation (LT) facilitates timely recognition and intervention. We aimed to build a risk predictor of post-LT AKI via supervised machine learning and to visualize the mechanism driving its predictions to assist clinical decision-making. Methods: Data from 894 cases that underwent liver transplantation from January 2015 to September 2019 were collected, covering demographics, donor characteristics, etiology, peri-operative laboratory results, co-morbidities and medications. The primary outcome was new-onset AKI after LT according to Kidney Disease Improving Global Outcomes guidelines. The predictive performance of five classifiers (logistic regression, support vector machine, random forest, gradient boosting machine (GBM) and adaptive boosting) was evaluated by the area under the receiver-operating characteristic curve (AUC), accuracy, F1-score, sensitivity and specificity. The SHapley Additive exPlanations (SHAP) method was applied to evaluate feature importance and explain the predictions made by the ML algorithms. Results: AKI was diagnosed in 430 (55.1%) of the 780 included cases. The GBM model achieved the highest AUC (0.76, CI 0.70 to 0.82), F1-score (0.73, CI 0.66 to 0.79) and sensitivity (0.74, CI 0.66 to 0.8). The SHAP method identified high preoperative indirect bilirubin, low intraoperative urine output, long anesthesia time, low preoperative platelets, and graft steatosis graded NASH CRN 1 and above as the top 5 variables contributing to the GBM model's prediction of post-LT AKI. Conclusions: Our GBM-based predictor of post-LT AKI provides a highly interoperable tool across institutions to assist decision-making after LT.

Machine learning to predict distal caries in mandibular second molars associated with impacted third molars

Scientific Reports ◽

10.1038/s41598-021-95024-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sung-Hwi Hur ◽

Eun-Young Lee ◽

Min-Kyung Kim ◽

Somi Kim ◽

Ji-Yeon Kang ◽

...

Keyword(s):

Machine Learning ◽

Decision Making ◽

Clinical Decision Making ◽

Prediction Models ◽

Contact Point ◽

Characteristic Curve ◽

Gradient Boosting ◽

Support Vector ◽

Third Molars ◽

Extreme Gradient Boosting

Abstract Impacted mandibular third molars (M3M) are associated with the occurrence of distal caries on the adjacent mandibular second molars (DCM2M). In this study, we aimed to develop and validate five machine learning (ML) models designed to predict the occurrence of DCM2Ms due to the proximity with M3Ms and determine the relative importance of predictive variables for DCM2Ms that are important for clinical decision making. A total of 2642 mandibular second molars adjacent to M3Ms were analyzed and DCM2Ms were identified in 322 cases (12.2%). The models were trained using logistic regression, random forest, support vector machine, artificial neural network, and extreme gradient boosting ML methods and were subsequently validated using testing datasets. The performance of the ML models was significantly superior to that of single predictors. The area under the receiver operating characteristic curve of the machine learning models ranged from 0.88 to 0.89. Six features (sex, age, contact point at the cementoenamel junction, angulation of M3Ms, Winter's classification, and Pell and Gregory classification) were identified as relevant predictors. These prediction models could be used to detect patients at a high risk of developing DCM2M and ultimately contribute to caries prevention and treatment decision-making for impacted M3Ms.

Reconstruction Process of Geomagnetic Data using Machine Learning

International Journal for Modern Trends in Science and Technology - RTT2020 ◽

10.46501/ijmtst061020 ◽

2020 ◽

Vol 6 (10) ◽

pp. 113-117

Author(s):

D. Venkat Sai J. Prasanna kumar and Dr. A.V.Krishna Prasad

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Training Data ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Reconstruction Method ◽

Reconstruction Process ◽

Near Surface ◽

Geomagnetic Data

Geomagnetic data play an important role in understanding the evolution of Earth’s magnetic field, as they provide necessary information for near-surface exploration, unexploded explosive ordnance detection, and so on. This project presents a method for reconstructing geomagnetic data based on machine learning techniques. Traditional linear approaches are time-inefficient and involve high labor costs, whereas the proposed approach offers a significant improvement. Three classic machine learning models (support vector machine, random forest, and gradient boosting) were built, and a deep learning algorithm, a recurrent neural network, was explored to further improve performance. The proposed learning methods were used to fit a continuous regression hyperplane from the training data. This hyperplane maps the relation between the missing data and the surrounding intact data. The trained models were then used to reconstruct the missing geomagnetic data for validation, and they can be used to reconstruct newly collected field data. Finally, numerical experiments were conducted. The results show that the proposed methods were more accurate than the traditional linear learning method, with reconstruction accuracy increased by approximately 10% to 20%.

The Machine Learning's Classification Methods Comparison to Estimate Electrofacies Type, Lithology and Hydrocarbon Fluids from Geophysical Well Log Data

10.29118/ipa21-sg-196 ◽

2021 ◽

Author(s):

D. A. Panggabean

Keyword(s):

Machine Learning ◽

Test Data ◽

Data Evaluation ◽

Training Data ◽

Gradient Boosting ◽

Support Vector ◽

Classification Methods ◽

Evaluation Result ◽

Qualitative And Quantitative ◽

Shape Prediction

Supervised learning methods from machine learning are starting to be widely used in oil & gas data management. The choice of method depends on the purpose of the data processing, such as classification or regression. This research applies six classification methods to estimate electrofacies shape, lithology type, and fluids: Decision Tree (DT), Gaussian Naïve Bayes (GNB), K-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), and Gradient Boosting (XGB). The six methods were compared qualitatively and quantitatively to identify the best one. The research was conducted in the Maju Royal Field using data from one oil well for training and from another well for testing. For validation purposes, the data were split 85% for training and 15% for validation, and the machine learning models were evaluated through the correlation coefficient. On the test data, both qualitative and quantitative analyses were conducted. The qualitative analysis compared the electrofacies shape prediction with the original interpretation, the lithology prediction with shale volume data, and the fluid prediction with test zone data. The quantitative analysis compared the number of correct predictions with the actual amount of data for each parameter. The training data evaluation shows that KNN and XGB are suitable for electrofacies shape prediction, while lithology and fluid estimation perform well with the DT, KNN, and XGB methods. The qualitative and quantitative analysis of the test data shows that the DT and GNB methods are suitable for estimating the electrofacies shape. In contrast, all methods are considered good at predicting lithology and fluids and have good correlation values. Hence, both the training and test data evaluations yield good correlation values.

Machine Learning Based Prediction of Insufficient Herbage Allowance with Automated Feeding Behaviour and Activity Data

Sensors ◽

10.3390/s19204479 ◽

2019 ◽

Vol 19 (20) ◽

pp. 4479 ◽

Cited By ~ 1

Author(s):

Abu Zar Shafiullah ◽

Jessica Werner ◽

Emer Kennedy ◽

Lorenzo Leso ◽

Bernadette O’Brien ◽

...

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Characteristic Curve ◽

Training Data ◽

Sensor Data ◽

Gradient Boosting ◽

Support Vector ◽

Pasture Management ◽

Activity Data ◽

Extreme Gradient Boosting

Sensor technologies that measure grazing and ruminating behaviour as well as physical activities of individual cows are intended to be included in precision pasture management. One of the advantages of sensor data is that they can be analysed to support farmers in many decision-making processes. This article thus considers the performance of a set of RumiWatchSystem recorded variables in the prediction of insufficient herbage allowance for spring calving dairy cows. Several commonly used models in machine learning (ML) were applied to the binary classification problem, i.e., sufficient or insufficient herbage allowance, and the predictive performance was compared based on the classification evaluation metrics. Most of the ML models and the generalised linear model (GLM) performed similarly in the leave-out-one-animal (LOOA) validation studies. However, cross validation (CV) studies, where a portion of features in the test and training data resulted from the same cows, revealed that support vector machine (SVM), random forest (RF) and extreme gradient boosting (XGBoost) performed relatively better than other candidate models. In general, these ML models attained 88% AUC (area under receiver operating characteristic curve) and around 80% sensitivity, specificity, accuracy, precision and F-score. This study further identified that number of rumination chews per day and grazing bites per minute were the most important predictors and examined the marginal effects of the variables on model prediction towards a decision support system.
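
The difference between the LOOA and CV validation schemes in this abstract is whether records from the same cow can appear in both training and test data. The sketch below shows a grouped split of the leave-one-animal-out kind, under the assumption that each record carries an animal identifier; the function name and the cow IDs are hypothetical.

```python
def leave_one_animal_out(animal_ids):
    """Yield (held_out_animal, train_idx, test_idx), holding out each animal once."""
    for animal in sorted(set(animal_ids)):
        test_idx = [i for i, a in enumerate(animal_ids) if a == animal]
        train_idx = [i for i, a in enumerate(animal_ids) if a != animal]
        yield animal, train_idx, test_idx


# Hypothetical animal identifier for each of five daily records.
cow_ids = ["cow_a", "cow_a", "cow_b", "cow_c", "cow_b"]
splits = list(leave_one_animal_out(cow_ids))

for animal, train_idx, test_idx in splits:
    print(animal, "train:", train_idx, "test:", test_idx)
```

Because no animal contributes to both sides of a split, this scheme estimates performance on unseen cows, whereas an ordinary record-level CV split can share cows between training and test data.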

An Interpretable Aid Decision-Making Model for Flag State Control Ship Detention Based on SMOTE and XGBoost

Journal of Marine Science and Engineering ◽

10.3390/jmse9020156 ◽

2021 ◽

Vol 9 (2) ◽

pp. 156

Author(s):

Jian He ◽

Yong Hao ◽

Xiaoqiong Wang

Keyword(s):

Machine Learning ◽

Decision Making ◽

Model Performance ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

State Control ◽

Extreme Gradient Boosting ◽

Decision Making Model ◽

Flag State

Sound ship detention decisions play a vital role in flag state control (FSC). Machine learning algorithms can be applied as decision aids for identifying ships that should be detained. In this study, we propose a novel interpretable ship detention decision-making model based on machine learning, termed SMOTE-XGBoost-Ship detention model (SMO-XGB-SD), using the extreme gradient boosting (XGBoost) algorithm and the synthetic minority oversampling technique (SMOTE) algorithm to identify whether a ship should be detained. Our verification results show that the SMO-XGB-SD algorithm outperforms the random forest (RF), support vector machine (SVM), and logistic regression (LR) algorithms. In addition, the new algorithm provides a reasonable interpretation of model performance and highlights the most important features for identifying ship detention using the Shapley additive explanations (SHAP) algorithm. The SMO-XGB-SD model provides an effective basis for aiding decisions on ship detention by inland flag state control officers (FSCOs) and the ship safety management of ship operating companies, as well as training services for new FSCOs in maritime organizations.
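
SMOTE balances a data set by creating synthetic minority-class samples on the line segment between a minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that interpolation step only, in plain Python (3.8 or later for `math.dist`); the function name, the parameter defaults, and the toy points are assumptions, and the study's actual settings are not stated in the abstract.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples (e.g. detained ships) with two features.
detained = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(detained, n_new=6)
print(new_points)
```

In practice, oversampling of this kind is normally applied to the training portion only, so that synthetic points do not leak into the evaluation data.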

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, namely multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
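
The two regression metrics reported in this abstract, R2 (coefficient of determination) and MSE (mean squared error), can be computed as in the sketch below. The values are invented for illustration and do not come from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values for four compounds.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print("MSE:", mse(observed, predicted))
print("R2:", r_squared(observed, predicted))
```

Similar training-set and test-set values for both metrics, as the abstract reports, are commonly read as a sign that the model is not badly overfitted.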
